1

We're running node.js servers on App engine flexible that are connecting to mysql using a shared private network. (app engine instances are deployed into the same vpc subnetwork that mysql lives in).

So on app engine side in app.yaml:

network:
  name: projects/<projectId>/global/networks/<subnetwork>
  subnetwork_name: <subnetwork>

The mysql instance is a MYSQL_8_0, db-n1-standard-1 instance with 10GB storage attached, availability zonal, auto backup enabled. The mysql database exposes a private IP and only allows SSL connections. Generally the instance is almost unused with max 6% usage and plenty of free storage left (even though Auto storage increase is enabled).

Our node servers use Sequelize to connect to mysql whith all the right SSL configs setup:

    sequelizeClient = new Sequelize(MYSQL_CONF.database, MYSQL_CONF.user, dbPass, {
      host: MYSQL_CONF.host,
      dialect: "mysql",
      dialectOptions: {
        ssl: {
          key: cKey,
          cert: cCert,
          ca: cCA,
        },
      },
    });

MYSQL_CONF.host being the private ip used to connect for ex: 10.75.0.3.

Generally this setup works well but intermittently (happened twice) we started getting mysql connection timeouts with following message in logs:

On mysql side:

"2022-08-01T08:33:48.239519Z 1899 [Note] [MY-010914] [Server] Got timeout reading communication packets"

On node side:

"SequelizeConnectionError: connect ETIMEDOUT
    at ConnectionManager.connect (/app/node_modules/sequelize/lib/dialects/mysql/connection-manager.js:102:17)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async ConnectionManager._connect (/app/node_modules/sequelize/lib/dialects/abstract/connection-manager.js:220:24)

but also:

"SequelizeConnectionError: connect EPIPE 10.75.0.3:3306
    at ConnectionManager.connect (/app/node_modules/sequelize/lib/dialects/mysql/connection-manager.js:102:17)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async ConnectionManager._connect (/app/node_modules/sequelize/lib/dialects/abstract/connection-manager.js:220:24)"

and:

"SequelizeConnectionAcquireTimeoutError: Operation timeout
    at ConnectionManager.getConnection (/app/node_modules/sequelize/lib/dialects/abstract/connection-manager.js:204:15)
    at runMicrotasks (<anonymous>)
    at runNextTicks (node:internal/process/task_queues:61:5)
    at processTimers (node:internal/timers:497:9)
    at async /app/node_modules/sequelize/lib/sequelize.js:301:26
    at async MySQLQueryInterface.select (/app/node_modules/sequelize/lib/dialects/abstract/query-interface.js:407:12)
    at async Function.findAll (/app/node_modules/sequelize/lib/model.js:1134:21)
    at async Function.findOne (/app/node_modules/sequelize/lib/model.js:1228:12)
    at async readUser (/app/dist/web/services/user.js:35:18)
    at async getCurrentUser (/app/dist/web/controllers/user.js:49:20)"

also:

"SequelizeConnectionError: 139662677354432:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:332:

    at ConnectionManager.connect (/app/node_modules/sequelize/lib/dialects/mysql/connection-manager.js:102:17)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async ConnectionManager._connect (/app/node_modules/sequelize/lib/dialects/abstract/connection-manager.js:220:24)"

and:

"Error: Client network socket disconnected before secure TLS connection was established
    at connResetException (node:internal/errors:691:14)
    at TLSSocket.onConnectEnd (node:_tls_wrap:1585:19)
    at TLSSocket.emit (node:events:402:35)
    at endReadableNT (node:internal/streams/readable:1343:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)"

and:

"SequelizeConnectionRefusedError: connect ECONNREFUSED 10.75.0.3:3306
    at ConnectionManager.connect (/app/node_modules/sequelize/lib/dialects/mysql/connection-manager.js:92:17)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async ConnectionManager._connect (/app/node_modules/sequelize/lib/dialects/abstract/connection-manager.js:220:24)"

This has happened once in production and lasted about 8 hours and another time in our dev environment and lasted about the same amount of time. Both envs have same infra but are on different gcp projects.

Just to make sure, the connection between our services and database is generally stable and everything works well, it's just for several hours at a time the performance degrades.

No transactions are being left uncommitted, config max connections is not even closely achieved - max open connections were 21 at the time of this issue occurring and the max amount is 4030. No MySQL configs/database flags were changed, everything is in the default state left by Cloud SQL. Generally the SQL commands are very basic and the max communication payload is never reached.

Is this just GCP degraded performance or is it a user error in using their services? Did anyone else encounter these issues?

Variables:

SHOW GLOBAL VARIABLES LIKE '%time%';
Variable_name                  |Value    |
-------------------------------+---------+
binlog_max_flush_queue_time    |0        |
connect_timeout                |10       |
default_password_lifetime      |0        |
delayed_insert_timeout         |300      |
explicit_defaults_for_timestamp|ON       |
flush_time                     |0        |
have_statement_timeout         |YES      |
innodb_flush_log_at_timeout    |1        |
innodb_lock_wait_timeout       |50       |
innodb_old_blocks_time         |1000     |
innodb_rollback_on_timeout     |OFF      |
interactive_timeout            |28800    |
lc_time_names                  |en_US    |
lock_wait_timeout              |31536000 |
log_timestamps                 |UTC      |
long_query_time                |10.000000|
max_execution_time             |0        |
net_read_timeout               |30       |
net_write_timeout              |60       |
regexp_time_limit              |32       |
replica_net_timeout            |30       |
rpl_stop_replica_timeout       |31536000 |
rpl_stop_slave_timeout         |31536000 |
slave_net_timeout              |30       |
slow_launch_time               |2        |
system_time_zone               |UTC      |
time_zone                      |SYSTEM   |
wait_timeout                   |28800    |
SHOW GLOBAL STATUS LIKE 'aborted%';
Variable_name   |Value|
----------------+-----+
Aborted_clients |4    |
Aborted_connects|49   |
SHOW GLOBAL STATUS LIKE '%flush%';
Variable_name                   |Value |
--------------------------------+------+
Com_flush                       |196   |
Flush_commands                  |3     |
Innodb_buffer_pool_pages_flushed|313855|
Key_blocks_not_flushed          |0     |
Not_flushed_delayed_rows        |0     |
Uptime_since_flush_status       |176244|
SHOW GLOBAL STATUS LIKE 'uptime%';
Variable_name            |Value |
-------------------------+------+
Uptime                   |176264|
Uptime_since_flush_status|176264|
SHOW GLOBAL VARIABLES LIKE '%flush%';
Variable_name                     |Value   |
----------------------------------+--------+
binlog_max_flush_queue_time       |0       |
flush                             |OFF     |
flush_time                        |0       |
innodb_adaptive_flushing          |ON      |
innodb_adaptive_flushing_lwm      |10      |
innodb_flush_log_at_timeout       |1       |
innodb_flush_log_at_trx_commit    |1       |
innodb_flush_method               |O_DIRECT|
innodb_flush_neighbors            |2       |
innodb_flush_sync                 |ON      |
innodb_flushing_avg_loops         |5       |
innodb_idle_flush_pct             |100     |
innodb_log_wait_for_flush_spin_hwm|400     |
SHOW GLOBAL STATUS LIKE '%rollback%';
Variable_name             |Value|
--------------------------+-----+
Com_rollback              |2    |
Com_rollback_to_savepoint |0    |
Com_xa_rollback           |0    |
Handler_rollback          |4    |
Handler_savepoint_rollback|0    |
SHOW GLOBAL STATUS LIKE '%savepoint%'; 
Variable_name             |Value|
--------------------------+-----+
Com_release_savepoint     |0    |
Com_rollback_to_savepoint |0    |
Com_savepoint             |0    |
Handler_savepoint         |0    |
Handler_savepoint_rollback|0    |
  • Any chance you could post from each server involved, A) SHOW GLOBAL VARIABLES LIKE '%time%'; and B) SHOW GLOBAL STATUS LIKE 'aborted%'; and C) SHOW GLOBAL STATUS LIKE '%flush%'; and D) SHOW GLOBAL STATUS LIKE 'uptime%'; and E) SHOW GLOBAL VARIABLES LIKE '%flush%'; and F) SHOW GLOBAL STATUS LIKE '%rollback%'; and G) SHOW GLOBAL STATUS LIKE '%savepoint%'; there may be some clues in this data. – Wilson Hauck Aug 02 '22 at 22:26
  • 1
    Added to initial post, there's only one single MySQL instance with multiple databases for each node.js server. thanks for the suggestion! – Tudor Sandu Aug 03 '22 at 07:57
  • Could you share with us the Google Availability Zones used for the servers of concern? As listed here - https://cloud.google.com/about/locations Thank you. – Wilson Hauck Aug 03 '22 at 13:47
  • And, your location country and time zone? We are in GMT -5 hrs, Birmingham, AL USA Are most of your connections within your Google Hosting Availability zone - or do you support a global audience? – Wilson Hauck Aug 03 '22 at 14:35
  • 1
    Sure, our database instance is deployed in us-east1 region us-east-1b zone, app engine region is us-east1. Our country is Romania, timezone GMT+3. – Tudor Sandu Aug 03 '22 at 14:44
  • A) Are most of your client connections within your Google Hosting Availability zone - or do you support a global audience? B) Any hope of correcting this error message related to SSL version number problem? "SequelizeConnectionError: 139662677354432:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:332: – Wilson Hauck Aug 03 '22 at 15:07

1 Answers1

1

Rate Per Second = RPS Rate Per Hour = RPHr

With information posted as of today,

Suggestions to consider for your Google Cloud Database flags, to be applied in each server involved.

connect_timeout=60  # from 10 to minimize aborted_connects of 49 RPHr 1 
innodb_old_blocks_time=60000  # from 1000 for 60 sec age out to reduce select_scan RPS
net_read_timeout=60  # from 30 to tolerate delays longer
regexp_time_limit=10000  # 32 milliseconds to ~ 10 second limit
replica_net_timeout=60  # from 30 to tolerate delays longer
slave_net_timeout=60  # from 30 to tolerate delays longer
slow_launch_time=10  # from 2 to tolerate longer launch time.

View profile for contact info and FREE Utility Scripts to assist with performance tuning.

Wilson Hauck
  • 426
  • 4
  • 10