1

Originally posted on stackoverflow, and was recommended serverfault may be better place.

I have a site using:

  • AWS RDS (MySQL Aurora) - single t3.medium instance
  • 4 x EC2s on a Load Balancer (fixed instances not elastic)
  • CodeIgnitor 3 codebase (3.1.11) (I have just upgraded from 3.1.7 upon recommendation as there were some Session improvements in newer versions).

Some specs:

EC2s:

PHP Version 7.2.32-1+ubuntu18.04.1+deb.sury.org+1
Linux ip-172-32-19-104 5.4.0-1028-aws #29~18.04.1-Ubuntu SMP Tue Oct 6 17:14:23 UTC 2020 x86_64
Apache/2.4.29 (Ubuntu)

RDS:

5.6.mysql_aurora.1.22.2
Instance class: db.t3.medium
vCPU: 2
RAM: 4 GB

Under heavy load (500 people trying to log in in the space of ten mintues), we experience intermittant but significant issues. It's been hard to get information on exactly what users are experiencing but things point to:

  • The RDS MySQL Aurora CPU spikes significantly (100%)
  • The RDS MySQL Aurora Connections spikes (30-45) - From what I've read, the RDS Max Connections is {DBInstanceClassMemory/12582880}, so around 340 4GB(102441024*1024)/12582880
  • The resulting error Deadlock found when trying to get lock; try restarting transaction - see below for full error trace.

I have made the, perhaps incorrect, asusmption that it's therefore:

  1. increase load >> increase RDS CPU usage
  2. high RDS CPU >> Deadlocks >> Fatal MySQL Error (I'm not too familar on Deadlocks to know if this is what can happen but sounds feasible).

The error points to libaries\Session\drivers\Session_database_driver.php, specifically:

     /**
     * Write
     *
     * Writes (create / update) session data
     *
     * @param   string  $session_id Session ID
     * @param   string  $session_data   Serialized session data
     * @return  bool
     */
    public function write($session_id, $session_data)

   ...
   ...
   ...
   if ($this->_db->update($this->_config['save_path'], $update_data))
        {
            $this->_fingerprint = md5($session_data);
            return $this->_success;
        }

So, we're getting a database Deadlock when trying to update the CI Session.

It seems to always throw the error during the user's login process, which I presume is update session heavy.

This session and database class is as per the CI 3.1.7 codebase.

The current Code Ignitor Session config is as follows:

$config['sess_driver'] = 'database';
$config['sess_cookie_name'] = 'ci_session';
$config['sess_expiration'] = 7200;
$config['sess_save_path'] = 'ci_sessions';
$config['sess_match_ip'] = FALSE;
$config['sess_time_to_update'] = 300;
$config['sess_regenerate_destroy'] = FALSE;

So, if my assumption is correct, what would be the best plan of action:

  1. Move to RDS Serverless and let the RDS scale to handle CPU load? (I read somewhere that Serverless might not handle locks well, as it can't scale properly when it's in lock... my understanding on that is obviously limited)
  2. Move to a bigger, fixed (non serverless) RDS to handle CPU Load? (not ideal as 95% of the time the site has no traffic)
  3. Amend sessions to be stored in files instead of database - this to me sounds logical as then we take all of the session load away from the MySQL, but I am not fully aware of any other consequences, nor if it's just a case of amending $config['sess_driver'] and setting up the session file folder path
  4. Something else... (php-fpm?)

With option 3), we use an Load Balancer, so I worry that file based sessions would mean loss of a user's session if they switch LB mid-way though. Although, this may be a managable issue, since the user will stay on the LB for the duration of their stay unless it falls over mid-way.

Options 1 and 2 seem like band aid approaches rather than fixing an ineffecient problem, but then, it could simply be a case of not enough resources.

I read elsewhere a sugestion on a similarish post to use php-fpm to reduce the number of simulaneous apache threads, but not sure if that's relevant here, especially given on php 7.2

It's hard to 'test' as it only happens under large user-login load, so some suggestions would be really appreciated, so I don't have to take multiple stabs in the dark.

Thanks

EDIT:

Copy of full error below:

A Database Error Occurred 

Error Number: 1213 

Deadlock found when trying to get lock; try restarting transaction 

UPDATE `ci_sessions` SET `timestamp` = 1604298368 WHERE `id` = 'fqi83a50dfknbvl9h2r98mtgn2f3j2j6' Filename: libraries/Session/drivers/Session_database_driver.php 

Line Number: 260 

A PHP Error was encountered 

Severity: Warning 

Message: Unknown: Cannot call session save handler in a recursive manner 

Filename: Unknown 

Line Number: 0 
Backtrace: 

A PHP Error was encountered 

Severity: Warning 

Message: Unknown: Failed to write session data using user defined save handler. (session.save_path: /var/lib/php/sessions) 

Filename: Unknown 

Line Number: 0 

Backtrace

EDIT: SHOW CREATE TABLE ci_sessions;

'ci_sessions', 'CREATE TABLE `ci_sessions` (
 `id` varchar(128) NOT NULL,
 `ip_address` varchar(45) NOT NULL,
 `timestamp` int(10) unsigned NOT NULL DEFAULT \'0\',
 `data` blob NOT NULL,
  KEY `ci_sessions_timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8'
rjbathgate
  • 75
  • 2
  • 11
  • Every InnoDB table should have a `PRIMARY KEY`; I don't see an explicit PK. Please provide the generated SQL statement(s) that are involved in the spike and deadlock. – Rick James Nov 03 '20 at 17:55
  • Have added `PRIMARY KEY` with `ALTER TABLE ci_sessions ADD PRIMARY KEY (id);` and a single test case shows notable improvement. Now going to try and replicate it under load. SQL statement in question: `UPDATE ci_sessions SET timestamp = 'xyz' WHERE id = 'abc'` – rjbathgate Nov 03 '20 at 20:49
  • I assume `id` is Unique? The PK is, by definition, unique. Without such an index, the `UPDATE` had to scan the entire table. That both takes time and adds to the potential for deadlocks. – Rick James Nov 04 '20 at 00:09
  • Benchmark stress testing (using mysqlslap) showed massive improvements following the `PRIMARY_KEY` index added to `ci_sessions`, and in production, we're seeing load levels which previously caused problems currently working fine. Thank you :-) – rjbathgate Nov 04 '20 at 00:53
  • Check your other tables for PK. A PK can be "natural" -- referring to some column(s) that is unique. Or it can be a "surrogate" -- an `AUTO_INCREMENT` number. Also look at all `WHERE` clauses in your code; they _usually_ should reference column(s) of the table's PK or a secondary `INDEX(...)` for the table. – Rick James Nov 04 '20 at 00:58

1 Answers1

2

You need some kind of index on id.

If id is unique, it should probably be the PRIMARY KEY.

Rick James
  • 2,058
  • 5
  • 11