
I'm running Apache 2 with PHP 7.4.29 on Ubuntu 20.04 servers on AWS, behind an AWS load balancer and in an auto scaling group. The servers connect to other AWS resources such as DynamoDB, RDS (MySQL), memcached, etc. This is a stable production environment normally handling tens of thousands of hits per minute, and it works flawlessly. We recently had peaks of 3× normal traffic, and the servers started to show slow response times.

New Relic shows only that PHP time is higher for these transactions and does not point at a specific service.

The problem is that at any given time some of these servers have normal response times (~30 ms) and some have slow response times (~500 ms), and this alternates between the servers. It therefore doesn't look like the slowdown is related to an external service such as RDS, because all servers use the same services. I'm attaching the response times of all the servers that were active in a specific time slot. What can cause such behavior?

TL;DR: I'm asking how to find the reason for a PHP/Apache response-time slowdown when it doesn't happen concurrently on all servers (so it's not an external shared service) and New Relic just shows it as PHP time in the transactions, without additional info.

[Graph: per-server response times for the servers active during the affected time slot]

  • Graphs are only an indication. This is going to need individual log analysis to work out. If you're not already, you'll need to add Apache's time-to-serve field (%D) to your log format. Try to gather all requests over a certain threshold and look for patterns in the data. – Matthew Ife May 31 '22 at 09:04 (see the LogFormat sketch after these comments)
  • Could you post your MySQL Slow Query Log lines from 1:47 AM to 2:11 AM for the 2 servers reported in your last graph for analysis? There should be clues. – Wilson Hauck Jun 06 '22 at 20:36
  • @Niro From your MySQL command prompt, what is the text result of SELECT @@max_connect_errors; Thanks – Wilson Hauck Jul 13 '22 at 20:49
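
As a starting point for the log analysis suggested above, here is a minimal sketch of the LogFormat change, assuming a stock Ubuntu Apache layout; the format name combined_timed and the log path are assumptions, so adapt them to your actual vhost configuration:

    # Append %D (microseconds spent serving the request) to the standard
    # "combined" format so per-request latency is recorded in the access log.
    LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %D" combined_timed
    CustomLog ${APACHE_LOG_DIR}/access.log combined_timed

The last field can then be filtered (e.g. with awk or grep) to pull out requests above a chosen threshold and compare their URLs and timestamps across the fast and slow servers.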
