
On a WordPress site, requests come in to CloudFront, which forwards them to the origin when necessary. The origin is an ELB in front of two or three instances that service the requests.

Most requests work, but when we upload a JSON file to admin-ajax.php, it results in a 504 error that we captured from the CloudFront logs:

2022-01-31  21:32:24    MIA3-C2 1462    67.190.247.197  POST    d2q8ixmwt0jy43.cloudfront.net   /wp-admin/admin-ajax.php    504 https://xxxxxxx.com/wp-admin/edit.php?post_type=elementor_library&tabs_group=library    Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/97.0.4692.99%20Safari/537.36%20Edg/97.0.1072.76 -   -   Error   Ae0gLzThCiZR2N5D8gLO6s-o8IwwlmYrxWvUbUgr1A64_nMpzN0qRg==    evolvetelemed.com   https   167737  30.107  -   TLSv1.3 TLS_AES_128_GCM_SHA256  Error   HTTP/2.0    -   -   64798   30.107  OriginCommError text/html   1033    -   -

The error appears to be an OriginCommError, but I cannot see why we are getting it. Tailing the web server logs on the instances shows that the request is not even reaching them.

I am not exactly sure why some requests would work and some wouldn't. To add further confusion, uploading binary files in the media library to async-upload.php works.

We use nginx as the front-end web server, with PHP-FPM processing the PHP.

Barry Chapman

1 Answer


This isn't a full answer but it's too long for a comment, and it might give you some ideas. I'll delete it once we get to a real answer.

Some thoughts:

  • The field "30.107" is "time-taken" (the time to process the request, in seconds) according to this page. 30s is a standard HTTP timeout length, which is interesting.
  • The 504 status response means "gateway timeout". More detail is "The server, while acting as a gateway or proxy, did not receive a timely response from an upstream server it needed to access in order to complete the request."
  • OriginCommError isn't defined in the CloudFront error logs. However, "ClientCommError – The response to the viewer was interrupted due to a communication problem between the server and the viewer." suggests that the connection to the server was interrupted rather than never being established.
  • The field "sc-content-len" is defined as "The value of the HTTP Content-Length header of the response." with the value "1033". That suggests some kind of a reply is getting back to the load balancer.
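
To make the field lookups above easier to reproduce, here is a minimal Python sketch that labels each value in the log entry from the question. It assumes the entry follows the field order of the CloudFront standard log format as documented by AWS; the function name parse_cloudfront_line is made up for illustration.

```python
# Field order assumed from the AWS documentation for CloudFront standard logs.
FIELDS = [
    "date", "time", "x-edge-location", "sc-bytes", "c-ip", "cs-method",
    "cs(Host)", "cs-uri-stem", "sc-status", "cs(Referer)", "cs(User-Agent)",
    "cs-uri-query", "cs(Cookie)", "x-edge-result-type", "x-edge-request-id",
    "x-host-header", "cs-protocol", "cs-bytes", "time-taken",
    "x-forwarded-for", "ssl-protocol", "ssl-cipher",
    "x-edge-response-result-type", "cs-protocol-version", "fle-status",
    "fle-encrypted-fields", "c-port", "time-to-first-byte",
    "x-edge-detailed-result-type", "sc-content-type", "sc-content-len",
    "sc-range-start", "sc-range-end",
]

# The log entry from the question, rebuilt as a tab-separated line.
LOG_LINE = "\t".join([
    "2022-01-31", "21:32:24", "MIA3-C2", "1462", "67.190.247.197", "POST",
    "d2q8ixmwt0jy43.cloudfront.net", "/wp-admin/admin-ajax.php", "504",
    "https://xxxxxxx.com/wp-admin/edit.php"
    "?post_type=elementor_library&tabs_group=library",
    "Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)"
    "%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)"
    "%20Chrome/97.0.4692.99%20Safari/537.36%20Edg/97.0.1072.76",
    "-", "-", "Error",
    "Ae0gLzThCiZR2N5D8gLO6s-o8IwwlmYrxWvUbUgr1A64_nMpzN0qRg==",
    "evolvetelemed.com", "https", "167737", "30.107", "-", "TLSv1.3",
    "TLS_AES_128_GCM_SHA256", "Error", "HTTP/2.0", "-", "-", "64798",
    "30.107", "OriginCommError", "text/html", "1033", "-", "-",
])


def parse_cloudfront_line(line):
    """Split one log line on whitespace and pair each value with its field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


record = parse_cloudfront_line(LOG_LINE)

# Print only the fields discussed in this answer.
for name in ("sc-status", "time-taken", "time-to-first-byte",
             "x-edge-detailed-result-type", "sc-content-len"):
    print(f"{name}: {record[name]}")
```

Splitting on whitespace works here because CloudFront URL-encodes the user agent, so no value contains a space. Under the assumed field order, both "time-taken" and "time-to-first-byte" come out as 30.107 for this entry.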

All this suggests to me that CloudFront has sent the request on to the ALB but something is timing out along the way. I would suggest you find the ALB logs and see if they shed any light on this. I would want to see which server it tried to send the request to, and double check whether it arrived.

Tim
  • I was able to narrow down the problem based on your feedback. The request was indeed making it to nginx; however, PHP-FPM was choking on the request and for some odd reason was not logging the fatal error. The upstream failure was reported by nginx to CloudFront. Thank you for your assistance – Barry Chapman Feb 02 '22 at 15:24
  • Welcome :) I was wondering if it was PHP but as you'd said it hadn't reached Nginx you had to trace it through first. – Tim Feb 02 '22 at 17:11