Users upload photos to my site using plupload's HTML5 runtime, which submits the files to a PHP script at /upload-process.
Normally this works fine, but sometimes the call to /upload-process times out.
The very first line of code in /upload-process writes to a logfile, and there are no entries that correspond to the 408s, so I don't think /upload-process is ever even successfully reached. The 408s are present in my nginx error log file, though.
The way plupload works is that the user can add hundreds of photos to a queue, then hit "upload" and leave it running. What I find odd about the 408s is that often, once there's been one of them, that user will get one for every remaining file in the queue.
For instance, last night someone started a queue at 5:30pm, and between then and 7am this morning close to a thousand 408s happened. The log file shows them all as timing out after 60 seconds.
Meanwhile, the website remains absolutely functional to everyone else. I was using it myself quite a bit at that time, as were 200 other people.
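To double-check that the 408s really do cluster on a single client while everyone else is fine, something like the following could be run over the nginx access log. This is only a rough sketch: it assumes the access log uses nginx's default "combined" format and that the timed-out requests appear there with status 408. The function name and the sample lines (addresses and timestamps) are made up for illustration and are not taken from my real logs.

```python
import re
from collections import Counter

# Start of nginx's default "combined" access-log format (an assumption;
# a custom log_format would need a different pattern).
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)

def count_408s_by_client(lines, path="/upload-process"):
    """Count 408 responses for `path`, grouped by client address."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("status") == "408" and m.group("path").startswith(path):
            counts[m.group("ip")] += 1
    return counts

# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2024:17:31:02 +0000] "POST /upload-process HTTP/1.1" 408 0 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Jan/2024:17:32:02 +0000] "POST /upload-process HTTP/1.1" 408 0 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [01/Jan/2024:17:32:10 +0000] "POST /upload-process HTTP/1.1" 200 15 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [01/Jan/2024:17:32:11 +0000] "GET /gallery HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    # For a real log, pass an open file instead of SAMPLE.
    for ip, n in count_408s_by_client(SAMPLE).most_common():
        print(ip, n)
```

If nearly all the 408s come from one address in one time window, that would support the idea that the problem is on that one client's connection rather than with the server as a whole.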
So I guess the user in question is tied to some keepalive child process (I know very little about how HTTP works), and that child process has died and taken all of its subsequent requests with it. Does that sound plausible? I suppose it might be a php-fpm process rather than an nginx one...?
What steps could I take to fix this, or at least to debug it so that I better understand what has gone wrong?
Many thanks!