
When I replace a served file (or modify a symlink) while that same file is being downloaded, Apache occasionally (in a small fraction of a percent of requests) responds with the headers of the old file but the content of the new file.

I tested this on a few versions of Apache 2.2 (2.2.3, and 2.2.22 from Debian stable), locally and remotely, on virtual and physical machines, and on different distributions (Red Hat, CentOS, Debian). I could always reproduce it with a Python script that repeatedly downloads the file from many threads (20-200) while the file is replaced on the server from time to time (e.g. every 100 ms).
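
The original script isn't shown, so here is a minimal reconstruction of such a test harness, assuming Python 3 and a file served at a hypothetical URL `http://localhost/test.bin` whose on-disk path is `/var/www/html/test.bin` (both placeholders). Each version of the file records its own size in its first line, and even and odd versions have different sizes, so a response that pairs the old file's Content-Length with the new file's bytes shows up either as a length mismatch or as a truncated read.

```python
import os
import threading
import time
import urllib.request

URL = "http://localhost/test.bin"     # hypothetical URL of the file under test
PATH = "/var/www/html/test.bin"       # hypothetical on-disk path of that file


def make_payload(version: int) -> bytes:
    """Build a body whose first line records the body's own total size.

    Even and odd versions have different sizes, so pairing one version's
    Content-Length with the next version's bytes changes the byte count.
    """
    size = 4096 if version % 2 == 0 else 8192
    header = f"version={version} size={size}\n".encode()
    return header + b"x" * (size - len(header))


def check_body(body: bytes) -> bool:
    """Return True if the body is exactly as long as its first line claims."""
    first_line = body.split(b"\n", 1)[0].decode(errors="replace")
    try:
        fields = dict(item.split("=", 1) for item in first_line.split())
        return len(body) == int(fields["size"])
    except (KeyError, ValueError):
        return False


def replace_loop(stop: threading.Event, interval: float = 0.1) -> None:
    """Replace the served file every `interval` seconds (write, then rename)."""
    version = 0
    while not stop.is_set():
        tmp = PATH + ".tmp"
        with open(tmp, "wb") as f:
            f.write(make_payload(version))
        os.replace(tmp, PATH)  # rename(2): atomic within one POSIX filesystem
        version += 1
        time.sleep(interval)


def download_loop(stop: threading.Event, counts: dict, lock: threading.Lock) -> None:
    """Download URL repeatedly and tally consistent vs. inconsistent bodies."""
    while not stop.is_set():
        try:
            with urllib.request.urlopen(URL, timeout=10) as resp:
                ok = check_body(resp.read())
        except Exception:
            # e.g. http.client.IncompleteRead when the body is shorter than
            # Content-Length; any other error (404, timeout) is counted too.
            ok = False
        with lock:
            counts["ok" if ok else "bad"] += 1


def run(threads: int = 20, seconds: float = 10.0) -> dict:
    """Run one writer and `threads` downloaders for `seconds`; return counts."""
    stop, lock = threading.Event(), threading.Lock()
    counts = {"ok": 0, "bad": 0}
    workers = [threading.Thread(target=replace_loop, args=(stop,))]
    workers += [
        threading.Thread(target=download_loop, args=(stop, counts, lock))
        for _ in range(threads)
    ]
    for w in workers:
        w.start()
    time.sleep(seconds)
    stop.set()
    for w in workers:
        w.join()
    return counts
```

Running `run(threads=200, seconds=60)` on the server itself returns a dict such as `{"ok": ..., "bad": ...}`; a non-zero `"bad"` count means at least one response did not match a single version of the file (network errors are also counted as bad, so check the server side when it is non-zero).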

Where does the problem lie? Is it Apache's fault, or am I doing something wrong?

Update: I also tested Nginx, and it doesn't have this problem. However, in rare cases (about 100 times less often than the Apache issue), it doesn't see the file at all and serves default content (a 404 or the default page).

Tupteq

Apache is not intended to serve dynamic content directly from the file system. I would expect problems like this simply because of unintentional caching within the program. If you need to serve dynamic content, use scripting, CGI, or something similar.

Chris S
  • So, if that's true, then to replace any served file (100% atomically) you would have to stop the server, replace the file, and then start the server again. That sounds odd. – Tupteq Feb 20 '14 at 14:16
  • Yes. I'm not sure why that would be odd. As mentioned, Apache was *not* intended to serve files the way you're trying to. – Chris S Feb 20 '14 at 14:42
  • If so, then every shared hosting service based on a shared Apache instance has this problem: `~/public_html` directories can be changed by many users simultaneously, which adds up to millions of web pages. Also, I really doubt that every (or any) admin using Apache stops the daemon before updating a file, so in practice almost every Apache server suffers from this problem. Serving multiple sites from one Apache instance (per-user web directories, virtual hosts) makes the "proper" procedure (stop, update, start) even harder to enforce. To me this looks like a BIG design flaw, which is what I find odd. – Tupteq Feb 24 '14 at 09:51
  • I wouldn't dispute that it's a design flaw, but: 1. the odds of it happening are incredibly low; 2. in the vast majority of occurrences it wouldn't make a meaningful difference; 3. I'm still not seeing how this is a "real" problem. – Chris S Feb 24 '14 at 14:24
  • Let me explain why it's a problem for me (and in general). In front of my Apache server I use Amazon CloudFront to handle heavy traffic. With thousands of users, CloudFront frequently requests resources from the origin (Apache) server. The content on the server changes often, and to keep updates atomic I replace a symlink (so effectively thousands of files may change at the same moment). All these factors make errors more likely (I have confirmed a few occurrences). Moreover, a proxy in front of the web server makes these errors last longer, because they may be cached for some time. – Tupteq Feb 25 '14 at 11:28
  • OK, you're using a mechanism meant for *static content*, even with CloudFront caching it (so it's really static), to serve *dynamic content*. That is the problem, not Apache. – Chris S Feb 25 '14 at 14:15
  • My content is static (files, not scripts), but it sometimes changes. Almost all web content changes from time to time, so I don't think I'm doing anything wrong. I bet that on 99.9% of Apache servers, content is updated the same way (replacing files while the server is running). In my case a scale effect (heavy traffic, frequent updates) made the error show up more often. But now I'm pretty sure this problem occurs millions of times every day across Apache servers on the internet. So, IMO, there are two possibilities: nearly 100% of Apache users use it incorrectly, or it's a design flaw. – Tupteq Feb 25 '14 at 15:49
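
The symlink-based update described in these comments can be made atomic at the filesystem level by creating the new link under a temporary name and renaming it over the old one; deleting the old link first and then creating a new one would leave a short window in which the path does not exist. A Python sketch (the function name and paths are illustrative, not from the thread):

```python
import os
import uuid


def swap_symlink(target: str, link_path: str) -> None:
    """Atomically repoint the symlink link_path at target.

    The new link is created under a unique temporary name in the same
    directory and then renamed over link_path. rename(2) replaces the old
    link in a single step, so lookups see the old or the new target and
    never a missing path.
    """
    directory = os.path.dirname(os.path.abspath(link_path))
    tmp = os.path.join(
        directory, f".{os.path.basename(link_path)}.{uuid.uuid4().hex}"
    )
    os.symlink(target, tmp)
    try:
        os.replace(tmp, link_path)  # renames the link itself; not followed
    except BaseException:
        os.unlink(tmp)  # don't leave the temporary link behind
        raise
```

For example, with DocumentRoot set to a hypothetical `/var/www/current`, `swap_symlink("/var/www/releases/v2", "/var/www/current")` switches the whole tree in one step. As the rest of the thread shows, this only makes the filesystem change atomic; it does not by itself stop Apache from pairing old headers with new content inside a single request.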

Instead of fiddling with the file system, I suggest changing the server configuration, i.e. change

DocumentRoot /var/www/version_1

to

DocumentRoot /var/www/version_2

and then run `apachectl -k graceful`. If the DocumentRoot line lives in a separate file pulled in with Include, it's just a small snippet you have to overwrite. Obviously there is still a period during which some Apache processes may serve old files with old headers while others serve new files with new headers, but the problem of mixed headers and content shouldn't occur.
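
A minimal sketch of this approach in Python, assuming the DocumentRoot line lives in its own snippet pulled in by an `Include` directive; the snippet path `/etc/apache2/docroot.conf` and the function `switch_docroot` are hypothetical. It writes the snippet atomically, checks the configuration, and then triggers the graceful restart.

```python
import os
import subprocess


def switch_docroot(snippet_path: str, new_root: str, reload: bool = True) -> None:
    """Point Apache at new_root by rewriting an included config snippet.

    Assumes the main configuration contains a line such as
        Include /etc/apache2/docroot.conf
    (a hypothetical path) and that the snippet holds the only DocumentRoot
    for the virtual host.
    """
    tmp = snippet_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(f"DocumentRoot {new_root}\n")
    # Swap the snippet in one step so a concurrent reload never parses a
    # half-written file.
    os.replace(tmp, snippet_path)
    if reload:
        # Don't reload a configuration Apache considers invalid.
        subprocess.run(["apachectl", "configtest"], check=True)
        # Graceful restart: busy children finish their current requests
        # before exiting; new children use the new DocumentRoot.
        subprocess.run(["apachectl", "-k", "graceful"], check=True)
```

Usage would look like `switch_docroot("/etc/apache2/docroot.conf", "/var/www/version_2")`. Since every call restarts Apache's children, this fits occasional whole-site switches better than many concurrent per-file updates.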

  • Yeah, that makes sense, but in my use case I would have to do it many times (maybe hundreds) every day. Also, many concurrent tasks (currently 8) may modify files at the same time, and all of this works on a single virtual host (just one document root). So it would be hard to synchronize all of this without giving up parallel updates. – Tupteq Oct 30 '15 at 14:39

On POSIX-compliant systems, rename is atomic. So it should be safe and consistent to write to filename.new and then run "mv filename.new filename" (with both names on the same filesystem, so that mv performs a rename). Any open handles on the "old" filename will still read the content of the old inode, and new requests will get the new one.
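
A Python sketch of the write-then-rename pattern (`os.replace` performs rename(2) on POSIX). The `demo` function checks the inode behaviour described above: a handle opened before the rename keeps reading the old content, while a fresh `open` gets the new file. The `fsync` calls add crash durability and go beyond what the answer describes; the function names are illustrative.

```python
import os


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data so readers see either the old or new file."""
    tmp = path + ".new"                       # same directory => same filesystem
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())                  # data on disk before the rename
    os.replace(tmp, path)                     # atomic rename over the old name
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)                      # persist the rename itself
    finally:
        os.close(dir_fd)


def demo(path: str) -> tuple:
    """Show that an already-open handle keeps the old inode after a rename."""
    atomic_write(path, b"old contents")
    with open(path, "rb") as old_handle:      # opened before the swap
        atomic_write(path, b"new contents")
        from_old_handle = old_handle.read()   # still the old inode
    with open(path, "rb") as f:
        from_new_open = f.read()              # path now names the new inode
    return from_old_handle, from_new_open
```

Per the comments below, this makes the file swap itself atomic but did not prevent the mismatched responses observed from Apache.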

quadruplebucky
  • I tried rename and it didn't work. I think the problem is in Apache: its request handling isn't atomic with respect to the file system. – Tupteq Feb 20 '14 at 14:14
  • It's great that the file system is atomic, but that doesn't change how Apache works. – Chris S Feb 20 '14 at 14:42
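
These comments place the problem in request handling rather than in the filesystem. One plausible mechanism, offered here as a hypothesis and not confirmed in this thread, is a check-then-use race: if a server looks up a file by path to build the headers and then opens the path again to send the body, a rename in between pairs the old file's size with the new file's bytes. The Python sketch below reproduces that pattern deterministically; `naive_response`, `consistent_response`, and `replace_with` are hypothetical illustrations, not Apache internals.

```python
import os
from typing import Callable, Optional, Tuple

Hook = Optional[Callable[[], None]]


def naive_response(path: str, between: Hook = None) -> Tuple[int, bytes]:
    """Content-Length and body from two separate lookups of `path`."""
    size = os.stat(path).st_size              # 1st lookup: used for the header
    if between:
        between()                             # a replacement lands here
    with open(path, "rb") as f:               # 2nd lookup: used for the body
        body = f.read()
    return size, body


def consistent_response(path: str, between: Hook = None) -> Tuple[int, bytes]:
    """Open once; take the size from the open descriptor with fstat."""
    with open(path, "rb") as f:
        if between:
            between()                         # path now names a new inode...
        size = os.fstat(f.fileno()).st_size   # ...but f still has the old one
        body = f.read()
    return size, body


def replace_with(path: str, data: bytes) -> Callable[[], None]:
    """Return a callback that atomically replaces `path` with `data`."""
    def swap() -> None:
        tmp = path + ".new"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    return swap
```

With `path` initially containing `b"short"`, `naive_response(path, replace_with(path, b"a much longer body"))` returns a size of 5 alongside the 18-byte new body, which is the old-headers/new-content pairing from the question, whereas `consistent_response` under the same replacement returns 5 together with `b"short"`.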