
I need to add a cache policy to my static content.

Initially, I had manually listed all the extensions I deemed static, which is what almost every answer suggests:

location ~* \.(jpeg|jpg|gif|png|ico|css|bmp|js)$ {
    # …
}

I would then run into some new extensions, or realise I forgot some, and update the configuration:

location ~* \.(jpeg|jpg|gif|png|ico|css|bmp|js|cur|gz|svg|svgz|mp4|mp3|ogg|ogv|htc|woff2|woff)$ {
    # …
}

I am now adding webm and webp, so the above location directive grows further:

location ~* \.(jpeg|jpg|gif|png|ico|css|bmp|js|cur|gz|svg|svgz|mp4|mp3|ogg|ogv|htc|woff2|woff|webm|webp)$ {
    # …
}

And since there are many more existing and upcoming extensions (from .map to .avif and .jxl), and I can't possibly list them all, this hardcoding approach seems like a race without a finish line. So is there a future-proof way to write a location directive that automatically targets all the existing (physical?) files nginx reads from disk and sends to the client browser, i.e. the static files?

Jesse Nickles
mehov
  • What URIs do you consider non-static? Perhaps you could start there and apply cache policy to everything else. – Richard Smith Mar 24 '21 at 09:08
  • Hi @RichardSmith, could be anything: the `/about/contact` and `/sitemap.php` pages could both be PHP-powered (and therefore non-static). I guess it gets down to whether Nginx serves an actual file, or passes the request to the PHP backend. Maybe I have to set up a flag of some kind in my PHP location block? – mehov Mar 24 '21 at 09:19
  • The PHP is processed by its own `location` block (usually `location ~ \.php$`). So it's easy to set cache headers for any other `location` block (perhaps just `location /`) that's different from PHP content. – Richard Smith Mar 24 '21 at 09:24

1 Answer

In any case, regex locations like these are more often than not unnecessary and should be avoided whenever possible. Each additional regex location adds a performance cost, which matters most on a heavily loaded system. Because regex locations take priority over prefix ones, every request that will eventually be processed by your root location is first checked against every regex location pattern. A large number of regex locations can significantly impact overall server performance.
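To illustrate, here is a sketch of the location selection order (the /assets/ prefix is a hypothetical example). nginx first finds the longest matching prefix location and remembers it, then tests the regex locations in the order they appear, and uses the remembered prefix location only if no regex matches. A prefix location declared with the ^~ modifier skips the regex checks entirely when it is the longest match:

location / {
    # a request such as /about/contact lands here only after both
    # regex locations below have been tested and failed to match
    try_files $uri $uri/ /index.php$is_args$args;
}
location ^~ /assets/ {
    # hypothetical directory: because of ^~, nginx does not test the
    # regex locations for any /assets/... request
    expires 30d;
}
location ~ \.php$ {
    # ... php-fpm configuration
}
location ~* \.(jpeg|jpg|gif|png|ico|css|bmp|js)$ {
    expires 30d;
}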

nginx request processing workflow

TL;DR First of all, I want to explain the nginx request processing mechanism. The description of the request processing phases in the development guide can be very helpful for anyone who wants to understand it. Take a look at it, since I'm going to refer to those phases during the explanation. I'll try to briefly explain the most important parts.

A request can pass through several location blocks during its processing. It can be passed from one location to another in the following ways:

  • explicitly, by a rewrite ... last directive triggered at the NGX_HTTP_REWRITE_PHASE;
  • explicitly, by the last argument of the try_files directive when all the other checks have failed at the NGX_HTTP_PRECONTENT_PHASE;
  • implicitly, by the index directive at the beginning of the NGX_HTTP_CONTENT_PHASE.

There is at least one more way for a request to change location, via an error handler declared with the error_page directive, but I'm not going to dig into it since it is out of scope for this answer.
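A minimal sketch of the three ways described above, assuming a hypothetical /old/ prefix whose requests should be handled as /new/ (the rest mirrors the PHP example used later in this answer):

location /old/ {
    # 1. NGX_HTTP_REWRITE_PHASE: "rewrite ... last" changes the URI
    #    and restarts the location search with /new/...
    rewrite ^/old/(.*)$ /new/$1 last;
}
location / {
    # 3. NGX_HTTP_CONTENT_PHASE: for an existing directory, the index
    #    directive may issue an internal redirect to <dir>/index.php
    index index.php;
    # 2. NGX_HTTP_PRECONTENT_PHASE: if neither $uri nor $uri/ exists,
    #    the last try_files argument is an internal redirect
    try_files $uri $uri/ /index.php$is_args$args;
}
location ~ \.php$ {
    # ... php-fpm configuration
}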

Nevertheless, every request that is not terminated earlier by the rewrite/return directives or by access restrictions finally reaches the NGX_HTTP_CONTENT_PHASE in some location and uses that exact location's settings, in particular its content handler. It does not inherit any settings from the locations it has already passed through. The content handler can be specified explicitly (for example, by http_proxy_module via the proxy_pass directive, or by http_fastcgi_module via the fastcgi_pass directive), or it will be attached implicitly by http_static_module (I'm going to call it the static content handler).
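For example, in the sketch below (the document root, upstream address and php-fpm socket path are hypothetical placeholders), the first location relies on the implicit static content handler, while the other two set explicit content handlers:

root /var/www/html;                            # hypothetical document root

location / {
    # no *_pass directive: http_static_module serves files from disk
    try_files $uri $uri/ =404;
}
location /api/ {
    # explicit content handler from http_proxy_module
    proxy_pass http://127.0.0.1:8080;          # hypothetical upstream
}
location ~ \.php$ {
    # explicit content handler from http_fastcgi_module
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;   # hypothetical socket
}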

What do I mean by "implicit internal redirect via the index directive"? Well, every location that uses the static content handler behaves at the NGX_HTTP_PRECONTENT_PHASE as if it had a try_files $uri $uri/ =404 directive (unless try_files is explicitly specified with some other parameters). Every location also has an index directive: declared explicitly, inherited from the previous configuration level, or with the default value of index index.html. That is, assuming no index directive is explicitly defined at the server or http level, the following location blocks are equivalent:

location / {
    index index.html;
    try_files $uri $uri/ =404;
}
location / {
    try_files $uri $uri/ =404;
}
location / {}

The $uri/ parameter of the try_files directive allows further request processing inside the current location if $uri is an existing physical directory under the location root (the path actually checked on the local filesystem is the concatenation of the $document_root and $uri strings). Later, during the NGX_HTTP_CONTENT_PHASE, the index directive can cause an internal redirect if an index file is present inside this directory. A quote from the index directive documentation:

It should be noted that using an index file causes an internal redirect, and the request can be processed in a different location. For example, with the following configuration:

location = / {
    index index.html;
}
location / {
    ...
}

a / request will actually be processed in the second location as /index.html.
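Applied to a directory request, the sequence looks roughly like this (a sketch assuming a hypothetical /var/www/html root that contains blog/index.php):

root /var/www/html;                        # hypothetical document root

location / {
    # GET /blog/ : $uri does not match a regular file, but $uri/
    # matches the existing /var/www/html/blog/ directory, so
    # processing continues in this location; in the content phase
    # the index directive finds blog/index.php and internally
    # redirects the request to /blog/index.php
    index index.php;
    try_files $uri $uri/ /index.php$is_args$args;
}
location ~ \.php$ {
    # ... the internally redirected /blog/index.php ends up here
}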

To sum up, if you have a configuration like

location / {
    index index.php;
    try_files $uri $uri/ /index.php$is_args$args;
    add_header X-Content 'static';
}
location ~ \.php$ {
    ...
    fastcgi_pass ... # fastcgi_module handler used here for content generation
}
  • every request for a PHP file goes straight to the PHP handler because it matches the \.php$ regex pattern, and regex locations take priority over prefix ones (unless the prefix location is declared with the ^~ modifier);
  • every request for a non-existent file or directory goes to the PHP handler via the internal redirect to /index.php issued by the try_files directive;
  • every request for an existing directory containing an index.php file goes to the PHP handler via the internal redirect to $uri/index.php issued by the index directive.

That is, only an existing non-PHP static file will ever get that X-Content: static header in the HTTP response.
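One way to check which location finally served a request is to set a distinct header in each location (the X-Content values below are arbitrary examples) and inspect the response headers, for instance in the browser developer tools:

location / {
    index index.php;
    try_files $uri $uri/ /index.php$is_args$args;
    add_header X-Content 'static';
}
location ~ \.php$ {
    # settings are not inherited from "location /", so a request
    # internally redirected here gets only this header
    add_header X-Content 'php';
    # ... php-fpm configuration
}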

Configuration examples for web applications

PHP-driven applications

To add an expiration date for every static file, you can use the following configuration:

location / {
    index index.php;
    expires 30d;
    try_files $uri $uri/ /index.php$is_args$args;
}
location ~ \.php$ {
    # ... php-fpm configuration
}

As already said, this won't affect PHP files or "virtual" application routes in any way.

Applications driven by JavaScript frameworks (Angular/React/Vue/etc.)

Web applications of this kind are usually served with a configuration similar to the following:

location / {
    try_files $uri /index.html;
}

Here the index.html file acts as a "route controller". To exclude it, as well as any "virtual" application route, from the caching policy, you can split that location in two:

location / {
    expires 30d;
    try_files $uri /index.html;
}
location = /index.html {
    try_files $uri =404;
}
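If you also want to actively tell browsers to revalidate the route controller, rather than just leaving it without an expiration date, one option (a sketch, not part of the configuration above) is expires epoch, which sets the Expires header to 1 January 1970 and sends Cache-Control: no-cache:

location / {
    expires 30d;
    try_files $uri /index.html;
}
location = /index.html {
    # Expires: Thu, 01 Jan 1970 00:00:01 GMT
    # Cache-Control: no-cache
    expires epoch;
    try_files $uri =404;
}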

Additionally, you can define custom cache policies for different asset directories, for example:

location / {
    expires 30d;
    try_files $uri /index.html;
}
location /assets/ {
    expires 90d;
    try_files $uri =404;
}
location = /index.html {
    try_files $uri =404;
}

However, with JavaScript-driven web applications you may want to exclude JavaScript files from caching too, which brings us to the next and last part of the answer.

Different cache policies for the different file types

OK, you say, that's all very interesting, but I need different cache policies for different file types, and there are static files that I don't want cached at all, e.g. JavaScript ones. That means I still have to use regex locations for those file types, right?

No, most probably not.

Unfortunately, many examples all over the Internet use this approach, including ones from respected sources, such as the deployment recommendations in official web application documentation. There is another, much more efficient way to do the same thing.

You can derive your caching policy (as well as many other location settings) from the MIME type that nginx sends in the Content-Type response header. For static files, that value is taken from the mime.types file included in your main nginx.conf configuration file (usually located in the /etc/nginx directory), and it is available via the $sent_http_content_type nginx variable. Let's take the nginx deployment configuration recommended by the Joomla documentation and make it somewhat more performant. Instead of

server {
    ...
    location / {
        try_files $uri $uri/ /index.php?$args;
    }
    ...
    location ~* \.(ico|pdf|flv)$ {
        expires 1y;
    }

    location ~* \.(js|css|png|jpg|jpeg|gif|swf|xml|txt)$ {
        expires 14d;
    }

}

you can use a map block (declared at the http level) to compute the expires directive value (the MIME types used here are taken from the default mime.types file of nginx 1.21.4 and may change in the future; check your own mime.types file for the actual values):

map $sent_http_content_type $expires {
    image/x-icon                   1y;
    application/pdf                1y;
    video/x-flv                    1y;
    application/javascript         14d;
    text/css                       14d;
    image/png                      14d;
    image/jpeg                     14d;
    image/gif                      14d;
    application/x-shockwave-flash  14d;
    text/xml                       14d;
    text/plain                     14d;
    default                        off;
}

Then, instead of the three locations shown above, you can use this single non-regex location:

server {
    ...
    location / {
        expires $expires;
        try_files $uri $uri/ /index.php?$args;
    }
    ...
}
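Applied to the extension list from the question, the same idea might look like the sketch below. The MIME type names are assumptions based on common nginx mime.types entries; the exact set and spelling depend on your nginx version (JavaScript, for example, may be listed as application/javascript or as text/javascript), so check your own mime.types file. Extensions that mime.types does not list at all are sent with the default_type value and simply fall into the default branch. As an example of a type you may not want cached, JavaScript is mapped to epoch here:

# http level
map $sent_http_content_type $expires {
    image/jpeg                 30d;
    image/gif                  30d;
    image/png                  30d;
    image/x-icon               30d;
    image/x-ms-bmp             30d;
    image/svg+xml              30d;
    image/webp                 30d;
    text/css                   30d;
    text/x-component           30d;
    font/woff                  30d;
    font/woff2                 30d;
    video/mp4                  30d;
    video/webm                 30d;
    audio/mpeg                 30d;
    audio/ogg                  30d;
    # example of a static type you do not want cached at all
    application/javascript     epoch;
    default                    off;
}

New file types then only need a new line in this map (or nothing at all, if the default branch suits them), and the single location / with expires $expires stays unchanged.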

One might ask: what's the point of removing 6 configuration lines and adding 15? Isn't our config only getting bigger? Or, even more interesting: hey, I can shrink that map block by using a regex pattern to cover every image type, like this:

map $sent_http_content_type $expires {
    image/x-icon                   1y;
    ~^image/                       14d;
    ...
}

Yes, you can. Don't do it. This isn't about configuration size, but about performance, which should be your primary goal (at least I think so). When a map contains only strings, it becomes a hash table internally, with O(1) lookup time. When you add regular expressions to it, it is split into two parts: a hash table of fixed values and a list of regex patterns. If there is no exact match in the hash table part, the source value is matched against the regex patterns one by one, until the first match is found or the whole list is exhausted. That is, if performance really is your primary goal, you won't do it (although in this particular case, compared with the original configuration that used two regex locations, even the regex map would bring some performance benefit, since it replaces two regex matching operations with a single one).
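If a map made only of exact strings ever grows large enough that nginx refuses to start and asks you to increase the map hash parameters, they can be raised with the map_hash_max_size and map_hash_bucket_size directives at the http level. The values below are illustrative assumptions, not recommendations; for a few dozen MIME types the defaults are usually sufficient:

http {
    # illustrative values only
    map_hash_max_size    4096;
    map_hash_bucket_size 128;

    # exact strings only: every lookup stays a hash table lookup
    map $sent_http_content_type $expires {
        image/x-icon      1y;
        image/png         14d;
        image/jpeg        14d;
        image/gif         14d;
        default           off;
    }
}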

Ivan Shatsky