
Our WordPress website is several years old and has many posts indexed and ranking well on Google. Under any serious traffic the WordPress server tanks, and this happens even after several rounds of WordPress optimization. We have had enough of WordPress issues and have decided to migrate.

We are migrating from WordPress to a static site for better performance, so that pages are not rendered on every request and the static HTML, CSS, JS, and image files can be served directly by the nginx web server instead of hitting another server at the back end.

The issue is that we have over 400,000 posts, and every post will have a static page and hence its own folder storing the relevant files, such as the HTML and image files for that post. So our main web folder will have over 400,000 subfolders. Will that be an issue on Linux? Will it be an issue for web server performance? Is there anything on the hosting side that we should care about in this situation?

Has anyone here tried ext4 with nginx with a large number of subfolders in a single folder? Does it really affect performance? There are conflicting reports about how well ext4 handles large numbers of folders... We do not want to add complexity to the migration unless it is really necessary. The migration is already a big exercise for us :) and we would like to keep it as simple as possible unless there is a real risk of performance degradation. Has anyone used the nginx web server with a large number of subfolders or files in a single folder?

Thank you in advance.

  • Please take a look at https://stackoverflow.com/questions/466521/how-many-files-can-i-put-in-a-directory/466596 for information on files-per-folder limits and the performance penalties. – João Alves Jul 20 '20 at 08:45
  • Why not store your content in a database? – Gerard H. Pille Jul 20 '20 at 08:50
  • Maybe you can customise your folder structure for the static site to be something like example.com/year/month/postname or similar, to reduce the impact. Gerard, I think using a database somewhat defeats the purpose of having a static site. Wordpress is very resource hungry and not the easiest to optimise. – Tim Jul 20 '20 at 09:47
  • Instead of moving to a static solution, I would first implement caching in nginx. – Tero Kilkanen Jul 20 '20 at 10:19
  • Our URLs are already well distributed and ranking high on Google, so we don't want to change the URL structure. By default nginx serves the page from the folder named in the URL; for example, an incoming request for https://example.com/folder1/index.html will look for folder1 in the specified root folder. Can this behavior be changed so that the request goes to /root_folder/2020/07/01/folder1/index.html or some such scheme, so that we don't have to change the incoming URLs, just the location of the files on the server? – Muhammad Ebrahym Jul 20 '20 at 12:26
  • Ideally we would like nginx itself to do the URL translation and avoid calling a back-end server like Node/Express for it, because introducing a back end brings back some of the performance issues we are trying to mitigate. Does nginx provide any way to put logic into the conf files, so that it can find the files for any URL based on certain rules or perhaps an external JSON file? – Muhammad Ebrahym Jul 20 '20 at 12:35
  • The problem is that the URL structure you have was a poor choice. It was probably done for "SEO reasons", but those reasons don't really hold up well. Now you're stuck with it. It might be possible to do something with nginx, but I'll have to think about it for a bit. – Michael Hampton Jul 20 '20 at 13:41
  • As Tero said, I set up caching in Nginx and on CloudFlare, and I created a tutorial about it: https://www.photographerstechsupport.com/tutorials/hosting-wordpress-on-aws-tutorial-part-4-wordpress-website-optimization/ However, WordPress is very resource-intensive, and a static site would consume far fewer resources. – Tim Jul 20 '20 at 18:09

3 Answers

2

Here is a method of reducing the number of directories, as suggested in Artem S. Tashkinov's answer, while configuring nginx to keep the original URL structure.

Create a directory under the document root for each possible pair of initial URL characters, and place the static content whose URL begins with those two characters under that directory.

The nginx location that makes this possible is pretty simple:

    location ~ /(..) {
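            # $1 captures the first two characters of the request URI (after the leading /)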
            root /srv/www/example.com/$1;
    }

This captures the first two characters of the URL after the initial / and appends them to the document root; nginx then appends the full request URI to that root to locate the file on disk.

Note that this requires everything to be moved into two-character subdirectories. That includes the top-level /index.html, which must be placed at $root/in/index.html. As another example, a top-level URL path /images must be moved to $root/im/images. The original document root will contain nothing but these two-character directory names.

Your document URLs will remain unchanged. For example, a blog post accessible at /15-things-to-do-when-visiting-dubai will be on your filesystem at $root/15/15-things-to-do-when-visiting-dubai/index.html, but still accessible at the original URL. (Note that if your original URLs did not have a trailing slash, one will be added, and 301 redirects are generated for SEO preservation.)

In the end the document root directory will have only a few thousand directories at most, and each of them will probably have at most a few hundred directories or files. This is very easily handled by any Linux filesystem.
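
As a rough illustration, here is a minimal sketch of how this location could sit in a complete server block (assuming the /srv/www/example.com document root used above; the listen and server_name values are placeholders):

    server {
        listen 80;
        server_name example.com;            # placeholder: your real host name

        root /srv/www/example.com;          # fallback for URIs too short to match the location below
        index index.html;

        location ~ /(..) {
            # $1 = the first two characters of the request URI; nginx appends the full
            # URI to this root, e.g. /15/15-things-to-do-when-visiting-dubai/index.html
            root /srv/www/example.com/$1;
        }

        # a request for / is internally redirected to /index.html by the index
        # directive, and that rewritten URI then matches the location above ($1 = "in")
    }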

Michael Hampton
  • Thank you so much. This looks workable, since there can only be a couple of thousand two-letter combinations at most, even counting hyphens and numbers. My only concern is that we may still end up with many subfolders under some commonly occurring two-letter prefixes, but even with such uneven lumping we would be much better off than with half a million folders in the root. Is there any way to even out the distribution beyond the two-letter approach? Would the first byte of a SHA-1 of the URL lead to a more uniform distribution? Would there be any way to implement that in nginx? – Muhammad Ebrahym Jul 20 '20 at 23:06
  • @MuhammadEbrahym Anything you want to do will make nginx do some work to process the request. Do you really want it to try to calculate an SHA hash of every file and then look them up in a database? Some backend app would be better at that, but you said you wanted to avoid such approaches. – Michael Hampton Jul 20 '20 at 23:34
  • True, database lookups are to be avoided. I was hoping for some sort of pattern-matching approach as you suggested, but with a higher chance of a uniform distribution. Would nginx be able to hash incoming URLs, take the first byte, and append it to the request path in the location directive, like the first two letters you suggested? (See the sketch after these comments.) – Muhammad Ebrahym Jul 21 '20 at 07:12
  • Has anyone here tried ext4 with nginx with a large number of subfolders in a single folder? Does it really affect performance? There are conflicting reports about how well ext4 handles large numbers of folders, and we would like to avoid added complexity unless there is a real risk of performance degradation. – Muhammad Ebrahym Jul 21 '20 at 08:08
  • @MuhammadEbrahym Someone probably has, but I switched to XFS years ago. I already know it is much better at handling large directories. It has been the default for Red Hat Enterprise Linux and CentOS since 2013. If you are using something else like Ubuntu you have to select XFS explicitly at installation time. – Michael Hampton Jul 21 '20 at 13:44
  • So have you had hundreds of thousands of files or folders within a single folder on XFS? Was there any performance issue at all? – Muhammad Ebrahym Jul 22 '20 at 10:53
  • Not with about 15,000 files. I haven't really tried anything more than that as I've been able to use solutions like this one above. – Michael Hampton Jul 22 '20 at 10:55
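
Regarding the hash question in the comments above: stock nginx has no built-in variable holding a hash of the URI, so any such scheme would need something like the third-party lua-nginx-module (OpenResty). A rough sketch, under that assumption, of sharding by the first byte of an MD5 of the URI:

    # sketch only: requires nginx built with the third-party lua-nginx-module (OpenResty)
    location / {
        set_by_lua_block $shard {
            -- ngx.md5 returns a 32-character hex digest; the first two hex
            -- characters (one byte) give 256 roughly evenly filled shards
            return string.sub(ngx.md5(ngx.var.uri), 1, 2)
        }
        # the migration would have to place each post under the matching
        # /srv/www/example.com/<shard>/ directory, using the same MD5
        root /srv/www/example.com/$shard;
    }

As with the two-character scheme, the public URLs stay unchanged; only the on-disk layout differs.
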
1

Ideally, you should avoid having more than a few thousand files per directory on most file systems, because otherwise traversing the directory takes too much time and too many resources.

You could create a directory structure such as:

  • 00
  • 01
  • 02
  • …
  • FE
  • FF

That will give you 256 directories, and you can nest them infinitely.

Or you could try organizing posts by /YYYY/MM/DD/$UID-post-title

kenlukas
  • If we change the folder structure for file storage, would the URLs need to change? These URLs are ranking high on Google and we would not want to change the URLs of the posts. Is there a way to tell nginx to look in a certain folder for an incoming URL request? I mean, can we keep the URL structure the same and have nginx look up a JSON file and serve a static file from a folder location that is different from the URL? – Muhammad Ebrahym Jul 20 '20 at 12:20
  • You can always set up redirects. – Artem S. Tashkinov Jul 20 '20 at 12:29
  • Thanks for the reply... Is there a way to automate such redirects in nginx based on some JSON file, or logic like this: if the hash of the folder name in the URL starts with 00, then change the URL to /00/folder-name, or some other rule? What logic would you suggest to minimize the number of subfolders and distribute them evenly? Also, will such redirection affect the Google ranking of our URLs, now that each URL is being permanently redirected to another URL? – Muhammad Ebrahym Jul 20 '20 at 12:42
  • Please read this post: https://www.tendenci.com/help-files/nginx-redirect-maps/ As to how you organize your directories, it's really up to you. (A rough sketch of the map approach follows these comments.) – Artem S. Tashkinov Jul 20 '20 at 12:48
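
The linked article covers 301 redirect maps; the same map mechanism can instead drive an internal lookup so the public URLs never change. A rough sketch, where /etc/nginx/post_paths.map is an assumption and would have to be generated as part of the migration, one line per post:

    # at http level; the defaults are far too small for ~400,000 map entries
    map_hash_max_size 524288;
    map_hash_bucket_size 128;

    # each line of the included file maps a public URI to an on-disk path, e.g.
    #   /15-things-to-do-when-visiting-dubai  /2015/06/15-things-to-do-when-visiting-dubai;
    map $uri $post_path {
        default $uri;                      # no entry: fall back to the literal URI
        include /etc/nginx/post_paths.map;
    }

    server {
        root /srv/www/example.com;
        index index.html;

        location / {
            try_files $post_path $post_path/ =404;
        }
    }
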
0

Since the site is static, how about hosting it on AWS S3 and making it AWS's problem?

S3 can host websites directly, and each bucket can store a virtually unlimited number of files (which it calls objects). You used to have to be very careful about object naming for performance, but that has largely been solved and isn't a big issue now. Read the performance guidelines, though, and test well.

S3 isn't always cheap for storage or bandwidth; you should use the AWS calculator to work out your costs (the new calculator doesn't seem to do S3 pricing). You can mitigate traffic costs somewhat by adding caching headers to every object when you upload it to S3 and then putting your S3 bucket behind the CloudFlare CDN (see this question). CloudFlare has free and paid plans, but with this much traffic and content I expect you'd want a paid plan.

Tim