
Not sure if this is possible at all, but maybe there's another way to go about this that will get the same result.

The old situation

I'm running a LAMP server with CentOS 6. This server hosts a LOT of websites that we own. In the file system, we have a folder with all the assets that are common to these websites (images, audio files, etc). We're talking around 400K files and 60 GB of data.

The websites access that folder via a symlink called assets placed in their respective document roots, so there are dozens of these assets symlinks spread throughout the file system. The code of all those websites is full of references to those symlinks in order to load the files in their pages.

Let's say that the assets symlink points to /server/path/to/assets. A website loads a file by requesting, for example, http://mywebsite.com/assets/audio_file.mp3. Internally, the request resolves through the assets symlink to the file located at /server/path/to/assets/audio_file.mp3.

The new situation

We have decided to move that assets folder off the server to another location on the internet (AWS S3, if anyone's curious), to save some space on the server and reduce the load.

The issue

Since we're moving the location of those assets, we would need to manually change all the references to those assets in the code of all the websites. That's a LOT of work, of course: we're talking thousands of references.

There's the option of using search and replace in the code of all the websites, which I will do if there isn't any other way, but for different reasons I would prefer to avoid that if possible.

The ideal solution

What I had in mind, ideally, would be something as simple as creating a symlink where all the current symlinks point to, and make that point to the URL of our AWS S3 bucket. Something like:

  1. Remove the /server/path/to/assets folder (after having moved all the files from there of course)
  2. Create a new symlink (or whatever other solution would work here) called assets in place of the folder, pointing to https://my-amazon-s3.com/assets

And then, magically, when a website wants to load http://mywebsite.com/assets/audio_file.mp3, it looks for the file in the AWS S3 bucket and not our server.

I know this might be a bit too much to ask, so I was hoping that maybe there's another way I could solve this problem that someone may suggest.

Albert
  • *400K files and 60 GB of data* You need to do some serious performance testing of S3 with that many files, especially if you're going to put them all in one bucket. Don't think putting 10 or even a few thousand files gives you an idea of how S3 might (or might not...) scale. – Andrew Henle Dec 31 '20 at 22:32
  • The best option is to update the asset URLs in your websites. The other option is to redirect to S3, which adds latency to the requests for the assets. – Tero Kilkanen Dec 31 '20 at 23:07
  • @AndrewHenle that's a good point, although looking at the S3 limitations, the number of files doesn't seem to be a problem (unlimited), only the size of each file (up to 5 TB). I figure they will take that into consideration. https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html – Albert Jan 01 '21 at 01:03

1 Answer


There is no way to create a symlink to a non-mounted filesystem. A symlink can only resolve to a path in the local file system, so it cannot point at a URL.

An S3 bucket can be mounted as a local filesystem using s3fs, see https://github.com/s3fs-fuse/s3fs-fuse. Once you do this, it appears as a local filesystem and you can symlink to it in the normal way.

Note that behind the scenes this is using the S3 API to fetch content to your local server and delivering it from there. Relatively speaking, it is slow. This may make it unsuitable for your web site, but that determination is up to you.

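An alternative method that you may wish to consider is to redirect the requests using Apache's RewriteRule functionality. Using a regular expression this is fairly trivial. In server or virtual host context it looks like the rule below (in a .htaccess file the pattern is matched without the leading slash, so it would need adjusting):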

RewriteRule ^/assets/(.*)$ https://my-amazon-s3.com/assets/$1 [R=302,L]
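
To see which request paths the rule above would redirect and where they would end up, the mapping can be sketched in a few lines of Python. This is only an illustration of the regular expression and substitution, not of how Apache processes requests. The function name redirect_target is hypothetical, the host my-amazon-s3.com is the placeholder from the question, and Python's re module is assumed to behave like Apache's regex engine for a pattern this simple.

```python
import re

# Pattern and target copied from the example RewriteRule; both are placeholders.
PATTERN = re.compile(r"^/assets/(.*)$")
TARGET = "https://my-amazon-s3.com/assets/"


def redirect_target(url_path):
    """Return the URL a request path would be redirected to, or None if the rule does not match."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    # Equivalent of $1 in the RewriteRule substitution.
    return TARGET + match.group(1)


print(redirect_target("/assets/audio_file.mp3"))
print(redirect_target("/index.php"))
```

Only paths that begin with /assets/ produce a redirect target; everything else returns None, which corresponds to Apache serving the request locally as before.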

If you're using HAProxy then you could achieve a similar result with that.

tater
  • I like the idea of using RewriteRules, I just wasn't sure how that would work with symlinks, so I can have just one rule in the assets folder. I'll look into this with more detail! – Albert Jan 01 '21 at 01:05
  • `RewriteRule` operates above the file system and does not require any actual file to exist. If there is a logical/procedural mapping from old URL to new URL then this is the easiest approach, particularly for assets which are not too latency-sensitive and for which a redirect is OK. – tater Jan 01 '21 at 01:34