Sharing a filesystem between Linux hosts?

3

I'm running a website that uses user-uploaded files heavily. Those files are served to users only after a permission check by a Django application.

Now I need to scale horizontally and spin up another instance of a web server. It needs to have access to the same directory structure that contains uploaded files. What I likely need is some distributed filesystem.

I've been thinking about:

  1. NFS — I did that 15 years ago, and even then the standard felt quite outdated. Although robust and easy to set up, the lack of transfer encryption and the need to keep UIDs/GIDs in sync between servers creates more problems than it solves.
  2. Periodic rsync — sounds like a dirty hack and would probably lead to out-of-sync problems. It would also take N times the storage for N servers (see the sketch after this list).
  3. sshfs — if it has the same performance as scp, I don't want to hear about it.
  4. LustreFS, Gluster, or another DFS — I've never used those and have no idea which would suit my needs. Redundancy is not critical (we have frequent backups), but I'd like the traffic between servers to be encrypted.
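To make option 2 concrete, this is roughly the cron job it would amount to (the hostname and paths are placeholders). Every web server pulls a full copy of the upload directory, which is where the N-times storage cost comes from:

    # crontab on each additional web server: pull uploads from the primary every minute
    * * * * * rsync -az --delete web1.example.com:/srv/uploads/ /srv/uploads/

A file uploaded to web1 is invisible on the other servers until the next run, which is the out-of-sync window mentioned above.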

What would you recommend?

emesik

Posted 2019-04-19T19:39:02.590

Reputation: 131

Re: NFS encryption - have you considered using a VPN between the nodes? If you don't want to go as heavy as an IPSec VPN, WireGuard would be a good alternative that may meet your performance needs.

– AfroThundr – 2019-04-19T20:49:54.113

Thanks, @AfroThundr. I'm reading about tinc. Do you know how they compare?

– emesik – 2019-04-19T20:59:08.103

I've never used tinc personally, but from several discussions on Hacker News, it seems tinc also offers mesh networking and self-healing capabilities, but lags behind WireGuard in raw performance. They're both still fairly lightweight, though, compared to strongSwan or OpenVPN.

– AfroThundr – 2019-04-19T21:20:23.903
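Following up on the WireGuard suggestion above, a point-to-point tunnel for the file-sharing traffic needs little more than this; the interface name, keys, and addresses are placeholders:

    # /etc/wireguard/wg0.conf on the storage host
    [Interface]
    Address = 10.10.0.1/24
    ListenPort = 51820
    PrivateKey = <server-private-key>

    [Peer]
    PublicKey = <web-server-public-key>
    AllowedIPs = 10.10.0.2/32

Bring it up with wg-quick up wg0 on both ends (the web server's config mirrors this), then point the mounts at the 10.10.0.x addresses so the traffic rides the encrypted tunnel.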

Answers

0

It really depends on how far you want to be able to scale, how complex a solution you are looking for, and how far apart (ping-wise) your servers are.

NFS is likely the best tool for the job - you can couple it with something like Puppet/Chef/CFEngine etc. to ensure directories are in sync, and use OpenVPN or equivalent to encrypt data in flight (sketch below). Most NAS systems do it this way - and if built on top of RAID with decent NICs, this is a relatively simple, robust and well-understood model.
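As a rough sketch of the NFS side (the paths and VPN subnet below are placeholders), export the upload directory on the storage host and mount it on each web server over the tunnel:

    # /etc/exports on the storage host -- restrict the export to the VPN subnet
    /srv/uploads 10.10.0.0/24(rw,sync,no_subtree_check)

    # /etc/fstab on each web server
    10.10.0.1:/srv/uploads /srv/uploads nfs defaults,hard,nofail 0 0

Run exportfs -ra on the storage host after editing /etc/exports to apply the change.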

Other considerations:

Gluster can be a decent solution, but it's not as well tested. I played around with it, but was never truly comfortable.

If you only need 2 (or maybe 3) nodes on a directly connected network, look at DRBD (in dual-primary mode, so both nodes can write). If it's a longer distance and primary/fallback, you could use MARS (which is a bit like drbd-proxy).
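If DRBD fits, a resource definition is on the order of the following; the hosts, devices, and addresses are placeholders, and note that dual-primary also requires a cluster filesystem such as OCFS2 or GFS2 on top of the DRBD device:

    # /etc/drbd.d/uploads.res
    resource uploads {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on web1 { address 10.10.0.1:7789; }
        on web2 { address 10.10.0.2:7789; }
    }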

You might also want to check whether ZFS is the tool for you (e.g. using ZFS + replication). I suspect this won't provide dual write though, and I never had much luck with it.
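For reference, ZFS replication is snapshot-based and one-way, which is why it won't give you dual write; the pool, snapshot, and host names here are placeholders:

    # on the primary host: take a snapshot and send the delta to the standby
    zfs snapshot tank/uploads@snap2
    zfs send -i tank/uploads@snap1 tank/uploads@snap2 | ssh web2 zfs receive -F tank/uploads

The receiving dataset should be treated as a read-only replica; it trails the primary by one replication interval.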

Depending on your particular use case, if you are playing at the VM level you can look at iSCSI, but it's probably not what you want.

Depending on the content, it may be possible to stick it all in a database and use database replication. You might be able to modify your software, or use a FUSE-mounted database filesystem.
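A minimal sketch of the database route in Django terms (the model and field names are hypothetical): store the bytes in an ordinary table, which then replicates like any other data, and serve them after the existing permission check:

    # models.py -- hypothetical model holding the uploaded bytes
    from django.db import models

    class StoredFile(models.Model):
        name = models.CharField(max_length=255, unique=True)
        content_type = models.CharField(max_length=100)
        data = models.BinaryField()

    # views.py -- serve the bytes (permission check omitted for brevity)
    from django.http import HttpResponse

    def serve(request, name):
        f = StoredFile.objects.get(name=name)
        return HttpResponse(f.data, content_type=f.content_type)

This keeps large blobs in the database, so it only makes sense for modest file sizes and counts.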

davidgo

Posted 2019-04-19T19:39:02.590

Reputation: 49 152

Thanks. In the end I decided to explore Amazon S3 as the storage for our files, relieving us of the additional server-maintenance overhead. – emesik – 2019-04-24T09:08:10.030
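For anyone taking the same route: with Django this usually means django-storages. A minimal settings sketch, where the bucket name and region are placeholders, assuming permission-checked downloads are handled by handing out short-lived signed URLs:

    # settings.py -- store MEDIA files in S3 via django-storages (boto3 backend)
    DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
    AWS_STORAGE_BUCKET_NAME = 'example-uploads'
    AWS_S3_REGION_NAME = 'eu-west-1'
    AWS_QUERYSTRING_AUTH = True  # file URLs become time-limited signed URLs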