43

We have a setup with a few web servers being load-balanced. We want to have some sort of network shared storage that all of the web servers can access. It will be used as a place to store files uploaded by users. Everything is running Linux.

Should we use NFS, CIFS, SMB, fuse+sftp, or fuse+ftp? There are so many choices out there for network file sharing protocols that it's very hard to pick one. We basically just want to permanently mount this one share on multiple machines. Security features are less of a concern because it won't be network-accessible from anywhere other than the servers mounting it. We just want it to work reliably and quickly.

Which one should we use?

Apreche
  • Life is a lot simpler if you add an accelerator in front of your website, e.g. a Squid accelerator or Cloudflare. The next best thing is to write changed content to memcache or a database instead of files. Shared directories are not for larger sites. – Antti Rytsölä Nov 10 '15 at 13:14

17 Answers

31

I vote for NFS.

NFSv4.1 added the Parallel NFS (pNFS) capability, which makes parallel data access possible. I am wondering what kind of clients will be using the storage; if they are only Unix-like, then I would go for NFS based on the performance figures.
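For illustration only, a permanent NFS mount on each web server might look something like the sketch below; the host name and paths are placeholders, not anything from the question.

    # Test the mount by hand first (server name and paths are hypothetical):
    sudo mount -t nfs -o nfsvers=4.1,proto=tcp storage01:/export/uploads /var/www/uploads

    # Then make it permanent with an /etc/fstab line:
    storage01:/export/uploads  /var/www/uploads  nfs  nfsvers=4.1,proto=tcp,_netdev  0  0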

Istvan
23

The short answer is use NFS. According to this shootout and my own experience, it's faster.

But you've got more options! You should consider a cluster FS like GFS, which is a filesystem that multiple computers can access at once. Basically, you share a block device via iSCSI and put a GFS filesystem on it. All clients (initiators in iSCSI parlance) can read and write to it. Red Hat has a whitepaper on this. You can also use Oracle's cluster FS, OCFS, to manage the same thing.
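As a rough sketch of that route (shown with the GFS2 tooling; target names, addresses, and devices are made up, and the cluster/fencing setup GFS2 requires is omitted):

    # On each web server, log in to the shared iSCSI block device:
    sudo iscsiadm -m discovery -t sendtargets -p 192.168.10.5
    sudo iscsiadm -m node -T iqn.2009-05.example:storage.uploads -p 192.168.10.5 --login

    # On one node, create a GFS2 filesystem with one journal per client node:
    sudo mkfs.gfs2 -p lock_dlm -t webcluster:uploads -j 3 /dev/sdb

    # Mount it on every node (needs a working cluster stack underneath):
    sudo mount -t gfs2 /dev/sdb /var/www/uploads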

The Red Hat paper does a good job listing the pros and cons of a cluster FS vs NFS. Basically, if you want a lot of room to scale, GFS is probably worth the effort. Also, the GFS whitepaper uses a Fibre Channel SAN in its example, but that could just as easily be RAID, DAS, or an iSCSI SAN.

Lastly, make sure to look into jumbo frames, and if data integrity is critical, enable CRC32 checksumming when running iSCSI with jumbo frames.
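For reference, jumbo frames are just an MTU change on the storage-facing interface, and the CRC32 checksumming corresponds to the digest settings in open-iscsi; the interface name below is a placeholder.

    # Enable jumbo frames on the storage-facing NIC:
    sudo ip link set dev eth1 mtu 9000

    # In /etc/iscsi/iscsid.conf, have open-iscsi negotiate CRC32C digests:
    node.conn[0].iscsi.HeaderDigest = CRC32C
    node.conn[0].iscsi.DataDigest = CRC32C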

Andrew Cholakian
20

We have a two-server load-balanced web cluster. We have tried the following methods for syncing content between the servers:

  • Local drives on each server synced with RSYNC every 10 minutes
  • A central CIFS (SAMBA) share to both servers
  • A central NFS share to both servers
  • A shared SAN drive running OCFS2 mounted both servers

The RSYNC solution was the simplest, but it took 10 minutes for changes to show up, and RSYNC put so much load on the servers that we had to throttle it with a custom script to pause it every second. We were also limited to only writing to the source drive.
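For comparison, that approach boils down to a cron job along these lines (host and paths are hypothetical); rsync's --bwlimit option is one way to throttle it instead of a pause script.

    # /etc/cron.d/sync-content: push changes to the second web server every 10 minutes,
    # capped at roughly 5 MB/s (names and paths are placeholders).
    */10 * * * * www-data rsync -a --delete --bwlimit=5000 /var/www/uploads/ web2:/var/www/uploads/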

The fastest performing shared drive was the OCFS2 clustered drive right up until it went insane and crashed the cluster. We have not been able to maintain stability with OCFS2. As soon as more than one server accesses the same files, load climbs through the roof and servers start rebooting. This may be a training failure on our part.

The next best was NFS. It has been extremely stable and fault tolerant. This is our current setup.

SMB (CIFS) had some locking problems. In particular, changes to files on the SMB server were not being seen by the web servers. SMB also tended to hang when failing over the SMB server.

Our conclusion was that OCFS2 has the most potential but requires a LOT of analysis before using it in production. If you want something straightforward and reliable, I would recommend an NFS server cluster with Heartbeat for failover.

Mark Porter
5

I suggest POHMELFS. It was created by the Russian programmer Evgeniy Polyakov and it's really, really fast.

3

In terms of reliability and security, probably CIFS (aka Samba), but NFS "seems" much more lightweight, and with careful configuration it's possible not to completely expose your valuable data to every other machine on the network ;-)

No insult to the FUSE stuff, but it still seems... fresh, if you know what I mean. I don't know if I trust it yet. That could just be me being an old fogey, but old fogeyism is sometimes warranted when it comes to valuable enterprise data.

If you want to permanently mount one share on multiple machines, and you can play along with some of the weirdness (mostly UID/GID issues), then use NFS. I use it, and have for many years.
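Most of that UID/GID weirdness comes down to keeping numeric IDs consistent across machines; with NFSv4 you can also give the server and every client a common ID-mapping domain, roughly like this (the domain name is a placeholder):

    # /etc/idmapd.conf on the server and every client:
    [General]
    Domain = example.internal

    # Restart the ID-mapping service afterwards (the service name varies by distro).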

Matt Simmons
    FUSE itself isn't that new, so I'd trust it, but some of the filesystems built on it *are* new and definitely warrant some healthy skepticism. Which translates, in the real world, to increased testing at the very least :) – pjz May 29 '09 at 14:10
3

If you've already got webservers everywhere and are good at running them, why not consider WebDAV?
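A minimal sketch of that idea, with Apache's mod_dav on the storage host and davfs2 on the clients; the paths and host name are assumptions, not something from the question.

    # Storage host (httpd.conf snippet): export a directory over WebDAV.
    DavLockDB /var/lib/apache2/davlock/lockdb
    <Directory "/var/www/uploads">
        Dav On
    </Directory>

    # Each web server: mount it with davfs2.
    sudo mount -t davfs http://storage01/uploads /var/www/uploads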

pjz
2

I would advise against NFS. Simply put: we had a web server farm, with JBoss, Apache, Tomcat, and Oracle all using NFS shares for common configuration files and logging.

When the NFS share disappeared (admittedly a rare-ish occurrence) the whole thing just collapsed (predictable really, and I advised the 'developers' against this configuration-time shortcut).

There seems to be an issue with the version of NFS we were using whereby, if the target disappeared during a write, the client would drop into a never-ending wait loop, waiting for the NFS target to come back. Even when the NFS box reattached, the loop still did not end.

We were using a mix of RHEL 3, 4, and 5. Storage was on RHEL 4, servers were on RHEL 5, and the storage network was a separate LAN, not running on VLANs.

If there is a load-balanced front end all hitting a single storage back end, would that not bottleneck your system?

Have you considered a read-only iSCSI connection to your storage, with an event-driven script to move uploaded files to the storage via FTP/SCP when a file is uploaded?
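A rough sketch of that event-driven idea using inotify-tools; the directory names, host, and destination are placeholders.

    #!/bin/sh
    # Watch the local upload directory and push each finished file to central storage.
    inotifywait -m -e close_write --format '%w%f' /var/www/uploads-incoming |
    while read -r file; do
        scp "$file" storage01:/export/uploads/ && rm -- "$file"
    done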

The only time I have implemented successful centralised storage for multiple read heads was on an EMC storage array... All other cost-effective attempts had their drawbacks.

Iain
2

NFS. It's tried and true, and you can have a rock solid setup. GFS performance is generally awful, especially on filesystems with large numbers of small files. I haven't used OCFS, but I generally frown on the cluster filesystem concept. Then there's Lustre, but that's another can of worms...

2

You'd be out of your mind to consider a distributed FS like GFS, and iSCSI is overkill.

If you want simple, go with NFS. It's simple and fast, and with soft mounts fairly robust. Also consider disabling all the locking junk that goes with it. I have Linux desktops that grab all their home directories and applications from NFS; it works fine.
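As a concrete (hypothetical) example of those options in /etc/fstab:

    # Soft mount over TCP with NFS locking disabled (host and paths are made up):
    storage01:/export/home  /home  nfs  soft,timeo=60,retrans=2,nolock,proto=tcp  0  0

The trade-off is that a soft mount can return I/O errors to applications if the server stops responding, which is exactly what is being accepted here in exchange for clients not hanging forever.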

If you want outrageous speed go with Lustre, which is significantly easier than GFS to set up and is a lot like RAID NFS. We use Lustre for our clusters.

Jim Zajkowski
1

Simple answer: +1 for NFS. I have NFS shares that have been mounted for years at a stretch without issue.

If you're looking for super reliability, then consider throwing DRBD into the mix as well for a distributed, auto-failover NFS filesystem.
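A bare-bones sketch of the DRBD side of such a setup; the hostnames, devices, and addresses are invented, and the Heartbeat/NFS-export half is omitted.

    # /etc/drbd.d/r0.res: two-node mirror backing the NFS export (DRBD 8.x style).
    resource r0 {
        protocol C;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        meta-disk internal;
        on nfs-a { address 10.0.0.1:7788; }
        on nfs-b { address 10.0.0.2:7788; }
    }

The active node mounts /dev/drbd0 and exports it over NFS; on failover, the standby is promoted and takes over the export along with a floating IP.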

The only other option (that I'm familiar with) is iSCSI but it can be a pain in the rear to configure...

Rob Dudley
1

Considered GFS? GFS is a cluster filesystem and, in my experience, is pretty reliable. It can have more than one journal, and it scales pretty well.

But you would need to install some cluster services, and GFS isn't exactly known for its speediness. OTOH, it has always been fast enough for me, but YMMV.

wzzrd
1

I would echo the warning some have given against NFS, although NFS is probably your best bet (strange as that sounds).

I once had an NFS client that I had to pull the power on to shut off, because the NFS server had disappeared and the client refused (in the kernel) to unlock or shut down while it was gone.

To do it right, I would insist on NFSv4 throughout, stick with TCP connections, use jumbo frames, and use an NFS cluster. You can't afford to have your NFS server disappear.

Mei
1

I might be a little late. We use a Dell MD3220 storage array with dual cluster ports. Our unit has two controllers, so if one goes down the second keeps everything running until we fix the issue. Since the HDDs, fans, power supplies, and controllers are all hot-swappable, we just swap parts in and out. As for the format, we use NFS.

0

You have a bunch of options, with a variety of costs: a shared SAN with FC, iSCSI, or one of the more recent additions. In any case they can be expensive to set up, and you still need to run a cluster-aware file system. Clustered filesystems are a world of pain. For any hope of success you need separate high-speed, low-latency networks for cluster communication and data. Even with that, you are likely to get glitches that result in a node being ring-fenced and killed.

The only cluster file system I've come across that just works without hassle is VMFS. But that is so specialised it would be no use even if it were available for general use.

NFS is probably the way to go for your setup. If you're worried about resilience you need to get a proper clustered NFS box. You can do a homebrew setup, but you would hit the above problem. Your best bet (if you have the money) is clustered NetApp filers. It's an expensive option, but the clustering actually works without any hassle. Not only that, they are very fast.

goo
0

GFS is some seriously black voodoo. The amount of work required to get a simple two-client cluster working is staggering compared to the alternatives. OCFS2 is a lot simpler to deploy but is very picky when it comes to the kernel module versions involved on all attached servers, and that's just the beginning.

Unless you really need the kind of low-level access a cluster filesystem offers, NFS or CIFS is probably all you need.

allaryin
0

On a large server farm we had several million user-created HTML pages. NFS did not work so well, so we ended up putting them in a MySQL table. The overhead compared to traversing a directory tree was about the same.
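For what it's worth, that approach can be as simple as one table keyed on the page path; the schema, database, and column names below are a guess at the general shape, not the ones actually used.

    # Hypothetical schema for storing rendered pages in MySQL instead of on a shared FS.
    mysql -e "CREATE TABLE pages (
        path        VARCHAR(255) NOT NULL PRIMARY KEY,
        body        MEDIUMTEXT   NOT NULL,
        updated_at  TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
    ) ENGINE=InnoDB;" webcontent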

-2

I've used SFTP and it worked fine for my purposes. NFS was my first resort, but the funkiness of the user/group IDs made me drop it rather quickly.

Just set up public-key auth and you'll largely be set. There might be a somewhat heavier CPU overhead for the SSH encryption, but on the plus side I've never run into any issues with data corruption.

FTP could well suit your purposes, though, since it's more lightweight. Presumably you want your web servers to be doing the web serving, not the SSH work.
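If you go the fuse+sftp route from the question, sshfs is the usual way to turn SFTP into a mounted directory; the host, key path, and mount point below are placeholders.

    # Mount the central upload area over SFTP with sshfs (public-key auth assumed).
    sshfs -o IdentityFile=/etc/ssh/upload_key,reconnect \
        uploads@storage01:/export/uploads /var/www/uploads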

cflee