
I'm running a small home server that stores my documents. The disks in this server are in a RAID 1 configuration (using Linux md) and it's also periodically being backed up to an external hard drive to make sure I don't lose them. However, I'm always accessing the files from other computers on the home network using an SMB share, and this results in a considerable speed penalty (especially when connected over WLAN). This is quite annoying when editing large files, such as digital camera RAWs, for example.

I've been looking for a solution to this problem. It would have to offer some kind of local caching to speed up file access. The client should preferably not keep a copy of all the data on the server, as it consists of a very large collection of photographs, most of which I will not access frequently. Instead, it should cache only the accessed files and sync changes back in the background. Ideally, it would also do some smart read-ahead (caching the files in the same directory as the currently opened file, for example), but I suppose that's asking a bit much. Synchronization should be automatic (on file change). Conflicting changes (made at the same time on different clients) are unlikely in my use case, but I would prefer that they be handled properly (with a notification to the user).

I've come across the following options, so far:

  • Something similar to Dropbox. iFolder seems to be the only thing that comes close, but its reputation (regarding stability) and its requirements put me off.

  • A distributed file system such as OpenAFS. I'm not sure this will speed up file access. It is probably overkill for what I need.

  • Maybe NFS or even Samba offer these possibilities. I read a bit about Windows' Offline Files, but its operation seems limited (at least on Windows XP).

As this is just for personal use, I'm not willing to spend a lot of money. A free solution would be preferred. Also, the server needs to run on Linux, and I need a client for at least Windows.

Edit: I have since set up OpenAFS (a quite complex process). It seems to cater for most of my needs. Files are cached locally, speeding up access to cached files. First access to a file is still slow, of course. I'm looking forward to Disconnected AFS planned for a future OpenAFS release as this will allow pre-caching of files. This would be perfect for editing sets of large files such as camera RAWs.
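
For anyone attempting the same: the OpenAFS client's local cache is configured in the cacheinfo file and can be inspected or resized at runtime with the fs command. A minimal sketch, assuming the common Linux default paths (yours may differ):

    # /usr/vice/etc/cacheinfo -- AFS mountpoint:cache directory:cache size (1 KB blocks)
    /afs:/usr/vice/cache:1000000

    # Inspect the current cache parameters, then grow the cache (run as root on the client)
    fs getcacheparms
    fs setcachesize 2000000

A larger cache means more of the photo collection stays local between sessions, at the cost of disk space on each client.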

2 Answers


You can cache reads and writes for NFSv4 with FS-Cache. Red Hat has written good documentation.

FS-Cache is a persistent local cache that can be used by file systems to take data retrieved from over the network and cache it on local disk. This helps minimize network traffic for users accessing data from a file system mounted over the network (for example, NFS).
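
Roughly, the client-side setup looks like this on a Red Hat-style system (package and service names vary by distribution; server:/export and the mount point are placeholders):

    # Install and start the userspace cache daemon
    yum install cachefilesd
    service cachefilesd start

    # /etc/cachefilesd.conf -- directory holding the persistent on-disk cache
    dir /var/cache/fscache

    # Mount the export with the 'fsc' option so reads go through FS-Cache
    mount -t nfs -o vers=4,fsc server:/export /mnt/data

Note that only the NFS client side caches; this is a Linux client technology.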

sciurus
  • This sounds good. Is there a (free) NFS4 client for Windows XP that supports caching? – Brecht Machiels Mar 16 '11 at 09:59
  • Sorry, I missed the bit about Windows. No, FS-Cache is a Linux-only technology. I also have no idea about support for any version of NFS on Windows. – sciurus Mar 16 '11 at 19:56
  • Thanks for the link to good documentation. I was searching for this some time ago and there was none. – rvs May 11 '11 at 11:14

The problems with filesystems that work in disconnected mode are 1) concurrent access and 2) synchronization. If you don't need concurrent access, just get a removable hard drive and carry it around. If you do need concurrent access, then you will have problems with synchronization, because how does a client working on a local copy tell other clients that the copy on the network is now out of date? The closer you get to full synchronization, the less point there is in having a cache. So the two extremes are what usually win: you can either get a portable drive or use a filesystem that doesn't cache.

For your photographs, what you probably actually want is a local index so that you can search locally in the index, but fetch the actual files from the network. This is something which would operate at a higher level than the filesystem itself, so the reason you haven't found what you're looking for is that you're looking for the wrong thing. :) You probably should be looking for image indexing programs; this is a problem which has been solved in several ways, as I recall.
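
As a rough sketch of the idea (all paths are placeholders): build a small local index of the share periodically, search that index locally, and only pull the matching files over the network.

    # Re-run periodically (e.g. from cron) to refresh the local index of the share
    find /mnt/photos -type f -printf '%s\t%TY-%Tm-%Td\t%p\n' > ~/photo-index.txt

    # Search the fast local index; only the files you then open cross the network
    grep -i 'vacation' ~/photo-index.txt

A real image indexer would store thumbnails and EXIF data too, but the principle is the same: the searchable metadata lives locally while the bulky originals stay on the server.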

For a more general solution, something inspired by Subversion or CVS or similar would probably work fairly well.
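
For example, Subversion's Apache module can autoversion a WebDAV share, so that a plain save from any WebDAV client becomes a commit with no manual check-in. A sketch of the httpd configuration (the repository path is a placeholder):

    # Apache httpd: expose an svn repository as a WebDAV folder
    <Location /photos>
        DAV svn
        SVNPath /srv/svn/photos
        # Turn every plain WebDAV write into an automatic commit
        SVNAutoversioning on
    </Location>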

dannysauer
  • I don't need concurrent access, but using a removable hard drive will not offer the redundancy I have now (RAID 1) and would require extra effort to make backups, so it's not an option. I'm also not sure how a local image index would speed up actual file access. As for using a VCS, I can't imagine storing photographs in one, as the repository would grow uncontrollably in size. It would also require manually checking in changes, making the whole thing tiresome. – Brecht Machiels Mar 16 '11 at 09:46
  • The local index provides you with a reduced-size data set to search within, while the full images are stored on the central location. So you can find what you need on fast local storage (in an index optimized for searching), rather than having to slowly search a remote filesystem. – dannysauer Apr 22 '11 at 15:36
  • With regards to the VCS, using a WebDAV-based system allows you to treat the repository as a local folder: on Windows using Web Folders, on Linux using GNOME VFS, and probably on most other platforms. So there is no manual check-in; you just save. Subversion is one WebDAV server implementation, but there are others, and they don't necessarily have to support versioning (early versions of mod_dav did not, in fact, support DeltaV). – dannysauer Apr 22 '11 at 15:46
  • ...and, one more. :) Regarding the removable drive, why do you need both redundancy and backups? This is a serious question; often people don't realize the difference. For a backed-up RAID 1, about the only real gain is seek performance, which doesn't matter much in an application where you're opening a couple of big files over a network link. If you use an automated backup system such as BackupPC, which identifies duplicated files and stores them only once, then as long as the removable drive is always connected to at least one machine which is automatically backed up, you're set with zero extra effort. – dannysauer Apr 22 '11 at 15:51
  • Of course, you can do what you want; I'm just making sure you're aware of the possibilities. :D – dannysauer Apr 22 '11 at 15:55
  • With RAID 1, a disk failure has minimal effect on the server's availability, and you lose no data at all (with periodic backups alone, the most recent changes would be lost). The periodic backups are to protect the data against theft and fire: I keep the backup drive at work and bring it home every two weeks to update the backup (using rsnapshot/rsync). – Brecht Machiels May 11 '11 at 10:29