
I have a few file upload websites, with files ranging from hundreds of kilobytes to a few gigabytes.

Currently I have all the files in a distributed-replicated Gluster volume spread over a few servers.

My biggest problem with Gluster is speed.

For example, I have a folder with ~80,000 images averaging 500 KB each, and it took a couple of hours just to change the owner of those images.

For the moment everything is fairly manageable, but I'm worried about how long such operations will take once there are many more files.

What alternatives do I have? Am I doing anything wrong with Gluster?

This is my Gluster configuration:

performance.cache-size: 1GB
performance.cache-refresh-timeout: 60
performance.cache-max-file-size: 100KB
cluster.choose-local: true
performance.readdir-ahead: on
performance.io-thread-count: 16
client.event-threads: 3
server.event-threads: 3
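
For reference, options like these are applied per volume with the gluster CLI; a minimal, hypothetical example (the volume name "myvol" is a placeholder, the values simply mirror the configuration above):

# apply a few of the options listed above to the volume "myvol"
gluster volume set myvol performance.cache-size 1GB
gluster volume set myvol performance.io-thread-count 16
gluster volume set myvol client.event-threads 3
# the changed options then show up under "Options Reconfigured"
gluster volume info myvol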
Alex Dumitru

2 Answers


I have used the lsyncd program in a similar situation, where I needed to synchronise many servers' content. Internally it uses rsync to synchronise files between the servers.

However, the drawback is that you need to direct all uploads to a single server and then synchronise the files from that server to all the others.
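
As a rough, hypothetical sketch of that setup (the hostname "mirror1", the paths and the config file location are placeholders, not details from the question), a one-way lsyncd replication from the upload server to one mirror could look like this:

# on the upload server; the config path varies by distribution
cat > /etc/lsyncd/lsyncd.conf.lua <<'EOF'
settings {
    logfile    = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd.status",
}
sync {
    default.rsyncssh,                 -- push changes with rsync over ssh
    source    = "/var/www/uploads",   -- local directory receiving uploads
    host      = "mirror1",            -- destination server
    targetdir = "/var/www/uploads",   -- destination directory
    delay     = 5,                    -- batch filesystem events for 5 seconds
}
EOF
systemctl restart lsyncd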

Tero Kilkanen

Any distributed filesystem will suffer when executing batch operations on a large number of files: after all, it has to propagate every change to remote machines, and latency skyrockets compared to a local-only setup. This is especially noticeable with metadata-changing operations: since they do not touch the actual data, they are very fast locally, but their remote replication is entirely latency-bound.
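
As a rough, illustrative calculation (assumed numbers, not measurements): the couple of hours reported for ~80,000 files comes down to roughly 7,200 s / 80,000 ≈ 90 ms per file, which is plausible once every chown turns into several FUSE and network round trips plus replication to the other bricks, whereas the same metadata change on a local filesystem takes a fraction of a millisecond per file.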

You have basically two solutions:

  • use a file-sharing approach, uploading your files to a specific box and exporting them via NFS. While NFS is not a silver bullet (and it is not a speed monster), when coupled with client-side caching it can perform adequately; see the sketch after this list
  • use an asynchronous file-replication system such as lsyncd (or something else based on rsync)
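
A minimal sketch of the first option, assuming a storage host named "storage1", web servers in 10.0.0.0/24 and placeholder paths:

# on storage1: export the upload directory (/etc/exports)
#   /srv/uploads  10.0.0.0/24(rw,async,no_subtree_check)
exportfs -ra

# on each web server: mount with a longer attribute-cache timeout (actimeo),
# so repeated stat()/readdir() calls are answered from the client-side cache
mount -t nfs -o actimeo=60 storage1:/srv/uploads /var/www/uploads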
shodanshok