What is the best way to synchronize a huge amount of data from a running production server?
Our server has over 20 million files (ranging from small files of around 10 KB up to larger files of 50 MB) stored in about 1 million directories. The total size of the data is about 5 TB and steadily increasing.
Is it possible to synchronize the data with lsyncd, and what are the limits (especially of inotify)? How much additional space does lsyncd need? What about the load (CPU and memory), and how quickly are changes replicated?
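For scale, here is a rough back-of-envelope for lsyncd's inotify footprint on this data set. The ~1 kB-per-watch figure is an assumption (a commonly quoted estimate; the real cost depends on kernel version and architecture), and the watch count simply reuses the ~1 million directories from above:

```shell
# Back-of-envelope: lsyncd registers one inotify watch per directory.
# Assumption: roughly 1 kB of kernel memory per watch (rule of thumb,
# not an exact figure for any particular kernel).
DIRS=1000000          # ~1 million directories, as stated above
BYTES_PER_WATCH=1024  # assumed per-watch kernel cost
echo "watches needed:       $DIRS"
echo "approx kernel memory: $(( DIRS * BYTES_PER_WATCH / 1024 / 1024 )) MB"

# The default fs.inotify.max_user_watches is usually far lower than a
# million (often 8192 or 65536), so it would have to be raised before
# lsyncd could watch the whole tree:
if [ -r /proc/sys/fs/inotify/max_user_watches ]; then
    echo "current limit: $(cat /proc/sys/fs/inotify/max_user_watches)"
fi
# As root, and persisted in /etc/sysctl.conf:
# sysctl -w fs.inotify.max_user_watches=1200000
```

So on top of lsyncd's own memory, roughly a gigabyte of kernel memory would go to the watches alone under this assumption.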
Another solution would be GlusterFS. Is it possible to introduce GlusterFS on a production system with no or minimal downtime? GlusterFS stores a lot of metadata in extended attributes, and the storage volume ends up about 15 to 20% larger than on systems without GlusterFS. That seems like a huge amount of waste...? And what about the load?
And finally, rsync driven by cron jobs could do the job. rsync would only run on the slave, so no additional space is needed on the primary server, but rsync must read the full directory tree every time the cron job runs...
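The cron variant might look like this hypothetical crontab fragment on the slave (the host name, module, paths, and schedule are all placeholders); `flock -n` skips a run if the previous pass is still scanning the 20-million-file tree, which avoids overlapping rsyncs:

```
# /etc/cron.d/sync-from-primary  (hypothetical; adjust paths and schedule)
# flock -n: skip this run if the previous one has not finished yet.
# --archive  preserve permissions, times, symlinks, etc.
# --delete   remove files on the slave that were deleted on the primary
# --partial  keep partially transferred large files across runs
0 3 * * * root flock -n /run/sync-from-primary.lock rsync --archive --delete --partial rsync://primary.example.com/data/ /srv/data/
```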