I have a large set of data (100+ GB) that can be stored in files. Most of the files are in the 5-50 KB range (80%), then 50-500 KB (15%), and over 500 KB (5%). The maximum expected size of a file is 50 MB. If necessary, large files can be split into smaller pieces. Files can also be organized in a directory structure.
If some data must be modified, my application makes a copy, modifies the copy, and if successful, flags it as the latest version. Then the old version is removed. This is crash-safe (so to speak).
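In rough pseudo-Python, the update flow is something like this (simplified: the atomic rename stands in for the "latest version" flag, and the paths/names are just illustrative):

    import os
    import shutil
    import tempfile

    def update_file(path, modify):
        # Work on a copy so the current version stays intact if anything fails.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        os.close(fd)
        shutil.copy2(path, tmp_path)
        modify(tmp_path)            # apply the changes to the copy
        os.replace(tmp_path, path)  # atomically promote the copy; the old version is gone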
I need to implement a failover system to keep this data available. One solution is to use a Master-Slave database system, but these are fragile and force a dependency on the database technology.
I am no sysadmin, but I have read about the rsync utility and it looks very interesting. I am wondering whether setting up some failover nodes and pushing to them with rsync from my master is a reasonable option. Has anyone tried this before successfully?
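Concretely, I was imagining a periodic push from the master, roughly like this (the host name, paths, and flags are just my guess at what such a job would look like):

    import subprocess

    SRC = "/srv/appdata/"            # data root on the master (placeholder)
    DST = "standby1:/srv/appdata/"   # failover node reachable over SSH (placeholder)

    def push_to_standby():
        # -a preserves the directory structure and metadata; --delete removes
        # files on the standby that no longer exist on the master.
        subprocess.run(["rsync", "-a", "--delete", SRC, DST], check=True)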
i) If so, should I split my large files? Is rsync smart/efficient at detecting which files to copy or delete? Should I use a specific directory structure to make this system efficient?
ii) If the master crashes and a slave takes over for an hour (for example), is bringing the master up to date again as simple as running rsync the other way round (slave to master), as in the sketch after these questions?
iii) Bonus question: Is there any possibility of implementing a multi-master system with rsync, or is only master-slave possible?
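For ii), what I have in mind is simply the same transfer with source and destination swapped once the old master is reachable again (again, host names and paths are placeholders):

    import subprocess

    def resync_master_from_standby():
        # Pull the standby's newer state back onto the recovered master before
        # it resumes its role; --delete drops files the standby deleted meanwhile.
        subprocess.run(
            ["rsync", "-a", "--delete", "standby1:/srv/appdata/", "/srv/appdata/"],
            check=True,
        )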
I am looking for advice, tips, experience, etc. Thanks!