I work in the research group of a large company. We do a lot of work on a grid processing system with many nodes (more than 200; I'm not sure exactly how many) and a large number of hard drives, holding more than 1000 TB of data.
Most of this data can be reproduced, but doing so takes time. A lot of it is code, which is stored in separate RCS repos that can have their own backups, but the working copies are, of course, on the normal user drives.
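To make the question concrete, here's a rough sketch (in Python, with hypothetical paths; this is not our actual tooling) of the kind of tiering I have in mind: skip data that can be regenerated, rely on the repo backups for code, and only fully back up working copies and other single-copy data.

```python
#!/usr/bin/env python3
"""Sketch: classify data trees into backup tiers and report their sizes.
Paths are hypothetical stand-ins for our real mount points."""
import os

# Tier rules (assumptions, for illustration only):
#   reproducible -- can be regenerated from inputs, so skip backing it up
#   repo-backed  -- already covered by the RCS repo backups
#   must-backup  -- working copies and other data with no second copy
TIERS = {
    "reproducible": ["/grid/scratch", "/grid/derived"],
    "repo-backed":  ["/srv/rcs"],
    "must-backup":  ["/home"],
}

def tree_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda e: None):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # broken symlink, permission problem, etc.
    return total

if __name__ == "__main__":
    for tier, paths in TIERS.items():
        size = sum(tree_size(p) for p in paths if os.path.isdir(p))
        print(f"{tier:>12}: {size / 1e12:.2f} TB")
```

The point of the tiering is that the "must-backup" tier should be a small fraction of the full 1000 TB, but I don't know if that's how companies actually approach it at this scale.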
Can someone point me at a best-practices document, or describe how most companies go about protecting this much data?
Thanks