Answers for Nick - Keep in mind this methodology is for low-cost small-business use, purchasing name-brand pre-built systems for workstations. It's a scenario that makes use of resources that would otherwise be wasted. When users leave for the day their workstations are rebooted into the cluster for automated build and testing. The backup method I put forth uses the extra disk space in each workstation, spreading redundant copies across multiple machines.
...Joe, what do you mean with live system? The production servers?
Yes. RAID is for reducing downtime, so it belongs on a 24/7 production system. It has much less value for a backup system that only needs to run during the backup data transfer, or for workstations that only "need" to be on during the day.
...So in the option you describe, the plan is to journal the public data (encrypted) on each workstation?
Yes. It could be public shared data or cross-workstation data. Journal/snapshot the changes hourly on the RAID system between backup transfers to another medium, which usually happen twice a day, noon and nightly. (Keep as much journaled backup as possible on the production system, up to 80% of disk space; beyond that, performance may take a hit.) This way users can easily recover overwritten or deleted files without talking to a sysadmin: they go to their /username/date/time folder on the RAID production system, use standard diff tools, have access to all of the day's snapshots, etc.
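The hourly journal/snapshot idea above can be sketched with hard links, so each snapshot only costs the space of the files that actually changed since the last one. This is a minimal illustration under assumptions, not our actual scripts: the `hourly_snapshot` function and its path layout are invented for the example, and it assumes a filesystem that supports hard links.

```python
import filecmp
import os
import shutil
from datetime import datetime

def hourly_snapshot(source, backup_root, prev_snapshot=None):
    """Create a date/time snapshot of `source` under `backup_root`.

    Files unchanged since `prev_snapshot` are hard-linked instead of
    copied, so an hourly snapshot costs only the changed files' space.
    Returns the path of the new snapshot directory.
    """
    now = datetime.now()
    snap_dir = os.path.join(backup_root,
                            now.strftime("%Y-%m-%d"),   # the /date/ part
                            now.strftime("%H%M%S"))     # the /time/ part
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        dest_dir = os.path.join(snap_dir, rel)
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest_dir, name)
            prev = (os.path.join(prev_snapshot, rel, name)
                    if prev_snapshot else None)
            if prev and os.path.exists(prev) and \
                    filecmp.cmp(src, prev, shallow=False):
                os.link(prev, dst)      # unchanged: hard link, no extra space
            else:
                shutil.copy2(src, dst)  # new or changed: real copy
    return snap_dir
```

In practice the same effect is what `rsync --link-dest` gives you; the sketch just shows why a day of hourly snapshots doesn't need a day's worth of disk.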
Encryption is in case a workstation is stolen and/or to protect against "prying eyes". We hire good developers, so we trust them not to try to decrypt it. They could damage the business in many other ways anyway; trust is required.
...Do those snapshots go to the system with 5 external disks daily, or do you take the daily backup off-site on one of the 5 disks?
Traveling data is always on tape; tape survives shock. Disk is faster for seeking, which is why we prefer disks for the "journal" backup. Tapes hold full or incremental backups, usually with no journals/snapshots. For our user base, most data recovery happens during the day: "I need the file the way it was before lunch." "I just deleted the wrong file."
For restores from previous days, one version per day is usually sufficient granularity. If more journaling is needed, the backup schedule is adjusted, or a revision control system is implemented and the revision tree is backed up.
Five disks is an arbitrary number chosen to show the relative cost against a tape-only system. Five separate disks with copies of the same data have much higher redundancy than any small-business RAID system. If the workstations have adequate space, one dedicated backup disk may be sufficient (given that multiple copies are on workstations and tape).
At a set point in time, data is transferred off the production server's journaled backup partition to a backup system with external drive(s) connected, making 2-5 copies: one on internal disk, one on external disk, and one to tape. The workstations are backed up to the backup systems, then each workstation receives a copy of the shared production system's backup before shutting down. At no time are there fewer than three physical copies of backed-up data. Whether to keep 3 copies, 5 copies, etc. is a redundancy question that needs to be modeled for each business and each type of data. You might want 5 copies of invoices, 7 copies of contracts, only 2 copies of a standard graphic, and a single copy of the current test build executables, etc.
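The "how many copies" question can be modeled crudely. Here is a toy sketch; the `p_data_loss` function and the 5% per-copy failure figure are assumptions for illustration, and it treats copy failures as independent, which co-located copies are not (fire or theft takes out neighbors together), so treat the numbers as optimistic lower bounds.

```python
def p_data_loss(copies: int, p_copy_fail: float) -> float:
    """Chance that every copy is unreadable at restore time,
    assuming each copy fails independently with probability
    `p_copy_fail` (optimistic for copies in the same building)."""
    return p_copy_fail ** copies

# hypothetical 5% chance that any one copy is unreadable when needed
for n in (2, 3, 5, 7):
    print(f"{n} copies -> p(loss) = {p_data_loss(n, 0.05):.2e}")
```

Even this toy model shows why the copy count should vary by data type: going from 2 to 5 copies buys orders of magnitude, which is worth it for invoices and contracts but wasted on a rebuildable test binary.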
...Also, are the snapshots on each workstation identical, or do they all sum up to the complete public data?
Either. It depends on available space and needs. Our purchased systems always come with disks much larger than needed for the average user (developers may make use of the extra space, but the receptionist has no need for a 500 GB+ disk).
...What do you think of those external storage hubs like linksysbycisco.com/US/en/…?
Don't know. We prefer machines that can be put to another use: backup server today, someone's workstation tomorrow, offloading copies of virtual machines during a major upgrade for quick failover, etc. That's one of the reasons for the external disk: to keep all workstations as similar as possible, so the "backup server" has the same 500 GB+ disk that every workstation has. It's the same physical machine, purchased in sets, so over time there will be differences in CPU, memory and disk based on the deal du jour. Machines are allocated based on performance needs, and swapping in a new machine to increase memory takes less overall sysadmin time than installing a memory chip in a perfectly running machine. If we keep CPU and video (AMD64, Nvidia) relatively consistent, machine swaps are painless.
The production server uses two RAID cards, one running 10k RPM SCSI drives and another running 7200 RPM SCSI drives, for maximum performance. A $60 SATA terabyte drive used for backup holds as much as thousands of dollars' worth of SCSI drives, RAID controllers, hot-swap rack case, etc. Development servers are usually adequate with SATA RAID: more space but less performance. Since there are fewer simultaneous users, the performance difference is usually negligible.
In simple terms -
- Production system - active shared data and OS on RAID "primary data partition"
- Production system - hourly journaled snapshots since the last backup on RAID "backup data partition"
- Workstation system - active data and OS on non-RAID "primary data partition"
- Workstation system - backup data on non-RAID "backup data partition"
Average workstations are purchased with 500 GB+ drives and use ~40 GB max for multi-boot Windows/Linux/BSD/OpenSolaris partitions. The rest is the backup partition, which contains backup copies of the other workstations' OSes, the production server's OS backup, and the production server's journaled and/or incremental data backups.
If any two machines in the building die, recovery takes minutes. There are at least three physical copies on site of each OS, and usually we have enough unused workstation + external drive space to keep a week or two of incremental backups from the production server and at least two copies of the last full backup.
We can lose the RAID system, the tape and two workstations without losing any data, and be up and running within minutes (albeit without the RAID until it's repaired). The data is accessible "instantly". This has saved hours during failures, which always seem to happen at the worst possible business time: power supplies invariably fail right before an important sales meeting/demo, and RAID systems always seem to fail in the morning, never on a Friday evening when you could fix them and be back up by Monday morning.
The docs describing the backup process are company property. I'll try to re-write them for public viewing with diagrams and use cases. I've used this general methodology for many years now, and it has saved time and data when standard tape-only systems failed. I've seen failures on IBM, Compaq, HP and Dell systems using DLT, LTO, etc. A common failure mode is no errors during the backup, but corrupted data when you try to restore. Always test restores. That's one of the reasons we use an online journal backup, which can easily be tested daily. Since the users get used to it, we have never gone more than a week without someone using the journaled backups, and we almost never touch the tapes. The tapes are in case the building burns down.
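"Always test restores" can be partially automated by checksumming a restored tree against the live one and flagging anything missing or corrupted. A sketch under assumptions: the helper names are invented for the example, and it assumes both trees are mounted locally.

```python
import hashlib
import os

def checksum_tree(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # read in chunks so large files don't sit in memory
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            sums[os.path.relpath(path, root)] = h.hexdigest()
    return sums

def verify_restore(original_root, restored_root):
    """Return the relative paths that are missing or corrupted
    in the restored tree compared to the original."""
    orig = checksum_tree(original_root)
    rest = checksum_tree(restored_root)
    return {path for path, digest in orig.items()
            if rest.get(path) != digest}
```

Run after a test restore to a scratch directory, an empty result set means the tape (or journal) actually gave back what was written; this is the kind of check that catches the "backup reported success, restore is garbage" failure before you need the data.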