We have a NAS server at the company I work for that is used for storing photography sessions. Each session is approximately 100 GB. Over the last couple of years this server has accumulated 10+ TB of data, and the number of photoshoots is increasing exponentially. I estimate that by the end of next year we will have 20+ TB stored on this NAS. We are currently backing this server up to LTO-5 tapes with Symantec Backup Exec. Now that the server has grown, full backups are no longer completing overnight. Does anyone have any suggestions on how to back up this amount of data? Should we be backing it up to tape? Are there any other options which may be better?
-
Why are you performing Full backups every night? Why not run a Full backup once a week and run Incremental backups the remaining 6 days a week? – joeqwerty Dec 12 '12 at 03:52
-
@joeqwerty - took the words out of my mouth – Mark Henderson Dec 12 '12 at 04:11
-
That is what we are doing, sorry I did not mention that... the weekly full is the one not completing. – Jesus Fidalgo Dec 12 '12 at 04:35
-
Does a weekly full need to complete overnight? It is not uncommon for weeklies to take more than 24 hours for a sufficiently large dataset. – Stefan Lasiewski Dec 12 '12 at 06:48
-
What type of NAS are you using? – ewwhite Dec 12 '12 at 10:16
-
Also consider doing a differential instead of an incremental backup. You'll need more storage, but restores will be easier, faster, and less error-prone. – Don Branson Dec 12 '12 at 12:17
-
I've never seen the value in Differential backups for backing up user data. Why back up the same user data if it hasn't changed since the last Full or Incremental backup? It may take more time to restore a complete set of user data, but how often does that really occur? Most users want only a single file or a few files restored. The cost savings (time and backup media) of Incremental backups outweigh any convenience that a Differential backup provides IMO. – joeqwerty Dec 12 '12 at 13:26
-
Are you sure the increase in photoshoots is *exponential*? – gerrit Dec 12 '12 at 17:38
-
Have you looked into Amazon S3? And just not worry about any of that. – Michael Ozeryansky Dec 13 '12 at 14:05
-
As a suggestion, try storing the originals with lossless photo compression. That will save you some space. – Bigbio2002 Dec 14 '12 at 20:43
8 Answers
You need to take a step back and stop thinking "I've got 20TB on my NAS I need to back up!" and develop a storage strategy that takes into account the nature of your data:
- Where is it coming from and how much new data are you getting? (you've got this in your question)
- How is the data used once you have it? Are people editing the pictures? Do you keep the originals and generate edited versions?
- How long do you need to keep all the data? Are people still making changes to pictures from 2 years ago?
Depending on the answers to the last two questions, you probably need an archiving system more than a radically different backup system.
Data that is static (e.g. 2-year-old pictures that you retain "just in case") doesn't need to be backed up every night, or even every week; it needs to be archived. What you actually do might be more complex, but conceptually, all the old pictures can be written off to tape (multiple copies!) and not backed up any more.
Based on your comments, some additional thoughts:
Since you keep the originals of each shoot untouched and work on a copy, and assuming that at least some of the original pictures are duds, you might be able to cut the amount of data that needs to be backed up in half.
If you still can't finish a full backup within whatever window of time you have, a common way to speed things up is to do a disk-to-disk backup first and then later copy the backup set off to tape.
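Before choosing between backup and archive, it helps to measure how much of the NAS is actually static. A minimal Python sketch, assuming that file modification time is a reasonable proxy for "still being worked on"; the function name, the root path, and the cutoff are illustrative and not from the original post:

```python
import os
import time


def split_by_age(root, cutoff_days, now=None):
    """Return (active_bytes, static_bytes) for all files under root.

    A file counts as static when its modification time is older than
    cutoff_days. Both parameters are illustrative assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - cutoff_days * 86400
    active = static = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                static += info.st_size
            else:
                active += info.st_size
    return active, static
```

Running something like `split_by_age("/mnt/nas/sessions", 90)` (a hypothetical mount point) gives a rough idea of how many bytes would leave the nightly and weekly backup sets if everything untouched for 90 days were archived instead.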
-
The original shoot is stored untouched, then another copy of the photoshoot is used for editing. The data may need to be kept about 2 years. – Jesus Fidalgo Dec 12 '12 at 04:40
-
+1 Well said. I'm surprised how the difference between Backup and Archive is, in general, poorly understood. I do full and incremental backups of my system and ephemeral data such as email & documents, but archive my photography (1.2TB and growing :-). Wish I could give another +1 for the disk-to-disk suggestion as well. – Ex Umbris Dec 12 '12 at 06:30
-
+1 I'd bet that 80% of the data on the NAS is never used more than once. – Stefan Lasiewski Dec 12 '12 at 06:49
-
+1 The best option here is to do daily and even hourly disk to disk delta transfers to capture changes and then ship the full or incremental backups off to an archive or off-site provider/location on a weekly or semi-weekly basis. We used to take delta backups of our SQL files every 15 minutes to reduce the amount of data loss in a DR scenario. – Brent Pabst Dec 12 '12 at 13:53
You have two options:
Option 1:
- Buy another NAS
- Give your users RO access to the new_NAS
- Move all files older than 2 years to new_NAS
- Keep backing up old_NAS as usual
- Every 6 months move files older than 2 years to new_NAS
Option 2:
- Buy another NAS
- Run rsync every hour: old_NAS -> new_NAS
- Or, better, use something like rdiff-backup, which works like rsync but also keeps deltas of file changes (so you can restore older versions of the files):
  rdiff-backup user1@old_NAS::/source-dir user2@new_NAS::/dest-dir
- Every 6 months, clean out old file versions by running something like:
  rdiff-backup --remove-older-than 2Y user2@new_NAS::/dest-dir
Why do your backups have to complete overnight? Fileserver performance? You might be able to constrain the bandwidth of your backup software to limit impact during the day. Or dedicate an interface on your NAS to talk to the tape drive to limit impact on other traffic.
Can you run full dumps on weekends and only do incrementals during the week? If the problem is changing tapes on the weekend when no one is around, a cheap tape library/autochanger costs a lot less than paying someone to change tapes.
Can you segment your data into multiple groups that are small enough to complete within your backup window?
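One way to reason about segmenting is to work out how many bytes fit in a backup window and then pack sessions into groups of at most that size. A rough Python sketch follows; the 140 MB/s figure is LTO-5's published native rate and is only an assumption about what this setup achieves (many small files usually run slower), and the session names and sizes are made up:

```python
def max_bytes_per_window(window_hours, throughput_mb_per_s):
    """Bytes that fit in one backup window at a sustained rate (decimal MB)."""
    return int(window_hours * 3600 * throughput_mb_per_s * 1_000_000)


def group_sessions(session_sizes, capacity):
    """Pack sessions into groups no larger than capacity bytes.

    Uses first-fit decreasing: largest sessions first, each placed in
    the first group that still has room for it.
    """
    groups = []
    ordered = sorted(session_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > capacity:
            raise ValueError(f"{name} does not fit in a single window")
        for group in groups:
            if group["free"] >= size:
                group["free"] -= size
                group["names"].append(name)
                break
        else:
            groups.append({"free": capacity - size, "names": [name]})
    return [group["names"] for group in groups]


# Hypothetical example: a 12-hour window at an assumed 140 MB/s.
window = max_bytes_per_window(12, 140)  # 6_048_000_000_000 bytes, about 6 TB
```

At that assumed rate a 12-hour window holds roughly 6 TB, so 10+ TB cannot finish overnight as a single job, but two or more groups on different nights could.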
We have about 50TB of data on our NAS and it takes over a week to get a full dump of the entire thing using 2 tape drives (one volume takes nearly a week itself because it contains many tiny files). What we do is replicate our data to a second NAS. Our secondary NAS is on-site (but in a different datacenter from the primary), so we still spool data off to tape for off-site backup. We run backups from that secondary NAS so backups don't slow anyone down.
If you can colocate your secondary NAS far enough away, then it can be your backup, no tapes needed.
I have some doubt about the size of each shooting session: is it really 100 GB per session? How many sessions does your company do each month?
Since you're mostly storing old sessions that won't be used frequently, and you probably won't need to recover that information often, I would suggest using a third-party service to take care of that task for you.
Just as an example, storing those 20TB with an online service like Amazon Glacier would cost a bit more than $200/month. If you need to retrieve those archives frequently, or even recover them in full, you would hit time and cost constraints. If you just store those things "to be sure they are stored", using a third party could make your life easier (and perhaps even cheaper than buying another NAS, tapes, etc.)
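The monthly figure can be checked with a few lines of Python. This is only a sketch: the $0.01 per GB-month rate is an assumed late-2012 Glacier list price, and retrieval and transfer fees are deliberately left out:

```python
# Assumption: Glacier storage price of $0.01 per GB-month (late 2012).
GLACIER_USD_PER_GB_MONTH = 0.01


def monthly_storage_cost(terabytes, usd_per_gb_month):
    """Storage-only cost per month; retrieval and transfer are not included."""
    gigabytes = terabytes * 1024
    return gigabytes * usd_per_gb_month


monthly = monthly_storage_cost(20, GLACIER_USD_PER_GB_MONTH)
print(f"${monthly:.2f} per month")       # $204.80 per month
print(f"${monthly * 12:.2f} per year")   # $2457.60 per year
```

The storage side is cheap; what matters in practice is how often the data has to come back out.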
-
100 GB per session sounds a little high to me, but not unreasonable. We commonly had 32+ GB sessions where I used to work, and our equipment was medium-tier. – Tom Marthenal Dec 13 '12 at 11:19
full backups of this server are not completing overnight
Then try incremental backups? One full backup every xx days, incremental the rest.
Hard disks are inexpensive, faster than tapes, and can be used for backup.
There are also good options for cloud backup now, so it's not necessary to keep adding more and faster tapes.
For example:
-
Look at the comments - it's the weekly fulls that are not completing. Additionally, cloud backups for 20TB of data... not a good idea. The "cheap" option of Amazon Glacier will cost ~$2,500/yr, and retrieving all that data will cost ~$36,000. – HopelessN00b Dec 12 '12 at 17:01
-
I guess it's a matter of opinion whether $2400/yr is a lot for 20TB of relatively safe and fully maintenance-free storage. No power consumption, no cooling, no failing hardware, no SLA, doesn't take up rack space. And as with most systems you should expect around 0 full recovery operations. And if you need a recovery the price is more like $1800 than $36000 (not sure where you got that number from). – Tedd Hansen Dec 13 '12 at 11:32
-
For Glacier, the $36K is pretty close. I roughly calculate it as $42K for retrieval costs on 20TB. It's still not a lot though. The bandwidth is more of an issue. – Sirex Dec 16 '12 at 23:08
I think the best solution for this is what we do with our payroll data, which should take a minimal effort for you to implement.
Initially, it's kept with the rest of the server data that's backed up daily. Our retention period on those backups is 13 months.
Once we no longer expect that the data will need to be modified (two pay periods later, IIRC), the data is saved off by a script to an archive volume that's excluded from the regular backups.
The archive volume is backed up to tape yearly, and the tapes are sent off to Cintas for indefinite storage.
This allows us to have easy, online access to that unchanging data (so we don't have to call in a tape anytime an accountant wants to look at something), while maintaining indefinite off-site archives of data we may need to keep forever, and without crushing our backup system. Sounds like the same type of setup could work for you, though you might want to adjust the amount of data you keep online, depending on your needs to access this data in a timely fashion - 20TB of enterprise-grade storage is a lot more expensive than archiving it to two or three sets of LTO5 tapes that you store in off-site vaults.
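The "saved off via script" step could look something like the following Python sketch. It is not the script we actually run: the directory layout (one folder per session under a live root), the idle threshold, and all names are assumptions, and it defaults to a dry run so nothing moves until the output has been checked:

```python
import os
import shutil
import time


def newest_mtime(path):
    """Most recent modification time of a directory or anything inside it."""
    newest = os.lstat(path).st_mtime
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            entry_mtime = os.lstat(os.path.join(dirpath, name)).st_mtime
            newest = max(newest, entry_mtime)
    return newest


def archive_idle_sessions(live_root, archive_root, idle_days,
                          now=None, dry_run=True):
    """Move session folders untouched for idle_days to the archive volume.

    Returns the names of the folders that were (or, in a dry run,
    would be) moved. Existing archive folders are never overwritten.
    """
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86400
    moved = []
    for entry in sorted(os.listdir(live_root)):
        src = os.path.join(live_root, entry)
        if not os.path.isdir(src):
            continue
        if newest_mtime(src) >= cutoff:
            continue  # something changed recently; keep it in the backup set
        dst = os.path.join(archive_root, entry)
        if os.path.exists(dst):
            continue  # already archived under this name; leave both alone
        if not dry_run:
            shutil.move(src, dst)
        moved.append(entry)
    return moved
```

The archive root is the volume excluded from the regular backup jobs, so it should only ever be written by this step and by the yearly tape run.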
Maybe you can build your own Backblaze Pod: 135 TB for $7,384.
For more information, see the Backblaze Pod building info.
You can buy the needed pieces and build it by yourself.
Maybe you can build 3 of them, and keep 2 onsite and 1 offsite. Then you can use one pod for the "online data", the second onsite pod as a backup of the first pod, and the third offsite pod as an emergency offsite backup.
With 135 TB of storage in each pod you can even think about keeping some history of the changes...
135 TB / 20 TB ≈ 6 full backup copies.
Alternatively, you can keep fewer full backups plus a ridiculous number of differential backups.
Naturally, if you want an offsite backup, you'll need some kind of big bandwidth... :-)
-
If your data and your job are important to you, you should not try to build your own backblaze pod from scratch. It seems like a good idea, until you realize that you are putting all of your eggs in one really big basket. Worse yet, that basket has not been tested as an integrated whole thoroughly. The backblaze secret sauce is the software replication across many pods, which allows for entire pods to fail seamlessly. I would instead recommend a supermicro storage server, centos, xfs and rdiff-backup. – bugaboo Dec 20 '12 at 18:10
My coworker purchased a Synology 8-disk NAS. It runs a hybrid RAID. He purchased eight 3TB Seagate Barracudas from NewEgg a few weeks ago for $89 each. You could rsync mirror from the production NAS to this new NAS over gigabit. After the initial sync you are only transferring the differences, so each transfer will take much less time. Then you can use the backup NAS to perform fulls or incrementals. Cost to you would be under $2000 out the door for a backup NAS.