9

I am the IT everything-man at a small company. I want to design a new infrastructure, including a new server and a separate backup server with a company-wide backup policy.

The most important thing in the company is the SQL Server and its databases. There are 10 databases, but only 2 of them are really important. The first is 8GB, mostly text data and numbers. The second is about 300GB, growing by 16GB/month, and contains PDFs and GIFs.

To save storage, the current backup policy consists of one full backup per week and 6 differentials. I think that's about 350GB per week, or 1.4TB per month.

After reading some articles about silent data corruption, I decided to try ZFS with Nexenta Community Edition.

My question: is ZFS with deduplication good for storing backup files in terms of reliability, or should I think about tape backup or something else?

EDIT: I know that right now we cannot predict performance, deduplication ratio, etc., but I want to know if it is a good idea at all.

Krystian Lieber
  • Deduplication is GREAT for disk-based backups. You can basically do incremental-forever if you're paying attention and adding disks as the years go on. – pauska Jun 29 '12 at 10:56
  • Are you storing large blobs such as PDFs and GIFs in your database? That's not the best way to store them. We use file links within the database, which keeps the DB small, and we let the filesystem (XFS) look after the files. It's easier and quicker to back up and restore. – The Unix Janitor Jun 29 '12 at 12:44

3 Answers

10

Certainly ZFS is plenty stable enough to do this kind of thing; there are many very large, high-profile, and reliable production platforms out there based entirely on ZFS and Nexenta.

That said, I always like to have on-site disk-based backups such as the one you're suggesting AND removable-disk or tape-based backups that go off-site daily to protect against fire/earthquake/Cthulhu, etc.

So my answer is yes, it's fine, but I'd go for both options if you can.

Chopper3
10

(assuming you're referring to using dedupe within ZFS versus your backup software)

I would not recommend using ZFS native deduplication for your backup system unless you design your storage system specifically for it.

Using dedupe in ZFS is extremely RAM-intensive. Since deduplication occurs in real time as data is streamed/written to the storage pool, a table is maintained in memory that keeps track of data blocks: the DDT (deduplication table). If your ZFS storage server does not have enough RAM to accommodate this table, performance will suffer tremendously. Nexenta will warn you as the table grows past a certain threshold, but by then it's too late. RAM can be supplemented with an L2ARC device (read cache), but many early adopters of ZFS fell into this trap.
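If you want to gauge the exposure before committing, ZFS can simulate dedup against data that's already sitting in a pool. A quick sketch (the pool name tank is a placeholder, and the walk over every block is slow on a large pool):

    # Simulate deduplication without enabling it; prints a DDT histogram
    # and an estimated dedup ratio for the data currently in the pool.
    zdb -S tank

The block counts it reports feed directly into the DDT sizing estimate below.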

See:

ZFS - destroying deduplicated zvol or data set stalls the server. How to recover?

ZFS - Impact of L2ARC cache device failure (Nexenta)

To give a sense of how high the RAM requirement is for dedupe, I'd estimate the RAM and L2ARC needs for the data set you're describing at 64GB+ RAM and 200GB+ L2ARC. That's not a minor investment. Keeping lots of Windows system files and image documents that won't be reread will fill that DDT very quickly. The payoff may not be worth the engineering effort that needs to go in upfront.
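As a back-of-the-envelope check, the commonly cited figure is roughly 320 bytes of core memory per DDT entry. Assuming a 64KB average block size (an assumption; the real figure depends on your data):

    # ~1.4TB of monthly backup data at an assumed 64KB average block
    # size, at ~320 bytes of core memory per DDT entry:
    echo $(( (1400 * 1024 * 1024 / 64) * 320 / 1024 / 1024 / 1024 ))   # prints 6, i.e. ~6-7GB

And that covers only one month's worth of unique blocks; the DDT keeps growing for as long as deduplicated data is retained, which is why the working numbers above end up far larger.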

A better idea is to use compression on the zpool, possibly leveraging the gzip capabilities for the more compressible data types. Deduplication won't be worth it, as there's a performance hit when you need to delete deduplicated data (every freed block needs to reference the DDT).
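For example, a sketch of what that looks like (tank/backups is a placeholder dataset name; gzip and lzjb are the algorithms available on Nexenta-era ZFS):

    # Favor compression over dedup on the backup dataset.
    zfs set compression=gzip tank/backups    # better ratio, more CPU (gzip-1..gzip-9 also work)
    zfs set compression=lzjb tank/backups    # lighter-weight alternative
    # See what you're actually getting back:
    zfs get compressratio tank/backups

Compression costs some CPU on write but carries none of the DDT bookkeeping, so deletes stay cheap.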

Also, how will you be presenting the storage to your backup software? Which backup software suite will you be using? In Windows environments, I present ZFS as block storage to Backup Exec over iSCSI. I never found the ZFS CIFS features to be robust enough and preferred the advantages of a natively-formatted device.
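For reference, a rough sketch of that block-storage arrangement using COMSTAR on a Solaris-derived system (the zvol name tank/bexec and the 500G size are hypothetical):

    # Enable the iSCSI target service and export a zvol as a LUN.
    svcadm enable -r svc:/network/iscsi/target:default
    zfs create -V 500G tank/bexec
    sbdadm create-lu /dev/zvol/rdsk/tank/bexec
    stmfadm add-view <GUID printed by sbdadm>   # expose the LU (to all hosts by default)
    itadm create-target                         # creates a target with a default IQN
    # The Windows host connects with its iSCSI initiator and formats the
    # LUN as NTFS, so Backup Exec sees a natively-formatted disk.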

Also, here's an excellent ZFS resource for design ideas: Things About ZFS That Nobody Told You

ewwhite
  • I was one of those that got bit by the attractiveness of ZFS deduplication. Everything was working great in our test environment. We turned it on in production. Everything was fine and smooth, getting a 2+ times deduplication ratio. Beautiful. We started moving users over to the new system. No problems until, one day, we moved a user and the performance of the file server tanked. Suddenly the machine was on its knees. A crash and subsequent reboot took over 90 minutes before the machine came back up as it processed the dedup tables. Terrible. We got rid of dedup. I advise staying away from it. – jlp Jun 29 '12 at 21:41
0

An alternative OS is OpenIndiana, which is just as good and receives more frequent updates some of the time.

Another option is to set up a second ZFS server with a (potentially smaller) storage pool with compression enabled, and use this second device for static backups. You can thus dispense with a read cache, and you don't need silly amounts of CPU/RAM to handle it.

We run a setup like this where I work:

  • OpenIndiana main storage server [main] with six 2TB disks in a pool of three striped mirrored pairs. This, while cutting into your available storage space, makes for a fast and multiply-redundant storage pool (see the sketch after this list).
  • A secondary storage server [backup] also running OpenIndiana with a similar configuration of disks that serves solely as a backup device.
  • main has a script, run from a cron job, that snapshots /tank/[dataset] regularly over the course of the day
  • Every evening, another cron job is run that pushes the day's snapshots over the network to backup. Once the initial sync of all of your snapshots is done (a one-time procedure), the incremental nature of the snapshots means that changes are pushed to your backup device very quickly.
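A minimal sketch of the pool layout and the nightly push (pool, dataset, snapshot, and host names are all placeholders; the linked rundown below covers the full procedure):

    # Three striped mirrored pairs from six disks (disk names are examples).
    zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0

    # On main, from cron: take a timestamped snapshot during the day.
    zfs snapshot tank/data@$(date +%Y%m%d-%H%M)

    # In the evening: push the delta since the last synced snapshot.
    # The very first transfer must be a full send (drop the -i); after
    # that, only the blocks that changed between snapshots cross the wire.
    zfs send -i tank/data@previous tank/data@latest | \
        ssh backup zfs receive -F tank/data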

I have a quick rundown on how to rig up ZFS send/receive here: http://kyrill-poole.co.uk/blog/tech/zfs-send-and-receive/

poolski
  • Oh yeah, you can probably rig it up so that you don't have to set up nc/ssh to do the heavy lifting for you. – poolski Jul 23 '12 at 22:52