
We need to back up a filesystem with lots of hardlinks. Since there are several hardlinks for each "true" file, we would like to skip all the hardlinks when backing up the filesystem to avoid n exact copies of each file.

The backup is done using Tivoli Storage Manager Backup, and we've been unable to get it to treat hardlinks as anything other than separate files to be backed up alongside each other.

In case it's relevant for possible solutions, I'd like to note that it's possible to tell a hardlink from a proper file by the filename:

 foobarbaz-123.ext    # file
 foobarbaz-123-1.ext  # hardlink
 foobarbaz-123-2.ext  # hardlink
 barbazfoo-456.ext    # file
 barbazfoo-456-1.ext  # hardlink
 barbazfoo-456-2.ext  # hardlink
 barbazfoo-456-3.ext  # hardlink

That is, all hardlinks have two hyphens in the filename, whereas proper files have just the one.

The server is running Ubuntu Linux, and the files are situated on a GFS volume on our SAN.

  • Let us know what the eventual resolution to this is -- whether it is an exclude list or whether there really is a magic "please don't explode" flag you can set in TSM... – chris Jul 10 '09 at 13:20

4 Answers


A quick read of some TSM docs suggests "Don't do that!"

In Unix, a "file" is just a directory entry that points to an inode. A "hard link" is just when you have more than one directory entry (pointer) pointing to a given inode. For all intents and purposes, these two "files" are 100% identical.
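A minimal Python sketch of the point above (the file names are just examples borrowed from the question): it creates a file, adds a second name with os.link, and checks that both names report the same device and inode, a link count of 2, and shared contents.

```python
import os
import tempfile


def hardlink_demo():
    """Show that two hard-linked names share one inode and one set of data."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "foobarbaz-123.ext")
        link = os.path.join(tmp, "foobarbaz-123-1.ext")

        with open(original, "w") as f:
            f.write("payload\n")

        # Add a second directory entry pointing at the same inode.
        os.link(original, link)

        st_orig, st_link = os.stat(original), os.stat(link)
        same_inode = (st_orig.st_dev, st_orig.st_ino) == (st_link.st_dev, st_link.st_ino)

        # Appending through one name changes what the other name sees.
        with open(link, "a") as f:
            f.write("more\n")
        with open(original) as f:
            content = f.read()

        return same_inode, st_orig.st_nlink, content


if __name__ == "__main__":
    print(hardlink_demo())  # (True, 2, 'payload\nmore\n')
```

Neither name is the "original" as far as the filesystem is concerned; removing either one just drops the link count back to 1.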

Hard links are a well-established and well-understood mechanism in Unix. It is proper and common to encounter them, and it is common for backup software to understand exactly what a hard link is and to back it up as it should be backed up -- as another pointer to a specific piece of data, not as a unique and novel piece of data that happens to be exactly the same as the other hard links.
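As an illustration of what "understanding" hard links means for a backup tool, here is a rough Python sketch. It is not how TSM is implemented, and group_by_inode and plan_backup are hypothetical names: the sketch groups names by (st_dev, st_ino), copies the data once per inode, and records every other name as a link to the copied one.

```python
import os
import stat
import sys
from collections import defaultdict


def group_by_inode(root):
    """Map each (device, inode) of a regular file under root to all names pointing at it."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                groups[(st.st_dev, st.st_ino)].append(path)
    return groups


def plan_backup(root):
    """Return (paths whose data gets copied, {other name: copied name it links to})."""
    to_copy, links = [], {}
    for paths in group_by_inode(root).values():
        paths.sort()
        first = paths[0]          # arbitrary choice: hard-linked names are equals
        to_copy.append(first)
        for other in paths[1:]:
            links[other] = first  # record only the link, not the data again
    return sorted(to_copy), links


if __name__ == "__main__" and len(sys.argv) > 1:
    to_copy, links = plan_backup(sys.argv[1])
    print(f"{len(to_copy)} data copies, {len(links)} links")
```

On a tree laid out like the one in the question, the number of data copies equals the number of "true" files, no matter how many -1, -2, ... names each one has.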

A quick Google search for TSM and hard links indicates that TSM understands hard links, and the docs specifically warn:

Problems can occur if you [back up|archive] only one file of a hard-linked pair. For example, files texta and textb contain a hard link to each other. You archive texta, and then edit textb and make changes. If you retrieve texta, the changes you made to textb are lost.

Interestingly, it seems like there are two different ways that you can do backups with TSM -- backups and archives -- and the two ways seem to deal with hard links differently.

backing up and restoring files:

A hard link is established when two files point to the same data file. When you back up a file that contains a hard link to another file, TSM stores both the link information and the data file on the server. If you back up two files that contain a hard link to each other, TSM stores the same data file under both names, along with the link information.

archiving and restoring files:

When you archive a file that contains a hard link to another file, TSM stores both the link information and the data file on the server.

From this it seems that you'll blow your backup server up if it is "Archiving" things and it will do what you want if you're "backing up." Leave it to IBM to make it simple!

chris
  • Ah, thanks for pointing to the documentation. We heard from our hosting partner that the backup was too huge to handle due to n copies of the linked files being archived, but I will ask them to double-check this and point them to the page you linked to. – Lars Haugseth Jul 09 '09 at 16:14
  • "When you archive a file that contains a hard link to another file, TSM stores both the link information and the data file on the server." - I take this to mean that TSM does indeed store copies of the file for every hardlink. With our setup, this means the backup will take *at least* twice as much space as the files on the SAN. Maybe it's time to rethink the whole storage model instead... – Lars Haugseth Jul 09 '09 at 16:22
  • I found another doc on the same site -- and am editing my answer to reflect the other doc... – chris Jul 09 '09 at 16:39
  • Thanks, I'm not completely sure I've grokked the difference between these methods regarding hardlinks, but we can try them both with a subset of the files and see how it works. Oh, and the way we're using the hardlinks, we don't have to worry about one of them being modified and overwritten again by a retrieve. – Lars Haugseth Jul 09 '09 at 21:24
  • Another document discusses the difference here http://publib.boulder.ibm.com/infocenter/tivihelp/v1r1/topic/com.ibm.itsmfdt.doc/ans50000102.htm#bora – chris Jul 09 '09 at 23:46

First, there is no difference between a "proper file" and a "hardlink": the hardlink is just another name for the same object. A softlink, on the other hand, is actually a file containing a pointer (a path) to the real file, which is why a softlink can cross filesystem boundaries and a hardlink cannot.
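A small Python sketch contrasting the two (the file names are hypothetical): the hard link shares the target's inode and keeps the data reachable after the original name is removed, while the symlink has its own inode, stores the target's path, and dangles once that path is gone.

```python
import os
import tempfile


def compare_links():
    """Contrast a hard link and a symbolic link to the same file."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "real.ext")
        hard = os.path.join(tmp, "hard.ext")
        soft = os.path.join(tmp, "soft.ext")
        with open(target, "w") as f:
            f.write("data")

        os.link(target, hard)     # another directory entry for the same inode
        os.symlink(target, soft)  # a separate file whose content is a path

        t_ino = os.stat(target).st_ino
        result = {
            "hard_same_inode": os.stat(hard).st_ino == t_ino,
            # lstat looks at the symlink itself instead of following it
            "soft_same_inode": os.lstat(soft).st_ino == t_ino,
            "soft_stores_path": os.readlink(soft) == target,
        }

        # Remove the original name: the hard link still reaches the data,
        # the symlink now points at a path that no longer exists.
        os.remove(target)
        with open(hard) as f:
            result["hard_still_readable"] = f.read() == "data"
        result["soft_dangling"] = not os.path.exists(soft)
        return result


if __name__ == "__main__":
    print(compare_links())
```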

About the actual problem: have a look at the exclude option and the include-exclude list option in the documentation; you should be able to work something out with them (something like exclude /path/to/your/files/*-*-?.*).
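To see what the suggested pattern would catch, here is a quick check in Python against the names from the question. It assumes Python's fnmatch treats * and ? the way TSM's wildcards do, which may not hold in every detail, and it only tests base names, not full paths. The extra name barbazfoo-456-12.ext is made up to show that the glob misses multi-digit suffixes; LINK_RE is a hypothetical stricter alternative built on the naming convention in the question.

```python
import fnmatch
import re

NAMES = [
    "foobarbaz-123.ext", "foobarbaz-123-1.ext", "foobarbaz-123-2.ext",
    "barbazfoo-456.ext", "barbazfoo-456-1.ext", "barbazfoo-456-2.ext",
    "barbazfoo-456-3.ext",
    "barbazfoo-456-12.ext",  # hypothetical: a link number with two digits
]

# Glob from the answer: exactly one character between the second hyphen and the dot.
GLOB = "*-*-?.*"
# Hypothetical stricter form: name, hyphen, digits, hyphen, digits, extension.
LINK_RE = re.compile(r"^[^-]+-\d+-\d+\.[^.]+$")


def excluded_by_glob(name):
    return fnmatch.fnmatchcase(name, GLOB)


def excluded_by_regex(name):
    return LINK_RE.match(name) is not None


if __name__ == "__main__":
    for n in NAMES:
        print(f"{n:24} glob={excluded_by_glob(n)!s:5} regex={excluded_by_regex(n)}")
```

If the link numbers never go past 9, the glob is enough; otherwise a pattern like *-*-* (or several exclude lines) would be needed, as long as the base names themselves never contain a hyphen.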

Sven

I don't know anything about Tivoli Storage Manager, but it wouldn't be possible to get any piece of software to treat hard links differently from files, since there is no actual difference between the original file and the other hard links. (It may be possible to script it based on file names, though.)
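If you do go the scripting route, here is one hedged sketch in Python. It relies on the naming convention from the question, and find_redundant_links and LINK_RE are hypothetical names. It only reports a name such as foobarbaz-123-1.ext when foobarbaz-123.ext still exists and is the same inode, so an orphaned link that holds the only remaining copy of the data is never skipped.

```python
import os
import re
import sys

# Assumed naming convention from the question:
#   foobarbaz-123.ext   -> the "true" file
#   foobarbaz-123-1.ext -> a hard link to it
# Group 1 is the base name, group 2 the extension.
LINK_RE = re.compile(r"^([^-]+-\d+)-\d+(\.[^.]+)$")


def find_redundant_links(root):
    """Yield link-named paths that are the same inode as their base file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            m = LINK_RE.match(name)
            if not m:
                continue
            path = os.path.join(dirpath, name)
            base = os.path.join(dirpath, m.group(1) + m.group(2))
            # Skip a name only if the base file still exists and is the same
            # inode; otherwise this name may hold the only copy of the data.
            if os.path.isfile(base) and os.path.samefile(path, base):
                yield path


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in sorted(find_redundant_links(sys.argv[1])):
        print(p)
```

Run it with the directory to scan as the only argument. The printed paths could then be turned into exclude statements, but check the TSM documentation for the exact include-exclude syntax before relying on that.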

Cian

Upgrade to TSM 6.1 and activate deduplication. (currently only available with device type FILE, but patience is a virtue)
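To illustrate why deduplication helps here, a simplified Python sketch of chunk-level deduplication: each fixed-size chunk is stored once under its SHA-256 digest, and each file is kept as a list of digests, so n hard-linked copies of the same data cost roughly one copy of storage. This only shows the general idea; TSM's own deduplication is configured on the server and works differently in detail, and dedup_store and restore are hypothetical names.

```python
import hashlib


def dedup_store(files, chunk_size=4096):
    """Store each distinct chunk once; return (chunk store, per-file recipes).

    files maps a file name to its contents as bytes.
    """
    store = {}    # sha256 hex digest -> chunk bytes
    recipes = {}  # file name -> ordered list of digests needed to rebuild it
    for name, data in files.items():
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy of a chunk
            digests.append(digest)
        recipes[name] = digests
    return store, recipes


def restore(store, recipe):
    """Rebuild a file's contents from its list of chunk digests."""
    return b"".join(store[d] for d in recipe)
```

Backing up foobarbaz-123.ext together with its -1 and -2 names would then add only two short recipes on top of the first file's chunks.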