Version Control for MP3s?

7

3

I've got a lot of "binary media", which I'll abstract away as "MP3s". I've also got several computers that I'd like to have the whole library on - a desktop, media box, a laptop here or there, etc. In short, it would be nice to be able to sync all these machines with each other such that they all have the same stack of files.

A Version Control system, as opposed to an rsync/robocopy lashup, in the rough sense seems like the way to go. First, there are several OSes involved (Windows, Mac, Linux flavors). Second, it would be nice if, when ID3 tags and such are updated, the system could just transfer the file delta instead of re-copying the whole file. (Finally, being able to update the library over the internet, rather than the LAN, would be very cool.)

But your classic CVS/SVN system has the obvious drawback of needing a full repository to work, and I'd really rather not have two copies of my 60 GB+ MP3 folder sitting on a machine somewhere. These systems also haven't traditionally dealt with binary deltas very well.

So, Distributed Version Control starts sounding pretty good at this point. Mercurial, git, and bazaar all look good on paper, but I don't have any experience with any of them. Has anyone tried to set up a "binaries-only" DVCS with any of them? Any recommendations? Pitfalls?

Electrons_Ahoy

Posted 2009-08-11T21:15:18.383

Reputation: 2 491

Didn't you hear? rsync exists specifically to avoid copying the entire files when you do that. – SamB – 2010-04-19T16:40:41.150

@SamB: Well, sure. And if the machines involved weren't primarily Windows machines without either rsync or SSH installed, I'd have just done that first. ;) The DVCS idea was an attempt to solve the cross-platform issue without having to get a full unix subsystem running on the Windows boxen just to copy some MP3s, you know? – Electrons_Ahoy – 2010-04-26T23:04:55.537

@Electrons_Ahoy: ah. yeah, there does seem to be a lack of a nice, easy-to-install, easy-to-use rsync client that doesn't require installing all of Cygwin... – SamB – 2010-04-28T00:15:17.650

4

Have you checked typical deltas for media file updates? My guesstimate is that they'll be almost as large as the original file. – None – 2009-08-11T21:40:28.940

@nagul: exactly! I was hoping someone knew of a DVCS that did binary deltas that weren't that big. – Electrons_Ahoy – 2009-08-11T22:23:33.910

Uh Oh. Battle of the versioning systems... – bgw – 2009-08-12T04:07:29.153

1

@Electrons_Ahoy: I think both SVN and Git do binary deltas. The problem is that if you do anything with the sound data, your MP3s will be recompressed. That likely changes every single bit, so delta compression will not help at all here. If you rarely modify the sound data and usually just edit ID3 tags, things are different. – Ludwig Weinzierl – 2009-08-12T12:59:18.907

git-annex makes git work more like it sounds like you want – derobert – 2012-11-26T21:08:22.183

Answers

3

But your classic CVS/SVN system has the obvious drawback of needing a full repository to work, and I'd really rather not have two copies of my 60 GB+ MP3 folder sitting on a machine somewhere. These systems also haven't traditionally dealt with binary deltas very well.

With CVS/SVN you have one repository, and several working copies. So the repository contains every file once plus the whole history for every file. The working copy contains every file once plus some additional data per file (usually approx. the size of the file).

Very roughly: let's assume our revision control system cannot store diffs of binary files efficiently (not really true, but it keeps things simple). Your collection is 60 GB of MP3 files. If you have 10 revisions per file on average and we neglect compression (because MP3s compress badly), your repo will be ca. 600 GB and your working copy ca. 120 GB.

So, Distributed Version Control starts sounding pretty good at this point.

In a distributed system every working copy is essentially a repository. That means every working copy contains every file plus its history.

Same assumptions as above, every copy will have ca. 600 GB.

The bottom line is that, under these assumptions, a distributed system will require more space than a centralized one.
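The arithmetic behind those figures can be written out as a small sketch. The function name and dictionary keys are made up for illustration, and the model is exactly the simplification used above: no binary deltas, no compression, an SVN working copy that holds each file twice, and a DVCS clone counted as the full history only.

```python
def storage_estimate(collection_gb, revisions_per_file):
    """Back-of-the-envelope sizes under the simplified assumptions above."""
    # Every revision stored in full: no deltas, no compression.
    full_history_gb = collection_gb * revisions_per_file
    return {
        # One central repository holding every revision of every file.
        "central_repo_gb": full_history_gb,
        # SVN working copy: the files plus roughly the same again in metadata.
        "svn_working_copy_gb": collection_gb * 2,
        # DVCS: every clone carries the full history.
        "dvcs_clone_gb": full_history_gb,
    }


if __name__ == "__main__":
    for name, size in storage_estimate(60, 10).items():
        print(f"{name}: {size} GB")
```

With three machines, the centralized setup comes to one 600 GB repository plus three 120 GB working copies, while the distributed setup comes to three 600 GB clones. Real tools with working binary deltas should do better than this when only the tags change.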

EDIT:

Even if your question is more about a large number of binary files than about large binary files in version control, the following post might be interesting: Revisiting large binary files issue.

Ludwig Weinzierl

Reputation: 7 695

1

Actually, I believe most DVCSes compress deltas well enough that merely changing the ID3 tags will probably not cause that much trouble... – SamB – 2010-04-19T16:37:33.543

Surprisingly it is exactly backwards. SVN is very space-inefficient -- a working copy without history is 2x the size of the files under its control. Git, Mercurial, and Bzr all often have smaller repository sizes than SVN checkouts AND include full history.

Info on GIT sizes: http://git.or.cz/gitwiki/GitSvnComparsion#SmallSpaceRequirements

– ehempel – 2009-08-12T01:11:33.717

3

@ehempel: You are right if we are talking about the typical use cases for SVN and Git, that is, source code with little change between revisions. MP3s are different: 1. they can't be compressed, and 2. a slight modification (e.g. normalizing) will change every single bit. – Ludwig Weinzierl – 2009-08-12T12:50:53.067

1

Good points ... I've never done much with binary files in VCSes. Someone should make a comprehensive VCS shootout like http://shootout.alioth.debian.org

– ehempel – 2009-08-12T14:07:05.573

5

This isn't really an answer to your question, but I've started using DropBox for the same purpose. It's cross-platform, and you can get a 100GB account if you don't mind paying a little more. It also stores revisions to files, very similar to source control.

The How-To Geek

Reputation: 5 482

He asked that "the system could just update the file delta, not re-copy the whole file". DropBox would copy the whole file and would use a lot of bandwidth for nothing, because he doesn't need the files to leave his LAN... – Patrick Desjardins – 2009-08-12T13:09:14.667

1

DropBox does a binary diff, and doesn't copy the whole file. https://www.getdropbox.com/help/8

– The How-To Geek – 2009-08-12T13:27:17.733

2

The problem with trying to shoehorn version control systems into file synchronization systems is that you'll end up wasting a ton of disk space keeping all the old version history data in the repositories.

Personally for my large binary media collections, I don't care about being able to revert changes to any given file. All I care about is that the collection is synchronized between my systems. There are many file synchronization solutions out there, but they all have their various pros and cons. Some claim they're cross platform, but that only means Win/Mac. Others really are cross platform, but don't have large enough file size/quantity limits to be useful for large collections. Some offer web access to the files, but also suffer from the file size/quantity limitations. Any solution that keeps a copy of your files on a 3rd party server is inevitably going to cost you money if you have a large collection of files.

Ryan Bolger

Reputation: 3 351

2

Not really an answer, but I thought I'd share. I've started using SVN for my HD video projects (like events and weddings where the result is a heavily edited video). This is starting to become really awesome for several reasons.

Usually a video project contains a few or perhaps tens or even hundreds of GB of raw AVCHD files (most just a few hundred MB each though since moving from DV tapes ;). These are added and committed once and then never changed as all the work is then made on (very small and often text or xml-based) video editing software project files, some still images (which are sometimes but not very often changed) and various other descriptor files.

Tagging and naming of clips are also stored in the project files and not added to the actual raw video files, which makes this ideal. Say a project repository database starts at 10 GB: it will usually end at 11 GB and consist of ~100 revisions. The rendered final result in various formats is of course not stored in the repository at all, as it can always be re-generated.

MP3s in particular store their metadata in the MP3 file itself, which makes this much more of a challenge. According to this stackoverflow question, though, Subversion might handle it decently in the end, since ID3 tag data is stored at the beginning of the file (or, for v1, at the end). However, a v2.x tag can be any length, so I have no idea what happens if you add additional tag data: the file may grow and perhaps mess up the delta comparison. Worth testing...
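One way to start testing is to separate the tag regions from the audio data and check that a retag leaves the audio bytes untouched. The sketch below relies only on the published ID3 layout (a 10-byte ID3v2 header with a 4-byte "synchsafe" size at the start, and a fixed 128-byte ID3v1 block beginning with "TAG" at the end). The function names are made up for illustration, and it ignores rarer cases such as APE tags or tags appended elsewhere in the file.

```python
import hashlib


def id3_layout(data):
    """Return (length of ID3v2 tag at the start, length of ID3v1 tag at the end)."""
    v2_len = 0
    if len(data) >= 10 and data[:3] == b"ID3":
        # The size field is "synchsafe": 4 bytes with 7 useful bits each.
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)
        v2_len = 10 + size  # the size excludes the 10-byte header
        if data[5] & 0x10:  # footer-present flag (ID3v2.4) adds 10 bytes
            v2_len += 10
    v1_len = 128 if len(data) >= 128 and data[-128:-125] == b"TAG" else 0
    return v2_len, v1_len


def audio_payload(data):
    """Return the bytes between the ID3v2 tag and the ID3v1 tag."""
    v2_len, v1_len = id3_layout(data)
    return data[v2_len:len(data) - v1_len]


def audio_digest(path):
    """SHA-256 of the audio data only, so tag edits do not change the result."""
    with open(path, "rb") as handle:
        return hashlib.sha256(audio_payload(handle.read())).hexdigest()
```

If `audio_digest` gives the same value before and after editing tags, the edit only touched the tag regions, though a longer v2 tag still shifts every audio byte to a new offset, which is the case a delta algorithm has to cope with.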

And storage is cheap - only 60 GB? Get a few 1 TB drives for the repository and be done with it ;)

Oskar Duveborn

Reputation: 2 616

0

Windows Vista & 7 offer Shadow Copy / Previous Versions. It's definitely not as feature-rich as a true source-control provider, but it does give you some of the benefits. As others have said, the storage required to house multiple revisions will likely be fairly massive, depending on the size of the files.

The free and popular SCMs are all so-so at the task. SVN, for example, will work fine, but the repository will quickly grow and the local .svn folder will be quite large as well.

When all is said and done, you might want to consider simply copying the whole lot of files to a safe place before making any large changes to your collection. When you're using MP3s in a normal day-to-day fashion there's not much reason to change the files, and the expense of having a revision system watch rarely changed large binary files seems hard to justify. But if you're set on it, SVN at least does binary diffs, whereas CVS stores full copies (much larger).
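To get a feel for how much a binary diff could save on your own files, a toy estimate like the one below may help. It is not the algorithm SVN, Git, or rsync actually use. It is a simplified stand-in that counts how many bytes of the new version cannot be copied from some aligned block of the old version. The function name, the 64-byte block size, and the synthetic data are all assumptions made for illustration.

```python
import hashlib


def literal_bytes(old, new, block=64):
    """Count bytes of `new` that cannot be copied from an aligned block of `old`."""
    # Index the old version in fixed-size blocks.
    known = {old[i:i + block] for i in range(0, len(old) - block + 1, block)}
    literal = 0
    pos = 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        if len(chunk) == block and chunk in known:
            pos += block   # whole block can be referenced, nothing to send
        else:
            literal += 1   # this byte has to be stored or sent as-is
            pos += 1
    return literal


def fake_audio(seed, blocks=200):
    """Deterministic pseudo-random bytes standing in for compressed audio."""
    return b"".join(
        hashlib.sha256((seed + i).to_bytes(4, "big")).digest() for i in range(blocks)
    )


if __name__ == "__main__":
    audio = fake_audio(0)
    old = b"A" * 128 + audio               # old tag + audio
    retagged = b"B" * 200 + audio          # longer tag, same audio
    reencoded = b"A" * 128 + fake_audio(10_000)  # same tag, different audio
    print("retagged: ", literal_bytes(old, retagged), "of", len(retagged))
    print("reencoded:", literal_bytes(old, reencoded), "of", len(reencoded))
```

In this toy model a tag-only edit costs roughly the size of the new tag, while changed audio data costs roughly the whole file, which matches the concern raised in the comments about re-encoded files.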

STW

Reputation: 1 676