Utility to optimally distribute files onto multiple DVDs?

11

7

I have a bunch of media files which I want to record to DVD, but since each DVD only fits 4.5GB, I have to find the optimal way to organize the files to use the minimum number of DVDs (otherwise the empty space left in each DVD can easily add up). Are there any tools to help with this?

Many years ago there was a DOS utility to do this with floppy disks.

Alex R

Posted 2009-12-19T20:02:24.703

Reputation: 1 629

Just felt that this was a good page for anyone searching: http://www.howtogeek.com/76264/how-to-burn-data-across-multiple-dvd-or-cd-discs/

– Nav – 2013-08-28T04:48:41.963

No, I'm not looking for compression & splitting. I want to distribute the files natively (filesystem) so that each disk may be used directly. – Alex R – 2009-12-19T21:01:30.620

Answers

3

Try the free DVD Span:

DVD Span is a backup tool for writing the contents of large folders to multiple DVDs. DVD Span can automatically determine the best organization of each disk in order to fit the maximum amount of data on the minimum number of disks. DVDSpan is a great tool for backing up your music collection, photos, or even your entire hard disk to DVDs. And because it produces regular DVDs (or CDs), no special software is required to read or restore your backups.

harrymc

Posted 2009-12-19T20:02:24.703

Reputation: 306 093

2

Overview

Jeff Shattock's answer is correct that this is equivalent (or isomorphic, as mathematicians write) to a combinatorial optimization problem, but it's equivalent to the 1-dimensional bin packing problem, not the knapsack problem.

Lucky for you, I have some code to share that will solve this problem for you, or anyone else with access to a Windows computer that has at least version 3.5 of the .NET Framework installed.

A Rough Solution

  1. First, download and install LINQPad.

  2. Second, download the LINQPad query I just wrote – here's the linq (ha) to the raw file. Save it as a .linq file and open it in LINQPad.

  3. Change the parameters:

    Here's the part in the LINQPad query code you should change:

    int binSizeMb = 4476; // This is the (floor of the) total size of a DVD+R reported by CDBurnerXP.
    string rootFileFolderPath = @"F:\2006 - Polyester Pimpstrap Intergalactic Extravaganza multicam";

    Change binSizeMb to the size of your 'bin' (CD, DVD, etc.), e.g. int binSizeMb = 650; for a CD.

    Note – the binSizeMb value is interpreted as what's sometimes referred to as a mebibyte. Unlike in my childhood, when all byte multiples were 'binary', 'MB' now sometimes refers to a 'decimal megabyte' of exactly 1,000,000 bytes, as opposed to the 1,048,576 bytes of a mebibyte (MiB), which is what my code uses. If you wish to change this, change the line const int bytesPerMb = 1048576; in the code to const int bytesPerMb = 1000000;.

    Change rootFileFolderPath to the full path of the folder containing the files you wish to 'pack into bins', ex. string rootFileFolderPath = @"C:\MySecretBinFilesFolder";.

  4. Run the query by either hitting F5 or clicking the Execute button at the top left of the query tab.

Results

The query code will enumerate all of the files in the rootFileFolderPath folder, recursively, meaning it will include files in all of the subfolders too.

Then it will create 'bins' for the files such that the total size of all the files in each bin is less than or equal to the bin size specified.

In the LINQPad results pane you'll see two lists.

The first list is of all the files it found, listed in decreasing order by size.

The second list is the bins created by 'packing the files', with a list of the files and their sizes, as well as the remaining size of the bin.

Here's a screenshot showing the second list and the first two bins created:

[Screenshot: LINQPad results pane showing the list of bins]

Cursory Analysis

According to Wikipedia, the algorithm I used – the First Fit Decreasing (FFD) strategy – shouldn't be too bad; Wikipedia states:

In 2007, it was proven that the bound 11/9 OPT + 6/9 for FFD is tight.

'OPT' refers to the optimal solution (as a benchmark that may be unattainable in practice, not any particular actual strategy).

Based on my somewhat fuzzy memory of the mathematical terms involved, this should mean that the FFD strategy will, at worst, pack items into roughly 11/9 ≈ 1.22 times the number of bins that an optimal strategy would – so it might use, say, 5 bins where 4 would do. I suspect its performance is likely to be very close to optimal except for specific 'pathological' item sizes.
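
For a concrete picture of what that strategy does, here's a minimal sketch of FFD in C#. This is illustrative only – it's not the code from the linked query (which targets .NET 3.5; the tuples below need a newer compiler):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Bin
    {
        public long RemainingBytes;                      // space still free in this bin
        public List<string> Files = new List<string>();
    }

    static class FirstFitDecreasing
    {
        // FFD: sort items largest-first, then place each item into the first
        // bin with enough room, opening a new bin when none fits. (A file
        // larger than the capacity gets a bin to itself here.)
        public static List<Bin> Pack(IEnumerable<(string Path, long Size)> files, long capacity)
        {
            var bins = new List<Bin>();
            foreach (var file in files.OrderByDescending(f => f.Size))
            {
                var bin = bins.FirstOrDefault(b => b.RemainingBytes >= file.Size);
                if (bin == null)
                {
                    bin = new Bin { RemainingBytes = capacity };
                    bins.Add(bin);
                }
                bin.Files.Add(file.Path);
                bin.RemainingBytes -= file.Size;
            }
            return bins;
        }
    }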

The same Wikipedia article also states that there is an "exact algorithm". I may decide to implement that too. I'll have to read the paper that describes the algorithm first.

Kenny Evitt

Posted 2009-12-19T20:02:24.703

Reputation: 273

2

Ah, the knapsack problem. I could only find one online solver for this, here. Your knapsack size would be 4.5GB, and the items would be your file sizes. You'll need to massage its output a little bit to fit your particular application, but it should be workable. This won't run very fast, though, because this problem is hard.
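
If you'd rather script this idea than massage a solver's output, one rough way (an illustration, not any particular tool's algorithm) is to fill one disc at a time: run a 0/1 subset-sum DP over the remaining file sizes to get as close to the disc capacity as possible, set those files aside, and repeat. A C# sketch, with sizes in whole megabytes:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class DiscFiller
    {
        // Fill one disc at a time: a 0/1 subset-sum DP over the remaining
        // file sizes (whole MB) finds the set that comes closest to the
        // capacity; those files are removed and the process repeats.
        // Assumes every individual file fits on one disc.
        public static List<List<int>> Pack(IEnumerable<int> fileSizesMb, int capacityMb)
        {
            var remaining = fileSizesMb.ToList();
            var discs = new List<List<int>>();
            while (remaining.Count > 0)
            {
                // usedItem[c] = index of the file whose addition first reached
                // total c (-1 = unreachable, -2 = the empty total).
                var usedItem = Enumerable.Repeat(-1, capacityMb + 1).ToArray();
                usedItem[0] = -2;
                for (int i = 0; i < remaining.Count; i++)
                    for (int c = capacityMb; c >= remaining[i]; c--)
                        if (usedItem[c] == -1 && usedItem[c - remaining[i]] != -1)
                            usedItem[c] = i;

                int best = capacityMb;
                while (usedItem[best] == -1) best--;   // fullest reachable total

                var chosen = new List<int>();          // walk the DP back to
                for (int c = best; c > 0; c -= remaining[usedItem[c]])
                    chosen.Add(usedItem[c]);           // recover the file set

                discs.Add(chosen.Select(i => remaining[i]).ToList());
                foreach (var i in chosen.OrderByDescending(x => x))
                    remaining.RemoveAt(i);
            }
            return discs;
        }
    }

With 4.5GB discs and sizes in megabytes the DP table is only ~4,500 entries per pass, so this runs quickly in practice even though the underlying problem is hard in general.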

Jeff Shattock

Posted 2009-12-19T20:02:24.703

Reputation: 1 798

1

This isn't equivalent to the knapsack problem but to the (1-D) bin packing problem, for which there is an exact algorithm.

– Kenny Evitt – 2014-08-09T03:30:36.750

Yes indeed it's an NP-complete problem, but for this practical application, a brute-force solution is fast enough :) – Alex R – 2009-12-20T04:45:37.733

0

Also, try Discfit, which selects files and directories to copy across various disks:

https://sourceforge.net/projects/discfit/

Anton

Posted 2009-12-19T20:02:24.703

Reputation: 21

A link-only answer is not a good answer. When recommending software, please follow this outline. You should expand (edit) your answer to make it better, e.g. your answer doesn't comply with the "give a brief overview of HOW to use the product…" requirement.

– Kamil Maciorowski – 2016-08-13T09:28:39.830

From the website: "Arranges a big set of files or directories in order to use the minimum number of physical media (CD, DVD, BD...) pieces. You can drag the resulting sets directly over your burning software (Nero, DVD-go...)". – Anton – 2016-08-16T03:55:08.630

Almost. To make the answer better you should edit the answer. – Kamil Maciorowski – 2016-08-16T04:08:56.860

I can't see anything more to add other than what the authors wrote. One can probably just go to the website and inquire. – Anton – 2016-08-16T07:32:53.637

That's OK. My point is that you should edit your answer, not write a comment with the "expansion" in it. It's the answer that should follow the said outline, not answer+comments. The fragment cited in the comment should be quoted in your answer. That's all.

– Kamil Maciorowski – 2016-08-16T07:38:46.450

0

You can take one of the variants of the program in the Hitchhiker's guide to Haskell, perhaps after working through some part of that tutorial. The tutorial is written around solving exactly your problem of distributing things onto several disks, with the solution refined incrementally, as exemplified by the following passage from Chapter 3:

Enough preliminaries already. let's go pack some CDs.

As you might already have recognized, our problem is a classical one. It is called a "knapsack problem" (google it up, if you don't know already what it is. There are more than 100000 links).

let's start from the greedy solution...

More ideas: a related question

Here is a similar question (although not the same: optimization isn't asked for there), where you may find more useful solutions/programs for your task (if any get posted):

Some hints for understanding the programming in the suggested tutorial

In general, the Haskell code is quite expressive (since Haskell is a language for programming at a high level of abstraction), and hence can be grasped easily.

When looking at the code of one of the solutions, remember that the top-level structure of the program we want to write is quite simple, as put in Chapter 1 of the tutorial:

Now let's think for a moment about how our program will operate and express it in pseudocode:

main = Read list of directories and their sizes.
       Decide how to fit them on CD-Rs.
       Print solution.

Sounds reasonable? I thought so.

Let's simplify our life a little and assume for now that we will compute directory sizes somewhere outside our program (for example, with "du -sb *") and read this information from stdin.

and then look more closely at the parts of the solution.
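
For orientation while reading the tutorial (whose own code is Haskell), here's a rough C# analogue of that pseudocode – purely an illustration, assuming input in the "du -sb *" format of a size, a tab, and a name:

    using System;
    using System.Collections.Generic;

    class CdPacker
    {
        const long Capacity = 700L * 1024 * 1024;   // illustrative 700 MB CD-R

        static void Main()
        {
            // Read list of directories and their sizes: one "size<TAB>name"
            // line per directory, as printed by "du -sb *".
            var discs = new List<List<string>> { new List<string>() };
            long free = Capacity;
            string line;
            while ((line = Console.ReadLine()) != null)
            {
                var parts = line.Split(new[] { '\t' }, 2);
                long size = long.Parse(parts[0]);

                // Decide how to fit them on CD-Rs: the naive greedy cut,
                // starting a new disc whenever the current one is full.
                // (A directory bigger than one CD still overflows here;
                // the tutorial refines the solution step by step.)
                if (size > free)
                {
                    discs.Add(new List<string>());
                    free = Capacity;
                }
                discs[discs.Count - 1].Add(parts[1]);
                free -= size;
            }

            // Print solution.
            for (int d = 0; d < discs.Count; d++)
                Console.WriteLine("CD " + (d + 1) + ": " + string.Join(", ", discs[d]));
        }
    }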

imz -- Ivan Zakharyaschev

Posted 2009-12-19T20:02:24.703

Reputation: 443

0

Ages ago I wrote PHP script to do such task: https://bitbucket.org/borszczuk/php-backup-maker/

Marcin Orlowski

Posted 2009-12-19T20:02:24.703

Reputation: 105

0

You could use any compression tool that allows splitting an archive, I think.

Journeyman Geek

Posted 2009-12-19T20:02:24.703

Reputation: 119 122

Compression is not what I'm looking for. That makes it too cumbersome to access the files. – Alex R – 2009-12-19T21:02:14.783