Linux: Compare Directory Structure Without Comparing Files

56

38

What is the best and simplest way to compare two directory structures without actually comparing the data in files? This works fine:

diff -qr dir1 dir2

But it's really slow because it's comparing files too. Is there a switch for diff or another simple cli tool to do this?

Jonah

Posted 2010-07-22T02:20:37.243

Reputation: 805

By "directory structure", do you mean just the directory paths, or the paths of both directory and non-directory files? – intuited – 2010-07-22T06:26:36.497

Yes, folders and files. – Jonah – 2010-07-22T15:30:43.953

In that case you should remove the -type d option from @slartibartfast's answer, or check out my answer. – intuited – 2010-07-22T17:52:56.030

Answers

37

The following (if you substitute the first directory for directory1 and the second for directory2) should do what you're looking for and swiftly:

find directory1 -type d -printf "%P\n" | sort > file1
find directory2 -type d -printf "%P\n" | sort | diff - file1

The principle is that each find prints every directory path, subdirectories included, relative to its base directory (directory1 or directory2), so the two sorted lists can be compared directly.

This could fall down (produce weird output) if you have carriage returns in some of the directory names but not others.
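As the comments note, dropping -type d makes the listing cover files as well as directories. A minimal sketch of that variant follows; the helper names list_tree and compare_trees are made up for illustration, and it uses cd plus find . rather than GNU find's -printf so that it does not depend on GNU extensions. Only names are compared, never file contents.

```shell
#!/bin/sh
# Hypothetical helper: print every path under a directory, relative to it, sorted.
list_tree() {
    # cd inside a subshell so the caller's working directory is untouched
    (cd "$1" && find . | LC_ALL=C sort)
}

# Hypothetical helper: diff the two listings. Returns diff's exit status
# (0 = same structure, 1 = structure differs).
compare_trees() {
    tmp=$(mktemp -d) || return 2
    list_tree "$1" > "$tmp/left"
    list_tree "$2" > "$tmp/right"
    diff "$tmp/left" "$tmp/right"
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Demo with throwaway directories
demo=$(mktemp -d)
mkdir -p "$demo/dir1/sub" "$demo/dir2/sub"
echo one > "$demo/dir1/sub/shared.txt"
echo two > "$demo/dir2/sub/shared.txt"   # same name, different content
touch "$demo/dir1/only_in_dir1.txt"
compare_trees "$demo/dir1" "$demo/dir2"  # reports only ./only_in_dir1.txt
rm -rf "$demo"
```

Because shared.txt exists in both trees, it is not reported even though its contents differ.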

Slartibartfast

Posted 2010-07-22T02:20:37.243

Reputation: 6 899

This is no good for me, because if one directory contains a folder with a few thousand files in they are all listed individually, while diff -rq just shows the root directory exists in one, and carries on. – Chris Jefferson – 2016-09-21T14:46:16.400

As pointed out (years ago) by intuited, to answer the OP's question, the -type d should be removed so that files are considered in the comparison as well as directories. – user2746401 – 2018-05-24T15:40:17.523

I understand and respect that reading of the problem statement. That was not my reading at the time. Are you recommending I edit my answer to respond to the updated question? I'm okay doing that if you think it will be helpful to some people, and I'm okay leaving the solution and comment set the way they are now, which seems to be reasonably effective. – Slartibartfast – 2018-05-25T23:12:30.767

34

vimdiff <(cd dir1; find . | sort) <(cd dir2; find . | sort)

will give you a nice side-by-side display of the two directory hierarchies with any common sections folded.

garyjohn

Posted 2010-07-22T02:20:37.243

Reputation: 29 085

This solution fails randomly. When vim reads (or re-reads) the temporary file descriptor, it is already gone. – Denilson Sá Maia – 2016-08-25T17:09:03.630

23

I usually use rsync for this task:

rsync -nav --delete DIR1/ DIR2

BE VERY CAREFUL to always use the -n, aka --dry-run, option, or it will synchronize (change the contents of) the directories.

This will compare files based on file modification times and sizes... I think that's what you really want, or at least you don't mind if it does that? I got the sense that you just want it to happen faster, not that you need it to ignore the difference between file contents. If you do want it to not list differing files with identical names, I think the addition of the --ignore-existing option will do that.

Also be aware that not putting a / at the end of DIR1 will cause it to compare the directory DIR1 with the contents of DIR2.

The output ends up being a bit verbose, but it will show you which files/directories differ. Files/directories present in DIR2 and not in DIR1 will be prefaced with the word deleting.

For some situations, @slartibartfast's answer may be more appropriate, though you'll need to remove the -type d option to enable the listing of non-directory files. rsync will be faster if you've got a significant number of files/directories to compare.
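A hedged sketch of the structure-only variant mentioned above, combining -n with --ignore-existing and adding -i (--itemize-changes) so that each difference gets its own line. The function name rsync_structure_diff is hypothetical. Directories whose own timestamps or permissions differ may still show up in the output, since --ignore-existing applies to files only.

```shell
#!/bin/sh
# Hypothetical helper: list what rsync WOULD change, without changing anything.
#   -n                 dry run (never drop this flag)
#   -a                 archive mode (recursive, keeps symlinks as symlinks)
#   -i                 itemize each difference on its own line
#   --delete           report entries that exist only in the second tree
#   --ignore-existing  skip files present in both trees
rsync_structure_diff() {
    command -v rsync >/dev/null 2>&1 || { echo "rsync not found" >&2; return 127; }
    # Trailing slashes: compare the contents of the two directories
    rsync -n -a -i --delete --ignore-existing "$1/" "$2/"
}

# Demo with throwaway directories
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2"
echo one > "$demo/dir1/shared.txt"
echo longer-content > "$demo/dir2/shared.txt"   # same name, different size
touch "$demo/dir1/only_in_dir1.txt" "$demo/dir2/only_in_dir2.txt"
rsync_structure_diff "$demo/dir1" "$demo/dir2"
rm -rf "$demo"
```

Entries present only in the second directory appear on lines containing the word deleting; entries present only in the first appear as files that would be transferred.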

intuited

Posted 2010-07-22T02:20:37.243

Reputation: 2 861

Excellent answer. In rsync's output it's hard to notice the deleting... text, but it's probably one of the better ways to compare files while still maintaining speed. Other answers here are faster when diffing files isn't required, as in OP's example, but I really like this one. – Joel Mellon – 2014-12-18T20:11:03.703

This is what I was after. I had some files with different sizes in a massive pair of directory trees, and I wanted to know which ones. This achieved that aim in mere seconds. – suprjami – 2015-11-30T12:13:23.590

Maybe it is a good idea to run it with a user that has a read only access. Like sudo -u nobody rsync -nav --delete d1 d2 provided that the flags for 'others' allow reading. – user1182474 – 2016-01-22T15:36:09.950

When running this solution I got "building file list...done\n sent X bytes received Y bytes Z bytes/sec total size is A speedup is B" (where I substituted XYZAB for numbers). Does that mean that everything was identical? Since it didn't mention anything more specific? Thanks in advance – Scott H – 2018-01-04T14:17:50.490

To answer my own question, I experimented adding different files to each, and it appears that no specific files/dirs mentioned in the output means they are all the same. – Scott H – 2018-01-04T20:34:44.840

18

Similar to the ls answer, but if you install tree you can do:

tree dir1 > out1
tree dir2 > out2
diff out1 out2

digit

Posted 2010-07-22T02:20:37.243

Reputation: 181

Or to avoid the tmpfiles, diff <( tree dir1 ) <( tree dir2 ) – Joel Mellon – 2014-12-18T19:46:39.550

I recommend running tree with the -i flag, which doesn't print the tree lines (tree -i dir1, etc). If the directory structure is different in one place, the other files that do match may have more or fewer | symbols in the tree output, and diff will catch those lines even if the file paths are identical. – askewchan – 2015-12-10T17:31:12.640

diff <( tree -i dir1 ) <( tree -i dir2 ) is by far the best answer. I'm tempted to downvote all answers that suggest diff or rsync as the question explicitly says NOT to read the file contents. NOTE: The suggestion of using two pipes requires careful use of spaces between brackets, follow the example exactly. E.g. to compare two 20G volumes after a backup the tree answer took about 5 seconds. The others took 20+ minutes. – Jason Morgan – 2017-01-13T12:01:04.547
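A small sketch building on the -i suggestion, with two assumptions of my own. First, tree is run from inside each directory, because tree dir1 prints the root name on its first line and that line would differ whenever the two directories have different names. Second, -f is added so each entry carries its path, otherwise identically named files in different subdirectories look the same once the indentation lines are gone. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: flat listing of one tree.
#   -i          no indentation lines
#   -f          print the path prefix for every entry
#   --noreport  omit the trailing "N directories, M files" summary
tree_listing() {
    (cd "$1" && tree -if --noreport)
}

# Hypothetical helper: diff the two flat listings via temporary files.
tree_diff() {
    command -v tree >/dev/null 2>&1 || { echo "tree not found" >&2; return 127; }
    tmp=$(mktemp -d) || return 2
    tree_listing "$1" > "$tmp/left"
    tree_listing "$2" > "$tmp/right"
    diff "$tmp/left" "$tmp/right"
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Demo with throwaway directories
demo=$(mktemp -d)
mkdir -p "$demo/dir1/sub" "$demo/dir2/sub"
touch "$demo/dir1/sub/shared.txt" "$demo/dir2/sub/shared.txt"
touch "$demo/dir1/only_in_dir1.txt"
tree_diff "$demo/dir1" "$demo/dir2"
rm -rf "$demo"
```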

3

I was just looking for a solution to this problem. The solution that I liked the most was:

comm <(ls DIR1) <(ls DIR2)

It gives you 3 columns: 1 - files only in DIR1, 2 - files only in DIR2, 3 - files only in DIR3. For more details look at this blog post.

kyrisu

Posted 2010-07-22T02:20:37.243

Reputation: 1 405

@Michael: comm -3 (see man comm). – Zaz – 2014-07-20T11:31:55.583

Where is DIR3 specified? All I see is DIR1 and DIR2. – Michael Dorst – 2013-08-20T23:59:10.747

I tried it, and (from what I can tell) the output was: all the files only in DIR1 in column 1, all the files only in DIR2 in column 2, and all the files shared by both in column 3. That's sort of useful, but do you know how one might strip out column 3 and leave only the differences? I have a *lot* of files to sort through, and most of it is identical. I don't need to see what's the same. – Michael Dorst – 2013-08-21T00:14:17.050

Also, I found that comm <(ls DIR1) <(ls DIR2) did not work recursively. For that I used comm <(ls -R1 DIR1) <(ls -R1 DIR2). ls -R crawls through directories recursively, and ls -1 (note that that is a one, not an L) makes ls print only one filename per line. – Michael Dorst – 2013-08-21T00:22:55.907
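Pulling the comments above together: column 3 of comm holds the entries present in both inputs (there is no DIR3), and comm -3 suppresses that column. The sketch below is one way to get a recursive, differences-only listing. It swaps ls -R1 for find so that every line is a full relative path, and sorts in the C locale because comm expects both inputs sorted the same way. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: recursive, sorted, relative listing of one tree.
sorted_paths() {
    (cd "$1" && find . | LC_ALL=C sort)
}

# Hypothetical helper: print only the paths that exist in exactly one tree.
only_in_one() {
    tmp=$(mktemp -d) || return 2
    sorted_paths "$1" > "$tmp/left"
    sorted_paths "$2" > "$tmp/right"
    # comm -3 drops column 3 (lines common to both inputs).
    # Column 1 (only in $1) is flush left; column 2 (only in $2) is tab-indented.
    LC_ALL=C comm -3 "$tmp/left" "$tmp/right"
    rm -rf "$tmp"
}

# Demo with throwaway directories
demo=$(mktemp -d)
mkdir -p "$demo/dir1/sub" "$demo/dir2/sub"
touch "$demo/dir1/sub/shared.txt" "$demo/dir2/sub/shared.txt"
touch "$demo/dir1/only_in_dir1.txt" "$demo/dir2/sub/only_in_dir2.txt"
only_in_one "$demo/dir1" "$demo/dir2"
rm -rf "$demo"
```

Adding -1 or -2 to comm would additionally hide the entries unique to the first or second tree.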

2

ls > dir1.txt

ls > dir2.txt

Then just diff the two lists.

MDMarra

Posted 2010-07-22T02:20:37.243

Reputation: 19 580

It seems like the OP wants a hierarchy of paths. This will diff all files in the current directory. It's debatable, but possible, that he just wants directories; he might want filenames rather than the contents of files. – intuited – 2010-07-22T06:24:21.853

@intuited - you're right. I misread it. – MDMarra – 2010-07-22T13:16:57.167

2

This is the optimal solution:

diff --brief -r dir1 dir2

The --brief switch reports only whether the files differ, not the details of the difference.

jkshah

Posted 2010-07-22T02:20:37.243

Reputation: 175

OP doesn't want the file contents compared: "But it's really slow because it's comparing files too." – Joel Mellon – 2014-12-18T20:06:37.940

The OP already has -q in the question, which is an alias for --brief. This answer doesn't provide any new information. – Michael Dorst – 2013-08-20T23:54:57.343

1

This worked for my specific need to find missing files in trees expected to match.

diff <(cd dir1; find * | sort) <(cd dir2; find * | sort)

amhest

Posted 2010-07-22T02:20:37.243

Reputation: 11

1

Use "diff -qr" to list the differences, then filter out the content-comparison lines with grep so that only the names present in just one of the directories remain. Note that diff still reads the file contents, so this is no faster.

diff -qr dir1 dir2 | grep -v "Files.*differ" 

Anonymous

Posted 2010-07-22T02:20:37.243

Reputation: 11

-3

I think only rsync is useful here. Why?

diff is useful only for trees that contain just regular files and directories. It does not give adequate exit codes when symlinks are involved: in that situation diff can return exit code 2 even if src and dst are identical (sizes, names, timestamps, symlink targets, etc.).

With dir (or ls), the filesystem does not guarantee file ordering, even if the directory contents on src and dst are identical, so you may need to sort the output. Even then, plain ls displays only node names.

A script combining diff, cmp, and test -X for node types might be useful, but remember the overhead of many test/cmp runs: the script will be very slow.

As usual, if you just want to know whether the directories are identical or not, use rsync with the -n (dry-run) option. If you want to find out what is different, use the diff command.

Znik

Posted 2010-07-22T02:20:37.243

Reputation: 259

I would like to know the reason for the downvotes. – Znik – 2016-03-10T10:05:34.027