
Background

We are moving from managing hosts by hand to configuration management.

20 files

I want to compare the same config file, for example /etc/crontab, across 20 hosts.

Use case

I guess about 15 of the 20 files are identical. I want to find the five files that were modified by hand with "vi".

I want an overview only, no automated action such as patching.

How can I compare them?

I tried my favorite diff tool (meld), but it does not accept more than three files :-(

guettli
  • Do you want to compare them to a master file or each file with every other file? In the first case, just do the diff 20 times. In the latter case: Why? – Sven May 24 '16 at 13:54
  • @Sven yes, I could define one of them as "the master" and run diff 20 times. But I want to get an overview of the current state of the "flea circus". That's why I would like to have one output with lots of columns. – guettli May 24 '16 at 15:03
  • Are you aware that this means 190 comparisons? – Sven May 24 '16 at 15:11
  • @Sven yes, this means 190 comparisons. Why not? Modern CPUs do this in milliseconds. The number of comparisons could be reduced. In my case many files are identical. Algorithm: Compute checksum, see if there are duplicates. The content with the most duplicates is the master. Compare all other contents to the content of the master. – guettli May 25 '16 at 06:54
  • This is not a problem of computing power but of usefulness of the result. I wouldn't know what to do with that output... Anyway, from your edit, it seems your solution is simple: Checksum all files, find the most-used checksum and then diff the remaining files pairwise against one of the "master files", there is no need here for more than a 2-file diff. – Sven May 25 '16 at 13:40
  • @Sven I need 20 "columns" beneath each other. With "column" being the result of the diffs, yes this could be done somehow... but a GUI like meld would be easier to use. – guettli May 25 '16 at 13:54
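The checksum-then-diff procedure described in the comments above (checksum every file, treat the most common content as the master, diff the rest against it) can be sketched in bash. This is a minimal sketch, not anyone's tested tool: `diff_against_master` is a hypothetical helper name, it assumes bash 4+ (associative arrays) and GNU coreutils `sha1sum`, and when two versions are equally common the master is picked arbitrarily.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the most common version of a file as "master"
# and print one unified diff per distinct non-master version.
# Assumes bash 4+ (declare -A) and GNU coreutils sha1sum.
diff_against_master() {
    declare -A first_file   # checksum -> first file seen with that content
    declare -A count        # checksum -> number of files with that content
    local f sum master_sum="" best=0
    for f in "$@"; do
        sum=$(sha1sum < "$f" | cut -d' ' -f1)
        count[$sum]=$(( ${count[$sum]:-0} + 1 ))
        [[ -n ${first_file[$sum]:-} ]] || first_file[$sum]=$f
    done
    # The checksum shared by the most files identifies the master content.
    for sum in "${!count[@]}"; do
        if (( ${count[$sum]} > best )); then
            best=${count[$sum]}
            master_sum=$sum
        fi
    done
    local master=${first_file[$master_sum]}
    echo "master: $master ($best of $# files identical)"
    # Diff only one representative file per distinct non-master version.
    for sum in "${!first_file[@]}"; do
        [[ $sum == "$master_sum" ]] && continue
        echo "=== ${first_file[$sum]} (${count[$sum]} file(s) with this content)"
        # Ignore diff's non-zero status (1 just means the files differ).
        diff -u "$master" "${first_file[$sum]}" || true
    done
    return 0
}
```

For the crontab case this would be called as `diff_against_master tmp/crontab-*`. Diffing one representative per version keeps the output to "number of distinct versions minus one" diffs instead of 190 pairwise comparisons.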

3 Answers


I'm trying to understand what diffing 20 files against each other would accomplish, but maybe I have another approach.

I assume that you want to know which cron jobs are defined across all of your systems. Instead of diffing the files, I propose concatenating them, sorting the output and then using uniq to collapse duplicate lines (uniq -c also counts them):

File1:

10 10 * * * /myjob.sh
* * * * * /everyminute.sh

File2:

20 20 * * * /evening-job.sh
* * * * * /everyminute.sh

All jobs over all files:

cat File1 File2  | sort | uniq -c

  1     10 10 * * * /myjob.sh
  1     20 20 * * * /evening-job.sh
  2     * * * * * /everyminute.sh

The first column shows how many times this job was defined.
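A possible refinement of the same pipeline, sketched under assumptions: if each file is de-duplicated first, a line's count equals the number of files containing it, so filtering out counts equal to the number of files leaves only jobs that are not defined on every host. `rare_lines` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (with counts) only the lines that are NOT
# present in every given file. Each file is de-duplicated with sort -u
# first, so a count equal to the number of files means "present everywhere".
rare_lines() {
    local n=$# f
    for f in "$@"; do
        sort -u "$f"
    done | sort | uniq -c | awk -v n="$n" '$1 < n'
}
```

Run as `rare_lines File1 File2` on the example above, the shared `/everyminute.sh` job disappears and only the two host-specific jobs remain.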

Sven
  • I updated the question and added a "use case" section. Maybe you can understand my intention now. If not, please ask :-) – guettli May 25 '16 at 06:56

Sven's approach is probably better for wildly different files. But if really only 5 files differ, I'd do something else: look at whole files instead of a coded summary with counts.

Checksum them and show the names and counts. You might then have, e.g., 5 distinct versions of the file, and you can more easily diff each against the first, one at a time, 4 times.

# uniq -w40: compare only the 40-character SHA-1 (GNU uniq); the file names always differ
sha1sum files/* | sort | uniq -c -w40
vimdiff files/file1 files/file2
# FYI short syntax of above (bash):
vimdiff files/file{1,2}
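If you want every file name for each version rather than just one representative, a small awk grouping can do it. This is a sketch assuming GNU coreutils `sha1sum` and paths without whitespace; `group_by_checksum` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one output line per distinct content:
#   <count> <sha1> <file> <file> ...
# sorted so the most common version comes first.
# Assumes paths contain no whitespace (awk splits sha1sum output on blanks).
group_by_checksum() {
    sha1sum -- "$@" | awk '
        { files[$1] = (($1 in files) ? files[$1] " " $2 : $2); n[$1]++ }
        END { for (h in n) print n[h], h, files[h] }
    ' | sort -rn
}
```

The first line is then the likely "master" version, and the remaining lines list exactly which hosts carry each hand-edited variant.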

If there are too many distinct versions, first strip irrelevant content such as comment lines and blank lines.

mkdir -p /tmp/trimmed
cd /tmp/trimmed
for f in /path/to/files/*; do
    n=$(basename "$f")
    # drop blank and comment lines; [[:space:]] matches tabs ("\t" inside [ ] does not)
    grep -Ev '^[[:space:]]*(#|$)' "$f" > "$n".trimmed
done
sha1sum *.trimmed | sort | uniq -c -w40
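A variant that avoids writing trimmed copies to disk, sketched under assumptions: it checksums each file's filtered content on the fly and prints `hash  path` in the same layout as `sha1sum`, so the original paths stay visible. `normalized_sha1` is a hypothetical helper name; GNU coreutils `sha1sum` is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<sha1>  <path>" for each file, where the hash
# covers only the lines that are neither blank nor comments.
normalized_sha1() {
    local f
    for f in "$@"; do
        # [[:space:]] matches both spaces and tabs.
        printf '%s  %s\n' \
            "$(grep -Ev '^[[:space:]]*(#|$)' "$f" | sha1sum | cut -d' ' -f1)" \
            "$f"
    done
}
```

`normalized_sha1 /path/to/files/* | sort | uniq -c -w40` then gives a per-version count whose lines name an original file rather than a `.trimmed` copy.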
Peter

Answering my own question:

Fetch the configs with GNU parallel (the host names go after :::):

parallel -q -j0 scp {}:/etc/crontab tmp/crontab-{}.conf ::: 

Show overview:

md5sum tmp/crontab-* | sort
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-foo@foo.conf
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-blu@blu.conf
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-x23@x23.conf
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-fmr@fmr.conf
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-bmw@bmw.conf
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-sun@sun.conf
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-nvc@nvc.conf
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-the@the.conf
581cc593069e5a59cde886d7491e9cf4  tmp/crontab-vum@vum.conf
94d1fdc2cc561aafec162209d2360f78  tmp/crontab-bar@bar.conf

The config on host "bar" is different. Check it:

vimdiff tmp/crontab-nvc@nvc.conf tmp/crontab-bar@bar.conf
guettli