How can I get diff to show only added and deleted lines? If diff can't do it, what tool can?
-
You need to better define what you mean by added and deleted. Specifically, can a line change? If so, how do you want a changed line to be handled? If you are doing strictly line oriented checking, a line changing is identical to the old line being removed and the new line being added. For example, how should it handle a line that's split in two? As two 1 line changed? 2 lines changed? 1 line removed and 2 lines added? Unless you can guarantee that lines will never change, just be added and deleted, I think this is doomed to fail without better definitions. – Christopher Cashell Apr 13 '12 at 16:11
-
I find the question quite unclear. But at least one interpretation of the question could be answered with `diff A B | grep '^[<>]'` – kasperd Sep 10 '14 at 14:21
-
You may be looking for `comm`. – Jenny D Sep 11 '14 at 06:49
-
@ChristopherCashell, He means ignore sort order; a typically common problem. Usually this is done by first sorting the segments (lines) on each side before doing a typical diff. – Pacerier Mar 10 '16 at 04:31
-
@Pacerier, Are you sure about that? Or are you guessing? Nothing about sorting or search order is mentioned or hinted at in the question. As it stands, the question isn't clear and could be interpreted many different ways. Without knowing *for sure* what he is asking, we're making assumptions and offering solutions that may or may not solve the actual problem. Additionally, the original poster's comment on one of the answers suggests this is *not* related to sorting. It does have to do with the meaning of "added and deleted" vs. "changed". – Christopher Cashell Mar 10 '16 at 19:49
10 Answers
Try comm
Another way to look at it:
Show lines that only exist in file a: (i.e. what was deleted from a)
comm -23 a b
Show lines that only exist in file b: (i.e. what was added to b)
comm -13 a b
Show lines that only exist in one file or the other: (but not both)
comm -3 a b | sed 's/^\t//'
(Warning: If file a has lines that start with TAB, the first TAB will be removed from the output.)
Sorted files only
NOTE: Both files need to be sorted for comm to work properly. If they aren't already sorted, you should sort them:
sort <a >a.sorted
sort <b >b.sorted
comm -12 a.sorted b.sorted
If the files are extremely long, this may be quite a burden as it requires an extra copy and therefore twice as much disk space.
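To make the sorted-input requirement concrete, here is a minimal sketch that sorts two unsorted files and then lists the deleted and added lines. The sample data and the temporary directory are made up for illustration; LC_ALL=C is an assumption that keeps sort and comm on the same collation order.

```shell
# Hypothetical sample data; file names and contents are made up.
tmp=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$tmp/a"
printf 'fig\nkiwi\napple\n' > "$tmp/b"

# comm needs sorted input; LC_ALL=C keeps sort and comm on the same collation.
LC_ALL=C sort "$tmp/a" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b" > "$tmp/b.sorted"

# -23 hides columns 2 and 3 (lines only in a remain);
# -13 hides columns 1 and 3 (lines only in b remain).
deleted=$(LC_ALL=C comm -23 "$tmp/a.sorted" "$tmp/b.sorted")
added=$(LC_ALL=C comm -13 "$tmp/a.sorted" "$tmp/b.sorted")

printf 'deleted: %s\n' "$deleted"
printf 'added: %s\n' "$added"
rm -rf "$tmp"
```

Because the files are sorted first, the result says nothing about where in the original files a line was added or removed, only that it exists on one side.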
-
Just wanted to add that both files need to be sorted (case-sensitively) for this solution to produce correct results – marmor Apr 28 '14 at 10:29
-
On modern enough shells, you can sort in-line with something like `comm -12 <(sort a) <(sort b)` – Joshua Huber Feb 23 '17 at 21:53
-
Wow, a new linux command, thanks. That doesn't happen much anymore. – Matt Alexander Jul 29 '20 at 23:34
-
comm first appeared in Version 4 AT&T UNIX, which was released in 1973. Another notable feature of that version of Unix was that it was written in C instead of assembly language, which made it portable and greatly expanded the potential for Unix to spread. – TomOnTime Jul 31 '20 at 17:23
To show additions and deletions without context, line numbers, +, -, <, > ! etc, you can use diff like this:
diff --changed-group-format='%<%>' --unchanged-group-format='' a.txt b.txt
For example, given two files:
a.txt
Common
Common
A-ONLY
Common
b.txt
Common
B-ONLY
Common
Common
The following command will show lines either removed from a or added to b:
diff --changed-group-format='%<%>' --unchanged-group-format='' a.txt b.txt
output:
B-ONLY
A-ONLY
This slightly different command will show lines removed from a.txt:
diff --changed-group-format='%<' --unchanged-group-format='' a.txt b.txt
output:
A-ONLY
Finally, this command will show lines added in b.txt:
diff --changed-group-format='%>' --unchanged-group-format='' a.txt b.txt
output:
B-ONLY
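For scripting, the two one-sided commands can be wrapped in small functions. This is only a sketch: lines_removed and lines_added are hypothetical helper names, and it assumes GNU diff, since the group-format options are GNU extensions. GNU diff exits with status 1 when the files differ, so the helpers treat only higher statuses as failure.

```shell
# Sample files from the answer above, written to a temporary directory.
tmp=$(mktemp -d)
printf 'Common\nCommon\nA-ONLY\nCommon\n' > "$tmp/a.txt"
printf 'Common\nB-ONLY\nCommon\nCommon\n' > "$tmp/b.txt"

# Hypothetical helpers. Exit status 1 from diff only means "files differ".
lines_removed() {
    diff --changed-group-format='%<' --unchanged-group-format='' "$1" "$2" || [ $? -eq 1 ]
}
lines_added() {
    diff --changed-group-format='%>' --unchanged-group-format='' "$1" "$2" || [ $? -eq 1 ]
}

removed=$(lines_removed "$tmp/a.txt" "$tmp/b.txt")
added=$(lines_added "$tmp/a.txt" "$tmp/b.txt")
printf 'removed: %s\n' "$removed"
printf 'added: %s\n' "$added"
rm -rf "$tmp"
```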
comm might do what you want. From its man page:
DESCRIPTION
Compare sorted files FILE1 and FILE2 line by line.
With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.
These columns are suppressable with -1, -2 and -3 respectively.
Example:
[root@dev ~]# cat a
common
shared
unique
[root@dev ~]# cat b
common
individual
shared
[root@dev ~]# comm -3 a b
	individual
unique
And if you just want the unique lines and don't care which file they're in:
[root@dev ~]# comm -3 a b | sed 's/^\t//'
individual
unique
As the man page says, the files must be sorted beforehand.
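If you would rather keep track of which side each line came from, the TAB that comm uses for the second column can be turned into a marker instead of being stripped. The following is a rough sketch using the same sample files; the +/- markers are my own convention, and lines in the first file that themselves start with a TAB would be misclassified.

```shell
tmp=$(mktemp -d)
printf 'common\nshared\nunique\n' > "$tmp/a"
printf 'common\nindividual\nshared\n' > "$tmp/b"

# With -3, lines unique to the first file have no prefix and lines unique
# to the second file are prefixed by one TAB. Turn that into -/+ markers.
marked=$(LC_ALL=C comm -3 "$tmp/a" "$tmp/b" |
    awk '{ if (sub(/^\t/, "")) print "+" $0; else print "-" $0 }')
printf '%s\n' "$marked"
rm -rf "$tmp"
```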
Visual comparison tools fit two files together so that a segment with the same number of lines but differing content will be considered a changed segment. Completely new lines between matching segments are considered added segments.
This is also how the sdiff command-line tool works, which shows a side-by-side comparison of two files in a terminal. Changed lines are separated by the | character. If a line exists only in file A, < is used as the separator character. If a line exists only in file B, > is used as the separator. If you don't have < and > characters in the files, you can use this to show only added and deleted lines:
sdiff A B | grep '[<>]'
-
`sdiff --suppress-common-lines` may be what people need; it includes `|` as well as `<`/`>` lines but it's exactly what I needed. – dimo414 Apr 29 '20 at 21:36
No, diff doesn't actually show the differences between two files in the way one might think. It produces a sequence of editing commands for a tool like patch to use to change one file into another.
The difficulty for any attempt at doing what you're looking for is how to define what constitutes a line that has changed versus a deleted one followed by an added one. Also what to do when lines are added, deleted and changed adjacent to each other.
-
My thoughts exactly. What percentage of characters in a line has to change in order to consider it a new one instead of a modification of the original? Technically even if you have one character in common, you could consider it a "change" instead of a deletion and insertion. – Kamil Kisiel Sep 25 '09 at 18:35
-
It's been a long time since I've looked at the `diff` sources, but I seem to remember all manner of gyrations to keep track of where two files match to stay in synch and I think there's a threshold for giving up based on how far apart the lines are. But I don't remember any intra-line matching except for (optionally) collapsed white space or ignoring case. Or (perhaps) words to that effect. In any case, it's all about `patch` and "vgrep" just comes along for the ride. Maybe. On Tuesday. – Dennis Williamson Sep 25 '09 at 18:54
-
This is the only useful answer about what diff does; most people use it without even testing it. I was going crazy with the results of diff, but now I understand what this tool is for. – Badr Elmers Sep 29 '21 at 04:10
Thanks senarvi, your solution (not voted for) actually gave me EXACTLY what I wanted after looking for ages on a ton of pages.
Using your answer, here is what I came up with to get the list of things changed/added/deleted. The example uses 2 versions of the /etc/passwd file and prints out the username for the relevant records.
#!/bin/bash
sdiff passwd1 passwd2 | grep '[|]' | awk -F: '{print "changed: " $1}'
sdiff passwd1 passwd2 | grep '[<]' | awk -F: '{print "deleted: " $1}'
sdiff passwd1 passwd2 | grep '[>]' | awk -F\> '{print $2}' | awk -F: '{print "added: " $1}'
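Since passwd records have a natural key (the username), another option is to compare the files by key rather than by line position. This is a different technique from the sdiff approach above, sketched here with awk and made-up sample records; a record counts as "changed" when the username exists in both files but the rest of the line differs.

```shell
tmp=$(mktemp -d)
# Hypothetical sample data standing in for two versions of /etc/passwd.
printf 'alice:x:1000:1000::/home/alice:/bin/sh\nbob:x:1001:1001::/home/bob:/bin/sh\n' > "$tmp/passwd1"
printf 'alice:x:1000:1000::/home/alice:/bin/bash\ncarol:x:1002:1002::/home/carol:/bin/sh\n' > "$tmp/passwd2"

report=$(awk -F: '
    NR == FNR     { old[$1] = $0; next }            # first file: remember each record by username
    !($1 in old)  { print "added: " $1; next }      # username only in the second file
    old[$1] != $0 { print "changed: " $1 }          # same username, different record
                  { seen[$1] = 1 }                  # username present in both files
    END { for (u in old) if (!(u in seen)) print "deleted: " u }
' "$tmp/passwd1" "$tmp/passwd2")
printf '%s\n' "$report"
rm -rf "$tmp"
```

The order of the "deleted" lines is not guaranteed when there are several, because awk does not define the iteration order of arrays.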
-
Note that the difference between "a line has been modified" and "a line has been removed and *another* line has been added below or above it" is semantic, so a generic text-based diff tool cannot separate those cases. As a result, your sdiff-based answer cannot reliably work for all cases. – Mikko Rantalainen Feb 21 '17 at 13:44
That's what diff does by default... Maybe you need to add some flags to ignore whitespace?
diff -b -B should ignore blank lines and different numbers of spaces.
-
No, it shows CHANGED lines as well (lines that have a character or four different). I want lines that only exist in left or right. – C. Ross Sep 25 '09 at 13:35
-
You could argue that the differing versions of a CHANGED line each exist only in left or right. – markdrayton Sep 25 '09 at 14:28
-
There's no way for diff (or any other tool) to reliably tell what's a change, and what's a deleted line being replaced by a new line. – Cian Sep 25 '09 at 15:07
-
Technically, diff treats a "changed" line as if the original line was deleted and a new line was added...so technically it is showing you only added and deleted lines. – KFro Sep 25 '09 at 17:33
I find this particular form often useful:
diff --changed-group-format='-%<+%>' --unchanged-group-format='' f g
Example:
printf 'a\nb\nc\nd\ne\nf\ng\n' > f
printf 'a\nB\nC\nd\nE\nF\ng\n' > g
diff --old-line-format=$'-%l\n' \
--new-line-format=$'+%l\n' \
--unchanged-line-format='' \
f g
Output:
-b
-c
+B
+C
-e
-f
+E
+F
So it shows old lines with - followed immediately by the corresponding new lines with +.
If we had a deletion of c from f:
printf 'a\nb\nd\ne\nf\ng\n' > f
printf 'a\nB\nC\nd\nE\nF\ng\n' > g
diff --old-line-format=$'-%l\n' \
--new-line-format=$'+%l\n' \
--unchanged-line-format='' \
f g
it looks like this:
-b
+B
+C
-e
-f
+E
+F
The format is documented at man diff:
--line-format=LFMT
format all input lines with LFMT
and:
LTYPE is 'old', 'new', or 'unchanged'.
GTYPE is LTYPE or 'changed'.
and:
LFMT (only) may contain:
%L contents of line
%l contents of line, excluding any trailing newline
[...]
Related question: https://stackoverflow.com/questions/15384818/how-to-get-the-difference-only-additions-between-two-files-in-linux
Tested in Ubuntu 18.04.
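For the "additions only" case from the related question, the same line-format options can be used with empty formats for old and unchanged lines. A minimal sketch, assuming GNU diff; the sample files are made up:

```shell
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/f"
printf 'a\nB\nc\nd\n' > "$tmp/g"

# Empty formats for old and unchanged lines leave only the lines new in g.
# diff exits with 1 when the files differ, which is not an error here.
additions=$(diff --old-line-format='' --unchanged-line-format='' \
    --new-line-format='%L' "$tmp/f" "$tmp/g" || [ $? -eq 1 ])
printf '%s\n' "$additions"
rm -rf "$tmp"
```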
We can combine diff and sed to achieve what you want. Let's take the same example from https://serverfault.com/a/68717/947477
[root@dev ~]# cat file1
common
shared
unique
[root@dev ~]# cat file2
common
individual
shared
To show added lines with + and deleted lines with - we can use
[root@dev ~]# diff -u file1 file2 |sed -n '/^\(+\|-\)/p'
--- file1 2022-03-25 18:30:57.507551352 +0530
+++ file2 2022-03-25 18:31:15.087860053 +0530
+individual
-unique
Here, -u is for printing unified content and sed will filter only outputs with - or + at the beginning.
A more straightforward answer is
diff file1 file2
1a2
> individual
3d3
< unique
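Note that filtering on a leading + or - also keeps the two ---/+++ file header lines of the unified format. One way around that, sketched below under the assumption that the headers are always the first two lines of the output, is to drop them by position before filtering:

```shell
tmp=$(mktemp -d)
printf 'common\nshared\nunique\n' > "$tmp/file1"
printf 'common\nindividual\nshared\n' > "$tmp/file2"

# The first two lines of unified output are the ---/+++ file headers;
# skip them by position, then keep only lines starting with + or -.
# diff exits with 1 when the files differ, which is not an error here.
changes=$({ diff -u "$tmp/file1" "$tmp/file2" || [ $? -eq 1 ]; } |
    sed -n '3,$p' | grep '^[+-]')
printf '%s\n' "$changes"
rm -rf "$tmp"
```

Skipping by position avoids misreading a removed line whose content starts with "--" as a header.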
File1:
text670_1
text067_1
text067_2
File2:
text04_1
text04_2
text05_1
text05_2
text067_1
text067_2
text1000_1
Use:
diff -y file1 file2
This shows the two files side by side in two columns.
Output:
text670_1
> text04_1
> text04_2
> text05_1
> text05_2
text067_1 text067_1
text067_2 text067_2
> text1000_1