How can I get diff to show only added and deleted lines? If diff can't do it, what tool can?

Question

You need to better define what you mean by added and deleted. Specifically, can a line change? If so, how do you want a changed line to be handled? If you are doing strictly line oriented checking, a line changing is identical to the old line being removed and the new line being added. For example, how should it handle a line that's split in two? As one line changed? Two lines changed? One line removed and two lines added? Unless you can guarantee that lines will never change, just be added and deleted, I think this is doomed to fail without better definitions. — Christopher Cashell, Apr 13 '12 at 16:11
I find the question quite unclear. But at least one interpretation of the question could be answered with `diff A B | grep '^[<>]'` — kasperd, Sep 10 '14 at 14:21
@ChristopherCashell, he means ignore sort order; a common problem. Usually this is done by first sorting the segments (lines) on each side before doing a typical diff. — Pacerier, Mar 10 '16 at 04:31
@Pacerier, Are you sure about that? Or are you guessing? Nothing about sorting or search order is mentioned or hinted at in the question. As it stands, the question isn't clear and could be interpreted many different ways. Without knowing *for sure* what he is asking, we're making assumptions and offering solutions that may or may not solve the actual problem. Additionally, the original poster's comment on one of the answers suggests this is *not* related to sorting. It does have to do with the meaning of "added and deleted" vs. "changed". — Christopher Cashell, Mar 10 '16 at 19:49

score 115 · Answer 1 · edited Jun 11 '20 at 10:02


Try comm

Another way to look at it:

Show lines that only exist in file a: (i.e. what was deleted from a)
```
  comm -23 a b
```
Show lines that only exist in file b: (i.e. what was added to b)
```
  comm -13 a b
```
Show lines that only exist in one file or the other: (but not both)
```
  comm -3 a b | sed 's/^\t//'
```

(Warning: If file a has lines that start with TAB, it (the first TAB) will be removed from the output.)

Sorted files only

NOTE: Both files need to be sorted for comm to work properly. If they aren't already sorted, you should sort them:

```
  sort <a >a.sorted
  sort <b >b.sorted
  comm -12 a.sorted b.sorted
```

If the files are extremely long, this may be quite a burden as it requires an extra copy and therefore twice as much disk space.
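A minimal sketch that avoids writing the sorted copies to disk, assuming bash (for process substitution, as in Joshua Huber's comment below). The function names `only_in_first` and `only_in_second` are made up for illustration. `LC_ALL=C` is set on both `sort` and `comm` so they agree on byte-wise collation. Note that `sort` itself may still spill to temporary files for very large inputs.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare two unsorted files with comm, sorting each
# input on the fly instead of writing a.sorted / b.sorted to disk.
# LC_ALL=C is applied to sort and to comm alike so both use byte-wise
# collation; a mismatch can make comm warn that input is not in sorted order.

# Lines present in $1 but not in $2 (what was deleted from $1).
only_in_first() {
  LC_ALL=C comm -23 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}

# Lines present in $2 but not in $1 (what was added to $2).
only_in_second() {
  LC_ALL=C comm -13 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}
```

For example, `only_in_second a b` prints the lines that appear only in b. Duplicates are counted. If a line occurs twice in a and once in b, one copy is reported as unique to a.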

answered Sep 25 '09 at 18:11

TomOnTime


Just wanted to add that both files need to be sorted (case-sensitively) for this solution to produce correct results – marmor Apr 28 '14 at 10:29

On modern enough shells, you can sort in-line with something like `comm -12 <(sort a) <(sort b)` – Joshua Huber Feb 23 '17 at 21:53

Wow, a new linux command, thanks. That doesn't happen much anymore. – Matt Alexander Jul 29 '20 at 23:34
comm first appeared in Version 4 AT&T UNIX, which was released in 1973. Another notable feature of that version of Unix was that it was written in C instead of assembly language, which made it portable and greatly expanded the potential for Unix to spread. – TomOnTime Jul 31 '20 at 17:23

score 22 · Answer 2 · answered Jan 05 '16 at 06:41

To show additions and deletions without context, line numbers, or markers such as +, -, <, > and !, you can use GNU diff like this:

diff --changed-group-format='%<%>' --unchanged-group-format='' a.txt b.txt

For example, given two files:

a.txt

Common
Common
A-ONLY
Common

b.txt

Common
B-ONLY
Common
Common

The following command will show lines either removed from a or added to b:

diff --changed-group-format='%<%>' --unchanged-group-format='' a.txt b.txt

output:

B-ONLY
A-ONLY

This slightly different command will show lines removed from a.txt:

diff --changed-group-format='%<' --unchanged-group-format='' a.txt b.txt

output:

A-ONLY

Finally, this command will show lines added in b.txt (present in b.txt but not in a.txt):

diff --changed-group-format='%>' --unchanged-group-format='' a.txt b.txt

output:

B-ONLY
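The three commands above can be wrapped in one function. This is a sketch that assumes GNU diffutils, because `--changed-group-format` and `--unchanged-group-format` are GNU extensions. The name `diff_only` and its `removed|added|both` argument are made up here. The function relies on GNU diff's documented default: when only `--changed-group-format` is given, groups that contain only old lines or only new lines are formatted with it as well. In those groups, `%<` or `%>` expands to nothing for the side that is empty.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around GNU diff's line-group formats.
#   %<  = the group's lines from FILE1,  %>  = the group's lines from FILE2.
# Unchanged groups are formatted as the empty string, so only differences remain.
diff_only() {
  local fmt
  case "$1" in
    removed) fmt='%<' ;;     # lines only in FILE1
    added)   fmt='%>' ;;     # lines only in FILE2
    both)    fmt='%<%>' ;;   # per hunk: removed lines, then added lines
    *) echo "usage: diff_only removed|added|both FILE1 FILE2" >&2; return 2 ;;
  esac
  diff --changed-group-format="$fmt" --unchanged-group-format='' "$2" "$3"
}
```

With the a.txt and b.txt from the example, `diff_only both a.txt b.txt` should print B-ONLY and then A-ONLY, matching the first output above. Like diff itself, the function exits with status 1 when the files differ. In a script, treat only status 2 as an error.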

score 16 · Answer 3 · edited Sep 14 '19 at 03:27

comm might do what you want. From its man page:

DESCRIPTION

Compare sorted files FILE1 and FILE2 line by line.

With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

These columns are suppressable with -1, -2 and -3 respectively.

Example:

[root@dev ~]# cat a
common
shared
unique

[root@dev ~]# cat b
common
individual
shared

[root@dev ~]# comm -3 a b
    individual
unique

And if you just want the unique lines and don't care which file they're in:

[root@dev ~]# comm -3 a b | sed 's/^\t//'
individual
unique

As the man page says, the files must be sorted beforehand.
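If you also want to know which file each unique line came from, you can run comm twice and tag the output instead of stripping comm's leading TAB. This is a sketch with a made-up function name (`tag_unique`) and `<`/`>` tags borrowed from diff's convention. It assumes both inputs are already sorted, as comm requires. Unlike `comm -3`, it lists all lines unique to the first file before the lines unique to the second.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tag lines unique to FILE1 with "< " and lines unique
# to FILE2 with "> ". Running comm once per column avoids having to strip
# the TAB that comm -3 puts in front of column-2 lines.
# Both files must already be sorted, exactly as for comm itself.
tag_unique() {
  comm -23 "$1" "$2" | sed 's/^/< /'   # column 1: only in FILE1
  comm -13 "$1" "$2" | sed 's/^/> /'   # column 2: only in FILE2
}
```

With the a and b from the example above, `tag_unique a b` prints `< unique` followed by `> individual`.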

Answer 4 · answered Oct 17 '13 at 14:50 · Seppo Enarvi

Visual comparison tools fit two files together so that a segment with the same number of lines but differing content will be considered a changed segment. Completely new lines between matching segments are considered added segments.

This is also how the sdiff command-line tool works; it shows a side-by-side comparison of two files in a terminal. Changed lines are separated by the | character. If a line exists only in file A, < is used as the separator character. If a line exists only in file B, > is used as the separator. If your files don't contain < and > characters, you can use this to show only added and deleted lines:

sdiff A B | grep '[<>]'

`sdiff --suppress-common-lines` may be what people need; it includes `|` as well as `<`/`>` lines but it's exactly what I needed. — dimo414, Apr 29 '20 at 21:36

score 3 · Answer 5 · answered Sep 25 '09 at 15:59


No, diff doesn't actually show the differences between two files in the way one might think. It produces a sequence of editing commands for a tool like patch to use to change one file into another.

The difficulty for any attempt at doing what you're looking for is defining what constitutes a line that has changed, as opposed to a deleted line followed by an added one. There is also the question of what to do when lines are added, deleted and changed next to each other.
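To see that diff's output really is a list of edit commands, you can apply it with patch. The sketch below assumes GNU diff and GNU patch are installed, and the function name `diff_patch_roundtrip` is made up. It also shows the point above in practice. A line that was edited is reported as a `c` (change) hunk containing both a `<` line and a `>` line, while a purely new line gets an `a` (add) hunk.

```bash
#!/usr/bin/env bash
# Hypothetical check: print "diff OLD NEW", apply it with patch to a scratch
# copy of OLD, and succeed only if the result is byte-identical to NEW.
diff_patch_roundtrip() {
  local tmp status
  tmp=$(mktemp -d) || return 2
  cp "$1" "$tmp/work"
  diff "$1" "$2" > "$tmp/edits"    # normal format: 2c2, 3a4, ... commands
  status=$?                        # 0 = identical, 1 = differ, 2 = trouble
  if [ "$status" -eq 1 ]; then
    cat "$tmp/edits"               # show the edit script
    patch --quiet "$tmp/work" "$tmp/edits" && cmp -s "$tmp/work" "$2"
    status=$?
  fi
  rm -rf "$tmp"
  return "$status"
}
```

For instance, if OLD contains a, b, c and NEW contains a, B, c, d (one per line), the printed script is `2c2`, `< b`, `---`, `> B`, followed by `3a4`, `> d`. Applying that script with patch turns OLD into NEW.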

Dennis Williamson

My thoughts exactly. What percentage of characters in a line has to change in order to consider it a new one instead of a modification of the original? Technically even if you have one character in common, you could consider it a "change" instead of a deletion and insertion. – Kamil Kisiel Sep 25 '09 at 18:35

It's been a long time since I've looked at the `diff` sources, but I seem to remember all manner of gyrations to keep track of where two files match to stay in synch and I think there's a threshold for giving up based on how far apart the lines are. But I don't remember any intra-line matching except for (optionally) collapsed white space or ignoring case. Or (perhaps) words to that effect. In any case, it's all about `patch` and "vgrep" just comes along for the ride. Maybe. On Tuesday. – Dennis Williamson Sep 25 '09 at 18:54
This is the only useful answer about what diff does; most people use it without even testing it. I was going crazy over diff's output, but now I understand what this tool is for. – Badr Elmers Sep 29 '21 at 04:10

score 2 · Answer 6 · answered Nov 18 '13 at 12:05

Thanks senarvi, your solution (not voted for) actually gave me EXACTLY what I wanted after looking for ages on a ton of pages.

Using your answer, here is what I came up with to get the list of things changed/added/deleted. The example uses 2 versions of the /etc/passwd file and prints out the username for the relevant records.

#!/bin/bash
sdiff passwd1 passwd2 | grep '[|]' | awk -F: '{print "changed: " $1}'
sdiff passwd1 passwd2 | grep '[<]' | awk -F: '{print "deleted: " $1}'
sdiff passwd1 passwd2 | grep '[>]' | awk -F'>[ \t]*' '{split($2, f, ":"); print "added: " f[1]}'

Note that because the difference between "a line has been modified" and "a line has been removed and *another* line has been added below or above it" is semantic, a generic text-based diff tool cannot separate those cases. As a result, your sdiff-based answer cannot reliably work for all cases. — Mikko Rantalainen, Feb 21 '17 at 13:44
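The ambiguity can be sidestepped when records have a natural key, as passwd entries do. If you compare by username rather than by whole line, "changed" simply means "same username, different record". This is a different technique from the sdiff pipeline above (a keyed comparison in awk), sketched here with a made-up function name. It assumes colon-separated records with the key in field 1 and at most one record per key in each file. Deleted users are printed in no particular order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report added, changed and deleted records between two
# passwd-style files, keyed on field 1 (the username).
passwd_delta() {
  awk -F: '
    FILENAME == ARGV[1] { old[$1] = $0; next }   # first file: remember records
    !($1 in old)        { print "added: " $1; next }
    old[$1] != $0       { print "changed: " $1 }
                        { delete old[$1] }        # key exists in both files
    END { for (user in old) print "deleted: " user }
  ' "$1" "$2"
}
```

If you need a stable order, pipe the output through sort, e.g. `passwd_delta passwd1 passwd2 | sort`.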

score 2 · Answer 7 · answered Sep 25 '09 at 13:26


That's what diff does by default... Maybe you need to add some flags to ignore whitespace?

diff -b -B

-B ignores changes that only insert or delete blank lines, and -b ignores changes in the amount of whitespace.
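If you only need to check whether two files differ in nothing but whitespace or blank lines, those two flags can be combined into a small test. This is a sketch that assumes GNU diff, and the function name is made up. It treats empty output from `diff -b -B` as "no meaningful difference" instead of relying on the exit status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when FILE1 and FILE2 are not byte-identical
# but diff -b -B (ignore whitespace amount, ignore all-blank hunks) prints
# nothing, i.e. every difference is whitespace or blank lines.
differs_only_in_whitespace() {
  [ -r "$1" ] && [ -r "$2" ] || return 2
  cmp -s "$1" "$2" && return 1         # identical: no difference at all
  [ -z "$(diff -b -B "$1" "$2")" ]
}
```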

Scott Lundberg


No, it shows CHANGED lines as well (lines that have a character or four different). I want lines that only exist in left or right. – C. Ross Sep 25 '09 at 13:35

You could argue that the differing versions of a CHANGED file each exist only in left or right. – markdrayton Sep 25 '09 at 14:28

There's no way for diff (or any other tool) to reliably tell what's a change, and what's a deleted line being replaced by a new line. – Cian Sep 25 '09 at 15:07

Technically, diff treats a "changed" line as if the original line was deleted and a new line was added...so technically it is showing you only added and deleted lines. – KFro Sep 25 '09 at 17:33

score 1 · Answer 8 · answered Jun 17 '19 at 13:48

I find this particular form often useful:

diff --old-line-format=$'-%l\n' --new-line-format=$'+%l\n' --unchanged-line-format='' f g

Example:

printf 'a\nb\nc\nd\ne\nf\ng\n' > f
printf 'a\nB\nC\nd\nE\nF\ng\n' > g
diff --old-line-format=$'-%l\n' \
     --new-line-format=$'+%l\n' \
     --unchanged-line-format='' \
     f g

Output:

-b
-c
+B
+C
-e
-f
+E
+F

So it shows old lines with - followed immediately by the corresponding new line with +.

If we delete c from f:

printf 'a\nb\nd\ne\nf\ng\n' > f
printf 'a\nB\nC\nd\nE\nF\ng\n' > g
diff --old-line-format=$'-%l\n' \
     --new-line-format=$'+%l\n' \
     --unchanged-line-format='' \
     f g

it looks like this:

-b
+B
+C
-e
-f
+E
+F

The format is documented at man diff:

       --line-format=LFMT
              format all input lines with LFMT

and:

       LTYPE is 'old', 'new', or 'unchanged'.
              GTYPE is LTYPE or 'changed'.

and:

              LFMT (only) may contain:

       %L     contents of line

       %l     contents of line, excluding any trailing newline

       [...]
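GNU diff's man page also lists `%[-][WIDTH][.[PREC]]{doxX}n` among the LFMT directives, which prints the input line number. This is a sketch that assumes GNU diff and bash (for the `$'...'` quoting used above), and the function name is made up. For old lines the number refers to the first file, and for new lines it refers to the second.

```bash
#!/usr/bin/env bash
# Hypothetical variant of the command above: prefix each old line with
# "-N: " and each new line with "+N: ", where N is the line's number in
# its own file (%dn = line number formatted like printf's %d).
numbered_changes() {
  diff --old-line-format=$'-%dn: %l\n' \
       --new-line-format=$'+%dn: %l\n' \
       --unchanged-line-format='' \
       "$1" "$2"
}
```

With the second f and g above, this should print `-2: b`, `+2: B`, `+3: C`, `-4: e`, `-5: f`, `+5: E`, `+6: F`, one per line.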

Related question: https://stackoverflow.com/questions/15384818/how-to-get-the-difference-only-additions-between-two-files-in-linux

Tested in Ubuntu 18.04.

score 0 · Answer 9 · answered Mar 25 '22 at 13:15

We can combine diff and sed to achieve what you want. Let's take the same example as https://serverfault.com/a/68717/947477:

[root@dev ~]# cat file1
common
shared
unique

[root@dev ~]# cat file2
common
individual
shared

To show added lines with + and deleted lines with - we can use

[root@dev ~]# diff -u file1 file2 | sed -n '/^\(+\|-\)/p'

--- file1   2022-03-25 18:30:57.507551352 +0530
+++ file2   2022-03-25 18:31:15.087860053 +0530
+individual
-unique

Here, -u produces unified output, and sed keeps only the lines that begin with - or +. Those include the ---/+++ file header lines.
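As the output above shows, this filter also keeps the `---`/`+++` file header lines. Here is a sketch that drops them. It assumes a single pair of files, so the header is exactly the first two lines of `diff -u` output. The function name is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: only the added (+) and deleted (-) lines of a unified
# diff. tail skips the two header lines, so a deleted line that itself starts
# with "--" is not mistaken for a header; grep then drops context lines and
# "@@" hunk headers.
changed_lines() {
  diff -u "$1" "$2" | tail -n +3 | grep '^[-+]'
}
```

With file1 and file2 from this answer, `changed_lines file1 file2` should print `+individual` and `-unique`.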

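A more straightforward alternative is plain diff, which marks deleted lines with < and added lines with >, under hunk headers such as 1a2:

diff file1 file2
1a2
> individual
3d3
< unique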

score -1 · Answer 10 · answered Oct 17 '16 at 18:42

File1:

text670_1
text067_1
text067_2

File2:

text04_1
text04_2
text05_1
text05_2
text067_1
text067_2
text1000_1

Use:

diff -y file1 file2

This shows the two files side by side, one column per file.

Output:

text670_1                         | text04_1
                                  > text04_2
                                  > text05_1
                                  > text05_2
text067_1                           text067_1
text067_2                           text067_2
                                  > text1000_1
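GNU diff can also drop the lines both files share from this side-by-side view. This is a sketch with a made-up function name. In its output, `|` marks a pair of lines that diff treated as changed, and `<`/`>` mark lines found only in the left or right file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: side-by-side diff (-y) without the common lines.
# --suppress-common-lines is a GNU diff option.
changes_side_by_side() {
  diff -y --suppress-common-lines "$1" "$2"
}
```

With file1 and file2 above, this should leave five lines: the `text670_1 | text04_1` pair and four lines marked `>`.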
