Notepad++ Compare two files and remove

13

3

Say I have two files, file1.txt and file2.txt.

Both files contain a list of shoe brand names (1000+ names), like this:

brand1 brand2 brand3 brand...

Now I want to compare file1 to file2, delete all the entries that occur in both, and be shown only what's in file1 that's not in file2, and vice versa.

In other words, the goal is to see what's missing from the opposite file, since these entries are going to be typed manually into a product back office for two different categories so that they match in the end.

Kristian

Posted 2013-03-07T11:21:54.977

Reputation: 131

Would a different tool be suitable? You could do this easily in a few lines of Python, for example (read each brand from each file into a set, then print the differences between the sets) – Baldrickk – 2020-02-25T15:58:25.127

IMO this would be far easier to accomplish in Excel if you can copy all your data into it or save the TXTs as CSVs. It can easily sort and remove duplicates, and I'm sure column comparison would not be hard to accomplish either. – Karan – 2013-03-07T20:08:10.390

The following link may be useful: http://superuser.com/a/290445 – akjain – 2013-12-12T10:00:32.090

Answers

8

Would the Notepad++ "Compare" plugin do the trick?

You can install it from the Notepad++ menu: Plugins => Plugin Manager => Compare 1.5.6

Here's the official description: "A very useful diff plugin to show the difference between 2 files (side by side)." Author: Ty Landercasper, now maintained and updated by Jean-Sebastien Leroy. Source: http://sourceforge.net/projects/npp-plugins/files/ComparePlugin/Compare_1_5_5_src.zip/download

Fabien

Posted 2013-03-07T11:21:54.977

Reputation: 327

Unfortunately, I don't think it does. The Compare plugin merely highlights the differences between two files, but offers no tools to make selections or edits based on its results. While certainly helpful, I'm afraid the task is still very tedious for over one thousand brand names. – Marcks Thomas – 2013-03-07T12:04:50.913

3

An old question, but...

  1. Compare the files in WinMerge
  2. Tools -> Generate Patch (save this)
  3. The patch contains the changes from both files, but also extra markup. In Notepad++, run the following two replacements:

        Search Mode:  Regular Expression
        Find What:    ^[0-9-].*$
        Replace With: <blank>
        Replace All
    

    Then:

        Search Mode:  Regular Expression
        Find What:    (<|>)
        Replace With: <blank>
        Replace All
    
  4. Use the TextFX plugin in Notepad++ to either run Tools -> case-insensitive sort (with the output UNIQUE option selected) or Edit -> Delete blank lines

Bit mungy, but I've yet to find a tool that will do this in one click.
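
For anyone who wants to check what the two replacements in step 3 actually do, here is a minimal Python sketch that applies the same regular expressions. The sample patch is hypothetical and assumes a "normal" diff layout (change commands such as 2d1, then lines prefixed with < or >); real WinMerge output may look different depending on the patch style chosen. The function name is made up for illustration.

```python
import re

def extract_changed_entries(patch_text):
    """Apply the two clean-up regexes, then drop blank lines and duplicates."""
    # First replacement: blank out every line that starts with a digit or a dash
    # (change commands such as "2d1" and "---" separators).
    text = re.sub(r"^[0-9-].*$", "", patch_text, flags=re.MULTILINE)
    # Second replacement: remove the "<" and ">" markers.
    text = re.sub(r"(<|>)", "", text)
    # Step 4 equivalent: trim whitespace, drop blank lines, keep unique entries,
    # and sort them ignoring case (duplicates are matched exactly here).
    entries = {line.strip() for line in text.splitlines() if line.strip()}
    return sorted(entries, key=str.lower)

# Hypothetical patch text, assumed for illustration only.
sample_patch = """2d1
< Adidas
4a4
> Puma
"""

print(extract_changed_entries(sample_patch))  # ['Adidas', 'Puma']
```

Two things worth noting under these assumptions: the first regex also removes any brand name that itself starts with a digit or a dash, and once the < and > markers are gone the result no longer shows which file each entry came from.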

James King

Posted 2013-03-07T11:21:54.977

Reputation: 131

1

To subtract two files in Notepad++ (file1 - file2), you can follow this procedure:

  1. Add ---------------------------- as a footer on file1 (add at least 10 dashes). This is the marker line that separates file1 content from file2.
  2. Then copy the contents of file2 to the end of file1 (after the marker)
  3. Control + H
  4. Search: (?m)^\b(.*)\R(?=[\s\S]+-{10,}$[\s\S]+^\1\R)
  5. Replace by: (leave empty)
  6. Select Regular expression radio button
  7. Replace All
  8. Finally remove footer and file2 content

If file1 or file2 could contain a line identical to the marker, choose a different marker and adapt the regular expression accordingly.

By the way, you could even record a macro to do all the steps (add the marker, switch to file2, copy its content to file1, apply the regex, and even clean up the data after the subtraction) with a single button press.
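
As a cross-check for the result of the procedure above, the same subtraction (file1 - file2) can be sketched in Python. This is not a port of the regex; it swaps in a plain set lookup, and it assumes one entry per line. The function names and file names are placeholders for illustration.

```python
from pathlib import Path

def read_lines(path):
    """Read a text file and return its lines without line endings."""
    return Path(path).read_text(encoding="utf-8").splitlines()

def subtract_lines(file1_lines, file2_lines):
    """Return the lines of file1 that do not occur in file2, in original order."""
    exclude = set(file2_lines)
    return [line for line in file1_lines if line not in exclude]

# In-memory example; with real files one would call, for instance,
# subtract_lines(read_lines("file1.txt"), read_lines("file2.txt")).
file1 = ["Adidas", "Nike", "Puma"]
file2 = ["Nike", "Reebok"]
print(subtract_lines(file1, file2))  # ['Adidas', 'Puma']
```

Swapping the two arguments gives the opposite direction (file2 - file1). The comparison is exact, so differences in case or trailing spaces count as different entries.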

Julio

Posted 2013-03-07T11:21:54.977

Reputation: 126

0

If Unix is available to you, you could try a combination of three simple commands: tr, sort, and comm.

First, convert each file from space-separated names on one line (horizontal) to one name per line (vertical):

tr '[:blank:]' '\n' < file1.txt > /tmp/file1.vertical
tr '[:blank:]' '\n' < file2.txt > /tmp/file2.vertical

Then sort the files:

sort /tmp/file1.vertical > /tmp/file1.sorted
sort /tmp/file2.vertical > /tmp/file2.sorted

Now you can see what's in file1 that's not in file2:

comm -23 /tmp/file1.sorted /tmp/file2.sorted

Or see what's in file2 that's not in file1:

comm -13 /tmp/file1.sorted /tmp/file2.sorted

If you want the output in the same horizontal format you started with, you can do this:

comm -23 /tmp/file1.sorted /tmp/file2.sorted | tr '\n' ' '
comm -13 /tmp/file1.sorted /tmp/file2.sorted | tr '\n' ' '

When you are done, you could delete the temporary files you created:

rm /tmp/file1.vertical /tmp/file2.vertical /tmp/file1.sorted /tmp/file2.sorted
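
If Unix is not available, roughly the same comparison can be sketched in Python without temporary files. This is an approximation of the tr/sort/comm pipeline above, not an exact equivalent: it assumes brand names are separated by whitespace and contain no spaces themselves, it drops duplicate names (comm does not), and its sort order may differ from what sort produces in your locale. The function name is made up for illustration.

```python
def compare_brand_lists(text1, text2):
    """Return (names only in text1, names only in text2), each sorted."""
    # split() with no arguments splits on any run of spaces, tabs, or newlines.
    brands1 = set(text1.split())
    brands2 = set(text2.split())
    only_in_1 = sorted(brands1 - brands2)
    only_in_2 = sorted(brands2 - brands1)
    return only_in_1, only_in_2

# In-memory example; for real files, pass the contents of file1.txt and file2.txt.
only1, only2 = compare_brand_lists("brand1 brand2 brand3", "brand2 brand3 brand4")
print(" ".join(only1))  # brand1
print(" ".join(only2))  # brand4
```

Joining with a space reproduces the horizontal format the files started with.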

eric

Posted 2013-03-07T11:21:54.977

Reputation: 771