Check if a large file includes another (smaller) file

2

I have a 3 MB and a 5 MB text file. I want to make sure that the larger file includes all the lines that are in the smaller file.

The output needs to show all the lines of the smaller file that are not included in the larger file. I tried to compare them with Notepad++, but it hangs. The Word 2007 compare feature is difficult to understand.

I tried Beyond Compare, WinMerge, fc, and many more. The lines are not in the same order in the two files, so a compare tool reports a line as different even though the same line exists at a different place in the big file. Suppose the small file looks like this:

abc def
ghi jkl
mno pqr
yza bcd

Suppose the big file looks like this:

efg hij
mno pqr
ghi jkl
abc def
stu vwx

I want this output:

yza bcd

Angelo

Posted 2013-01-25T17:59:13.593

Reputation: 23

Apparently those are text files. Try different text editors, like SciTE, or maybe an online utility. – Martín Canaval – 2013-01-25T18:47:15.493

PowerShell can probably do this, but you'll have to do some googling. – Sam Axe – 2013-01-26T00:42:49.023

@Dan-o: Yes, I think so too! I added batch, powershell, and vbscript to the question, but TFM removed them for some reason. I am asking here after searching on Google, where I found only Linux answers (which I also mentioned, but TFM removed that too). – Angelo – 2013-01-26T00:57:26.717

I removed the [powershell] and the [vbscript] tags, because you hadn't mentioned them in your original question, and you should ask for a solution, and not suggest what you specifically want to try. By limiting your scope to certain technologies, you also limit the suggestions. – TFM – 2013-01-26T19:21:10.813

Answers

3

Using a comparison tool like Beyond Compare, KDiff3, or Perforce should be sufficient.

UPDATE:

I was feeling generous this morning so I threw this together for you. Should do what you want.

Some notes:

1.) This code will handle duplicates. For instance, if a line with the same text appears twice in the small file, it is expected to appear twice in the large file.

2.) This code ignores line ordering as per your use case.

3.) There is a small bug regarding blank lines at the end of a file that I didn't want to mess with. This code treats a blank line as a line just like any other, as long as it isn't at the end of the file, in which case one blank line is allowed (and ignored). For example, if the small file has 3 blank lines at the end and no other blank lines, then the large file is expected to have at least 2 blank lines in the midst of the other lines, or 3 blank lines at the end of the file.

To run:

1.) Make sure you have a JDK installed

2.) Make sure java is in your path. If you are on a Windows system, go to Control Panel > System > Advanced System Settings > Environment Variables and select Path under the System Variables section. Append the location of your JDK bin folder to the Path variable, making sure to separate it from the previous entry with a semicolon, something like this:

C:\Program Files (x86)\Java\jdk1.6.0_38\bin;

3.) Copy the code below into a file named FileLineComparator.java

4.) Open a command prompt and navigate to the directory with the file you just created

5.) Type javac FileLineComparator.java

6.) Type java -cp . FileLineComparator

7.) Enjoy!

import java.io.*;
import java.util.ArrayList;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;

public class FileLineComparator extends javax.swing.JFrame {

    public FileLineComparator() {
        initComponents();
    }

    @SuppressWarnings( "unchecked" )
    // <editor-fold defaultstate="collapsed" desc="Generated Code">
    private void initComponents() {

        fileChooser = new javax.swing.JFileChooser();
        smallFileTextField = new javax.swing.JTextField();
        smallFileLabel = new javax.swing.JLabel();
        largeFileLabel = new javax.swing.JLabel();
        largeFileTextField = new javax.swing.JTextField();
        outputFileLabel = new javax.swing.JLabel();
        outputFileTextField = new javax.swing.JTextField();
        goButton = new javax.swing.JButton();
        smallFileButton = new javax.swing.JButton();
        largeFileButton = new javax.swing.JButton();
        outputFileButton = new javax.swing.JButton();

        setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);

        smallFileLabel.setText("Small text file:");

        largeFileLabel.setText("Large text file:");

        outputFileLabel.setText("Output file:");

        goButton.setText("Go!");
        goButton.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mouseClicked(java.awt.event.MouseEvent evt) {
                goButtonMouseClicked(evt);
            }
        });

        smallFileButton.setText("Browse");
        smallFileButton.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mouseClicked(java.awt.event.MouseEvent evt) {
                smallFileButtonMouseClicked(evt);
            }
        });

        largeFileButton.setText("Browse");
        largeFileButton.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mouseClicked(java.awt.event.MouseEvent evt) {
                largeFileButtonMouseClicked(evt);
            }
        });

        outputFileButton.setText("Browse");
        outputFileButton.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mouseClicked(java.awt.event.MouseEvent evt) {
                outputFileButtonMouseClicked(evt);
            }
        });

        javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());
        getContentPane().setLayout(layout);
        layout.setHorizontalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(layout.createSequentialGroup()
                .addContainerGap()
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING, false)
                    .addGroup(layout.createSequentialGroup()
                        .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                            .addGroup(layout.createSequentialGroup()
                                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                                    .addComponent(largeFileLabel)
                                    .addComponent(smallFileLabel))
                                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING, false)
                                    .addComponent(outputFileTextField, javax.swing.GroupLayout.Alignment.LEADING, javax.swing.GroupLayout.DEFAULT_SIZE, 194, Short.MAX_VALUE)
                                    .addComponent(largeFileTextField, javax.swing.GroupLayout.Alignment.LEADING)
                                    .addComponent(smallFileTextField)))
                            .addComponent(outputFileLabel))
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                            .addComponent(largeFileButton)
                            .addComponent(smallFileButton)
                            .addComponent(outputFileButton)))
                    .addComponent(goButton, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))
                .addContainerGap(16, Short.MAX_VALUE))
        );
        layout.setVerticalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(layout.createSequentialGroup()
                .addContainerGap()
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)
                    .addComponent(smallFileTextField, javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.PREFERRED_SIZE)
                    .addComponent(smallFileLabel)
                    .addComponent(smallFileButton))
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)
                    .addComponent(largeFileLabel)
                    .addComponent(largeFileTextField, javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.PREFERRED_SIZE)
                    .addComponent(largeFileButton))
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)
                    .addComponent(outputFileTextField, javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.PREFERRED_SIZE)
                    .addComponent(outputFileLabel)
                    .addComponent(outputFileButton))
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                .addComponent(goButton, javax.swing.GroupLayout.PREFERRED_SIZE, 62, javax.swing.GroupLayout.PREFERRED_SIZE)
                .addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))
        );

        pack();
    }// </editor-fold>

    private void smallFileButtonMouseClicked( java.awt.event.MouseEvent evt ) {
        setSelectedFile( FILE_TYPES.SMALL );
    }

    private void largeFileButtonMouseClicked( java.awt.event.MouseEvent evt ) {
        setSelectedFile( FILE_TYPES.LARGE );
    }

    private void outputFileButtonMouseClicked( java.awt.event.MouseEvent evt ) {
        setSelectedFile( FILE_TYPES.OUTPUT );
    }

    private void goButtonMouseClicked( java.awt.event.MouseEvent evt ) {
        errorStub = new StringBuilder();
        smallFile = new File( smallFileTextField.getText() );
        smallFileTextField.setText( smallFile.getAbsolutePath() );
        largeFile = new File( largeFileTextField.getText() );
        largeFileTextField.setText( largeFile.getAbsolutePath() );
        outputFile = new File( outputFileTextField.getText() );
        outputFileTextField.setText( outputFile.getAbsolutePath() );
        process();
    }

    private void setSelectedFile( FILE_TYPES fileType ) {
        int returnVal = fileChooser.showOpenDialog( null );
        if( returnVal == JFileChooser.APPROVE_OPTION ) {
            File file = fileChooser.getSelectedFile();
            switch( fileType ) {
                case SMALL:
                    smallFileTextField.setText( file.getPath() );
                    break;
                case LARGE:
                    largeFileTextField.setText( file.getPath() );
                    break;
                case OUTPUT:
                    outputFileTextField.setText( file.getPath() );
                    break;
            }
        }
    }

    private void process() {
        ArrayList<String> smallFileLines = readFileLines( smallFile );
        ArrayList<String> largeFileLines = readFileLines( largeFile );
        ArrayList<String> outputFileLines = new ArrayList<String>();

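        // For each line of the small file, look for an unused copy in the large file.
        for( String line : smallFileLines ) {
            if( !largeFileLines.contains( line ) ) {
                // No copy left in the large file: report the line as missing.
                outputFileLines.add( line );
            } else {
                // Consume one copy so that duplicates must be matched one-for-one.
                largeFileLines.remove( line );
            }
        }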

        if( errorStub.length() == 0 ) {
            writeOutput( outputFileLines );
        }

        if( errorStub.length() == 0 ) {
            JOptionPane.showMessageDialog( null, "Finished Successfully!" );
        } else {
            JOptionPane.showMessageDialog( null, errorStub.toString() );
        }
    }

    private ArrayList<String> readFileLines( File file ) {
        ArrayList<String> al = new ArrayList<String>();
        try {
            FileReader fr = new FileReader( file );
            BufferedReader bufRdr = new BufferedReader( fr );
            String line = null;
            while( ( line = bufRdr.readLine() ) != null ) {
                al.add( line );
            }
            bufRdr.close();
        } catch( IOException ioex ) {
            errorStub.append( String.format( "Error reading file %s\r\n", file.getAbsolutePath() ) );
            System.err.println( ioex.getMessage() );
        }
        return al;
    }

    private void writeOutput( ArrayList<String> outputFileLines ) {
        try {
            FileWriter fw = new FileWriter( outputFile );
            BufferedWriter bw = new BufferedWriter( fw );
            for( int i = 0; i < outputFileLines.size(); i++ ) {
                String line = String.format( "%s%s", outputFileLines.get( i ), i + 1 == outputFileLines.size() ? "" : "\r\n" );
                bw.write( line );
            }
            bw.close();
        } catch( Exception ex ) {
            errorStub.append( String.format( "Error writing file %s\r\n", outputFile.getAbsolutePath() ) );
            System.err.println( ex.getMessage() );
        }
    }

    public static void main( String args[] ) {
        try {
            for( javax.swing.UIManager.LookAndFeelInfo info : javax.swing.UIManager.getInstalledLookAndFeels() ) {
                if( "Nimbus".equals( info.getName() ) ) {
                    javax.swing.UIManager.setLookAndFeel( info.getClassName() );
                    break;
                }
            }
        } catch( ClassNotFoundException ex ) {
            java.util.logging.Logger.getLogger( FileLineComparator.class.getName() ).log( java.util.logging.Level.SEVERE, null, ex );
        } catch( InstantiationException ex ) {
            java.util.logging.Logger.getLogger( FileLineComparator.class.getName() ).log( java.util.logging.Level.SEVERE, null, ex );
        } catch( IllegalAccessException ex ) {
            java.util.logging.Logger.getLogger( FileLineComparator.class.getName() ).log( java.util.logging.Level.SEVERE, null, ex );
        } catch( javax.swing.UnsupportedLookAndFeelException ex ) {
            java.util.logging.Logger.getLogger( FileLineComparator.class.getName() ).log( java.util.logging.Level.SEVERE, null, ex );
        }

        java.awt.EventQueue.invokeLater( new Runnable() {

            public void run() {
                new FileLineComparator().setVisible( true );
            }
        } );
    }

    private enum FILE_TYPES {
        SMALL,
        LARGE,
        OUTPUT
    }

    private File smallFile = null;
    private File largeFile = null;
    private File outputFile = null;

    private StringBuilder errorStub = null;

    // Variables declaration - do not modify
    private javax.swing.JFileChooser fileChooser;
    private javax.swing.JButton goButton;
    private javax.swing.JButton largeFileButton;
    private javax.swing.JLabel largeFileLabel;
    private javax.swing.JTextField largeFileTextField;
    private javax.swing.JButton outputFileButton;
    private javax.swing.JLabel outputFileLabel;
    private javax.swing.JTextField outputFileTextField;
    private javax.swing.JButton smallFileButton;
    private javax.swing.JLabel smallFileLabel;
    private javax.swing.JTextField smallFileTextField;
    // End of variables declaration
}
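
For files with many lines, the `contains` and `remove` calls in `process()` above scan the large list once for every line of the small file. Below is a minimal sketch of the same duplicate-aware, order-independent check using a count map instead. It is not part of the original program: the class `MissingLines` and its methods `missing` and `readLines` are hypothetical names chosen for illustration, it sticks to Java 6-compatible syntax, and it prints to standard output rather than writing an output file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of FileLineComparator above.
class MissingLines {

    // Returns the lines of `small` that `large` does not cover, counting duplicates:
    // a line that appears twice in `small` must appear at least twice in `large`.
    static List<String> missing(List<String> small, List<String> large) {
        // Count how many times each distinct line occurs in the large file.
        Map<String, Integer> remaining = new HashMap<String, Integer>();
        for (String line : large) {
            Integer count = remaining.get(line);
            remaining.put(line, count == null ? 1 : count + 1);
        }

        List<String> result = new ArrayList<String>();
        for (String line : small) {
            Integer count = remaining.get(line);
            if (count == null || count == 0) {
                // No unused copy left in the large file.
                result.add(line);
            } else {
                // Consume one copy of this line.
                remaining.put(line, count - 1);
            }
        }
        return result;
    }

    // Reads a text file into a list of lines using the platform default encoding.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java MissingLines <small file> <large file>");
            return;
        }
        for (String line : missing(readLines(args[0]), readLines(args[1]))) {
            System.out.println(line);
        }
    }
}
```

With the two sample files from the question saved as, say, small.txt and large.txt (assumed names), running `java -cp . MissingLines small.txt large.txt` after compiling should print `yza bcd`.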

ubiquibacon

Posted 2013-01-25T17:59:13.593

Reputation: 7 287

I'll second that recommendation of Beyond Compare, best comparison tool on the market in my opinion. – wonea – 2013-01-25T18:50:27.720

Yep, Beyond Compare, while not perfect, is definitely one of the best. – Daniel R Hicks – 2013-01-26T18:04:08.260

I don't like Java and removed it after the latest warning (all the sites I use rely on Flash, so that is no problem), but I will try this on a friend's laptop with Java. Thank you, I will accept, but it would be nice to get a VBScript or PowerShell answer from someone. – Angelo – 2013-01-28T03:06:39.197

The recent warnings about Java only pertain to Java 7. Just stick with Java 6 for now. Also, you will need the JDK to compile and run this code, not just a standard install of the JRE. – ubiquibacon – 2013-01-28T04:13:40.030

0

WinMerge is great for comparing text files.

Chris Nava

Posted 2013-01-25T17:59:13.593

Reputation: 7 009