How to keep only every nth line of a file

75

18

I've got a rather sizable CSV file (75MB). I'm just trying to produce a graph of it, so I really don't need all of the data.

Rewording: I'd like to delete n lines, then keep one line, then delete n lines, and so on.

So if the file looked like this:

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6

and n=2, then the output would be:

Line 3
Line 6

It seems like sed might be able to do this, but I haven't been able to figure out how. A bash command would be ideal, but I'm open to any solution.

Computerish

Posted 2012-03-03T17:20:17.047

Reputation: 853

2Do you really want lines 1, 3, 6, etc., rather than 1, 4, 7, etc.? – Ilmari Karonen – 2012-03-03T18:59:06.423

2Since it is a CSV file, I assume the first line contains meta data (i.e. field names.). If so, the question should be "every nth line after the first". – iglvzx – 2012-03-03T19:57:49.243

Oops. Can't believe I did that. – Computerish – 2012-03-03T21:38:20.623

81, 3, 6 still doesn't make sense! – wim – 2012-03-05T00:18:56.353

1I guess it should be 1, 3, 5 unless n=2 is a magic value for triangular numbers (1, 3, 6, 10, 15, 21 etc.) – rjmunro – 2012-03-07T10:32:26.417

5Can you update your question to make what you're asking for ("every nth line", "n=2") and your desired output (Line 3, Line 6) consistent? Future readers are going to be confused. – Keith Thompson – 2012-03-08T04:56:24.017

Answers

127

~ $ awk 'NR == 1 || NR % 3 == 0' yourfile
Line 1
Line 3
Line 6

NR is awk's built-in count of the records read so far. Because the default record separator (RS) is a newline, NR is simply the current line number. An awk program consists of 'pattern { action }' pairs in which either part is optional; when only a pattern is given, awk prints the whole record ($0) for every line where the pattern is true.
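For readers who find the awk pattern terse, the same selection rule can be sketched in Python. This is only an illustration, not part of the awk answer: the function name keep_first_and_every_nth and the sample list are made up for the example, and the counter nr stands in for awk's NR.

```python
def keep_first_and_every_nth(lines, n=3):
    """Yield line 1 plus every line whose 1-based number is a multiple of n."""
    for nr, line in enumerate(lines, start=1):  # nr plays the role of awk's NR
        if nr == 1 or nr % n == 0:
            yield line


# Hypothetical sample input mirroring the six lines from the question.
sample = ["Line %d" % i for i in range(1, 7)]
print(list(keep_first_and_every_nth(sample)))  # ['Line 1', 'Line 3', 'Line 6']
```

With n=2 the same rule keeps lines 1, 2, 4 and 6, because line 1 is always kept in addition to the even-numbered lines.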

Selman Ulug

Posted 2012-03-03T17:20:17.047

Reputation: 1 396

I found that this approach leaves lines 1 and 2 untouched. This is confirmed by awk 'NR == 1 || NR % 2 == 0' myfile.txt | wc -l returning an odd number although the original file had an even number of lines. @kev's answer works best in my test case. – Daniel Da Cunha – 2015-09-15T09:21:58.830

8Thanks to defaults, you don't even need that much: awk 'NR == 1 || NR % 3 == 0' – Kevin – 2012-03-03T20:39:00.520

@selman: If you like Kevin's solution, you might want to consider updating your answer. – Keith Thompson – 2012-03-03T21:45:52.410

4Care to explain why it does so? That way if someone wants to slightly tweak it, then hopefully your explanation will help them do so – Ivo Flipse – 2012-03-04T09:41:30.750

58

sed can also do this:

$ sed -n '1p;0~3p' input.txt
Line 1
Line 3
Line 6

man sed explains ~ as:

first~step Match every step'th line starting with line first. For example, ``sed -n 1~2p'' will print all the odd-numbered lines in the input stream, and the address 2~5 will match every fifth line, starting with the second. first can be zero; in this case, sed operates as if it were equal to step. (This is an extension.)
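As a rough model of the first~step address quoted above, the matching rule can be written out in Python. This is a simplified sketch based only on the man page text, not on sed's source: the function name matches_first_step is hypothetical, and the sketch deliberately rejects a step below 1 rather than guess how sed treats it.

```python
def matches_first_step(line_no, first, step):
    """Return True if 1-based line_no is selected by the address first~step."""
    if step < 1:
        # Assumption of this sketch: only positive steps are modelled.
        raise ValueError("this sketch only handles step >= 1")
    if first == 0:
        first = step  # per the man page, 0~step behaves like step~step
    return line_no >= first and (line_no - first) % step == 0


# Line numbers 1..10 selected by a few addresses.
print([n for n in range(1, 11) if matches_first_step(n, 0, 3)])  # [3, 6, 9]
print([n for n in range(1, 11) if matches_first_step(n, 1, 2)])  # [1, 3, 5, 7, 9]
print([n for n in range(1, 11) if matches_first_step(n, 2, 5)])  # [2, 7]
```

The first result shows why the command above needs the extra 1p: 0~3 on its own never selects line 1.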

kev

Posted 2012-03-03T17:20:17.047

Reputation: 9 972

1@qed Explanation: 1p prints the first line, 0~3p prints every third line starting from line 3 (the 1p is thus required to print line 1). But note that the 0~3 is not standard but a GNU sed extension. – Arkku – 2015-07-22T20:45:22.310

"This is an extension." Which version are/were you using? – Victor – 2015-10-11T21:41:44.573

This answer helped me a lot for windows PowerShell. I broadened it like that: sed -n '1p;0~10p' '.\in.txt' > out.txt to print the reduced file into an output-file. – kimliv – 2018-06-03T23:21:03.723

6Could you explain this command? – qed – 2014-06-09T18:56:43.620

23

Perl can do this too:

while (<>) {
    print  if $. % 3 == 1;
}

This program will print the first line of its input, and every third line afterwards.

To explain it a bit, <> is the line input operator, which iterates over the input lines when used in a while loop like this. The special variable $. contains the number of lines read so far, and % is the modulus operator.

This code can be written even more compactly as a one-liner, using the -n and -e switches:

perl -ne 'print if $. % 3 == 1'  < input.txt  > output.txt

The -e switch takes a piece of Perl code to execute as a command line parameter, while the -n switch implicitly wraps the code in a while loop like the one shown above.


Edit: To actually get lines 1, 3, 6, 9, ... as in the example, rather than lines 1, 4, 7, 10, ... as I first assumed you wanted, replace $. % 3 == 1 with $. == 1 or $. % 3 == 0.
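To see the difference between the two conditions at a glance, here is a small Python sketch that lists which line numbers each one selects out of ten lines. The helper line_numbers is made up for this illustration, and the lambdas are direct translations of the two Perl tests with n standing in for $. (the current line number).

```python
def line_numbers(total, keep):
    """Return the 1-based line numbers in 1..total for which keep(n) is true."""
    return [n for n in range(1, total + 1) if keep(n)]


# $. % 3 == 1: the first line and every third line after it.
print(line_numbers(10, lambda n: n % 3 == 1))            # [1, 4, 7, 10]
# $. == 1 or $. % 3 == 0: the first line plus every multiple of three.
print(line_numbers(10, lambda n: n == 1 or n % 3 == 0))  # [1, 3, 6, 9]
```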

Ilmari Karonen

Posted 2012-03-03T17:20:17.047

Reputation: 1 509

7

If you want to do it with a Bash script you can try:

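#!/bin/bash

echo "Please enter the file name"
read -r fname
echo "Please enter N (every Nth line is kept, starting with the first)"
read -r n

# Number of lines read so far; starting at 0 means line 1 is kept.
value=0
# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r line
do
    if [ $(( value % n )) -eq 0 ] ; then
        # printf writes the line verbatim; echo -e would interpret backslashes.
        printf '%s\n' "$line" >> new_file.txt
    fi
    value=$(( value + 1 ))
done < "$fname"
echo "Check the 'new_file.txt' that has been created in this directory"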

Save it as "read_lines.sh" and remember to make the script executable:

chmod +x ./read_lines.sh

akarpovsky

Posted 2012-03-03T17:20:17.047

Reputation: 71

1If you made this just emit on standard out, read the no of lines to skip from the arguments and read the file from standard in, it would be simpler and more useful. You could still make new_file.txt by doing ./read_lines.sh > new_file.txt. – rjmunro – 2012-03-07T10:36:40.687

4

A solution in pure bash that does not spawn any external process:

{ for f in {1..2}; do read -r line; done
  while IFS= read -r line; do
    echo "$line"
    for f in {1..2}; do read -r line; done
  done; } < file

The first line skips the first 2 lines of the file; the while loop then prints the next line and skips 2 lines again, until the input is exhausted.

If your file is small, this is an efficient way of doing the job because it does not start a process. When your file is large, sed should be used instead, as it handles I/O more efficiently than bash.

jfg956

Posted 2012-03-03T17:20:17.047

Reputation: 1 021

1

A Python version (works in both Python 2 and Python 3):

python2 -c "print(''.join(open('file.txt').readlines()[::3]))"

Replace [::3] with start, end and step parameters for more control. Slice indices are zero-based, so e.g. [10:36:5] outputs lines 11, 16, ..., 36.

Note: since readlines() keeps the line endings and print adds a newline of its own, the output ends with an extra empty line unless the last selected line is the file's final line and that line has no trailing newline.

A stream version is possible, too (output only appears once the whole stream has been read):

python -c "import sys;print(''.join(list(sys.stdin)[::3]))" < file.txt
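Both one-liners above read the entire input into a list before slicing it. For a large file, a lazy variant may be preferable. The following is a minimal Python 3 sketch using itertools.islice, which walks the input once and keeps only the current line in memory; the function name every_nth and the in-memory demo input are assumptions made for this example.

```python
import io
from itertools import islice


def every_nth(lines, step, start=0):
    """Lazily yield the same items as lines[start::step] without building a list."""
    return islice(lines, start, None, step)


# Hypothetical demo input: seven lines held in memory instead of a real file.
demo = io.StringIO("".join("Line %d\n" % i for i in range(1, 8)))
# The lines keep their own newlines, so suppress the one print would add.
print("".join(every_nth(demo, 3)), end="")
```

The demo prints Line 1, Line 4 and Line 7. To filter real data, pass sys.stdin or an open file object as lines and write the result with sys.stdout.writelines(); with start=2 and step=3 the function keeps lines 3, 6, 9, ... as in the question's example.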

DomTomCat

Posted 2012-03-03T17:20:17.047

Reputation: 111