48
7
If I have a .gz file on Unix that contains a certain number of lines, how can I count the lines without uncompressing it?
66
Obviously, you cannot count newlines while the file is still compressed.
But you can decompress to a stream and count the newlines in that stream, without ever writing the decompressed file to disk. Something like this:
zcat file.gz | wc -l
zcat decompresses to standard output, like cat; wc stands for word count, and with -l it counts lines (strictly, newline characters). See the man pages for both if you want to know more.
EDIT
If you do not have zcat, use gunzip -c instead; zcat is just another name for it.
On Unices where gzip is distinct from compress, you want gzcat. – coneslayer – 2010-04-27T21:56:05.393
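Since zcat may belong to compress(1) rather than gzip on some systems (the gzcat point in the comment above), a minimal sketch that sidesteps the naming question by calling gzip -dc directly might look like this. count_gz_lines is a hypothetical name chosen for illustration, not an existing command; tr -d ' ' is there only because some wc implementations pad their output with spaces.

```shell
#!/bin/sh
# count_gz_lines: hypothetical helper, not a standard command.
# gzip -dc decompresses to stdout regardless of what zcat means locally.
count_gz_lines() {
    # Decompress into a pipe; nothing is written to disk.
    # tr strips the leading spaces that some wc implementations print.
    gzip -dc "$1" | wc -l | tr -d ' '
}

# Usage:
# count_gz_lines file.gz
```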
8
This also seems to work: use grep to count the line endings in the file:
zgrep -Ec "$" file.gz
This gives a different (much higher) answer for me than piping to wc -l. – OrangeDog – 2018-03-09T17:03:42.213
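One known way the two counts can disagree: wc -l counts newline characters, while grep -c '$' counts lines, including a final line that has no trailing newline. zgrep runs grep on the decompressed stream, so the same distinction applies to it. The sketch below, which uses a made-up two-line sample and a temporary directory, shows this; note it accounts for at most one line per file, so it would not by itself explain a much larger difference.

```shell
#!/bin/sh
# Build a tiny gzip file whose last line has no trailing newline.
demo_dir=$(mktemp -d)
printf 'first\nsecond' | gzip > "$demo_dir/demo.gz"

# wc -l counts newline characters, so it reports 1 here.
newlines=$(gzip -dc "$demo_dir/demo.gz" | wc -l | tr -d ' ')

# grep -c '$' counts lines, including the unterminated last one: 2 here.
lines=$(gzip -dc "$demo_dir/demo.gz" | grep -c '$')

echo "wc -l: $newlines, grep -c: $lines"
rm -rf "$demo_dir"
```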
6
If you want to do it quickly, I recommend using 'pigz' (which IIRC stands for "Parallel Implementation of GZip"). I just had a similar situation where I wanted to count the number of lines in a bunch of gzipped files, and here was my solution:
for x in *.gz; do unpigz -p 8 -c "$x" | wc -l && echo "$x"; done
This gave me the number of lines and the file it counted from on alternating lines, using 8 processors. It ran quickly!
Or if unpigz is not available, simply with for x in *.fastq.gz; do zcat "$x" | wc -l && echo "$x"; done – Calimo – 2015-11-20T22:34:56.127
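A variant of the loop above that prints the count and the file name on the same line, and falls back to gzip -dc when unpigz is not installed, could be sketched as follows. gz_cat and count_each_gz are hypothetical helper names, and the thread count of 8 is simply carried over from the answer. As far as I know, the pigz documentation states that decompression itself cannot be parallelized (extra threads go to reading, writing and checksumming), so -p may matter less here than it does when compressing.

```shell
#!/bin/sh
# gz_cat and count_each_gz are hypothetical helper names for this sketch.
# Prefer unpigz when it is installed; otherwise fall back to gzip -dc.
if command -v unpigz >/dev/null 2>&1; then
    gz_cat() { unpigz -p 8 -c "$1"; }
else
    gz_cat() { gzip -dc "$1"; }
fi

# Print "<line count><TAB><file name>", one line per file.
count_each_gz() {
    for x in "$@"; do
        [ -e "$x" ] || continue   # skip an unmatched glob such as "*.gz"
        n=$(gz_cat "$x" | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "$x"
    done
}

# Usage:
# count_each_gz *.gz
```

Keeping the count and the name on one tab-separated line makes the output easy to sort or feed into awk.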
2
Use this command:
gzgrep -c $ filename.gz
The command gzgrep behaves the same as grep, but on gzip-compressed files: it decompresses the file on the fly for the regex matching.
In this case -c instructs the command to output the number of matching lines, and the regex $ matches the end of a line, so it matches every line of the file.
The final result is identical to gzip -dc filename.gz | grep -c $.
Is gzgrep available on systems other than Solaris? – pabouk – 2014-11-21T09:43:46.183
No. On other systems, the command would be zgrep -c $ filename.gz – Ravi K M – 2016-05-11T08:08:14.280
Although one might intuitively think this is better than zcat+wc, when I time them, they take the same amount of time. – ngọcminh.oss – 2018-05-10T12:05:33.910
2
If you're okay with a rough estimate rather than an exact count, and actually extracting the whole file or zgrepping it for line endings would both take much too long (which was my situation just now), you can:
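zcat "$file" | head -n 1000 | gzip > 1000-line-sample.gz
ls -ls 1000-line-sample.gz "$file"
then the approximate line count is 1000 * (size of $file) / (size of 1000-line-sample.gz), as long as your data is fairly homogeneous per line. Both sizes have to be compressed sizes: $file is gzipped, so comparing it with the uncompressed sample would underestimate the count by roughly the compression ratio.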
Can you explain why this works? – Alex Moore-Niemi – 2020-02-24T02:09:00.143
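As for why the estimate works: if every line has roughly the same length and content, each line costs about the same number of compressed bytes, so the line count scales with the compressed size. That only holds when like is compared with like, i.e. the compressed whole file against a compressed sample. A minimal sketch that automates this, assuming homogeneous lines and that the file was written at gzip's default compression level (estimate_gz_lines is a hypothetical name):

```shell
#!/bin/sh
# estimate_gz_lines is a hypothetical helper name for this sketch.
# Usage: estimate_gz_lines FILE.gz [SAMPLE_LINES]
# Assumes lines are fairly homogeneous and that FILE.gz was written at
# gzip's default level, so a re-gzipped sample compresses at about the
# same ratio as the whole file.
estimate_gz_lines() {
    gz_file=$1
    want=${2:-1000}                       # sample size in lines
    sample_file=$(mktemp)

    # Decompress only the first $want lines; head exits early, stopping gzip.
    gzip -dc "$gz_file" | head -n "$want" > "$sample_file"
    got=$(wc -l < "$sample_file" | tr -d ' ')   # fewer than $want for short files

    # Compare compressed bytes with compressed bytes.
    total=$(wc -c < "$gz_file" | tr -d ' ')
    sample=$(gzip -c < "$sample_file" | wc -c | tr -d ' ')
    rm -f "$sample_file"

    # lines ~= sample lines * (whole compressed size / sample compressed size)
    echo $(( got * total / sample ))
}
```

Using the real number of sample lines (got) rather than the requested 1000 keeps the result sensible for files shorter than the sample; the gzip header overhead makes the estimate a little rough for very small samples.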
0
gzip -cd file.gz | wc -l
This worked for me.
See http://stackoverflow.com/questions/846062/wc-gzipped-files – sancho.s Reinstate Monica – 2015-11-07T12:48:10.243
Without extracting the archive you can't count the lines. – zoli2k – 2010-04-27T07:38:32.757