
I have tried this shell script on a SUSE 10 server (kernel 2.6.16.60, ext3 filesystem).

The script fails on a line like this:

cat file | awk '{print $1" "$2" "$3}' | sort -n > result

The file is about 3.2 GB, and I get this error message: File size limit exceeded

In this shell, ulimit -f is unlimited.
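
A quick way to double-check both limits (assuming bash; the soft limit is the one that actually triggers SIGXFSZ, which prints "File size limit exceeded"):

ulimit -Sf   # soft file-size limit (raises SIGXFSZ when exceeded)
ulimit -Hf   # hard file-size limit (ceiling for the soft limit)
ulimit -a    # all limits of the current shell at a glance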

After I changed the script to this:

cat file | awk '{print $1" "$2" "$3}' >tmp
sort -n tmp > result

the problem is gone.

I don't know why; can anyone help me with an explanation?

– yboren

1 Answer


The pipe version needs many more temporary files. You can inspect this quickly with the strace utility.

When reading from a pipe, sort creates a rapidly growing number of temporary files:

for i in {1..200000} ; do echo $i ; done |strace sort -n |& grep -e 'open.*/tmp/'
open("/tmp/sortb9Mhqd", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
open("/tmp/sortqKOVvG", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
open("/tmp/sortb9Mhqd", O_RDONLY)       = 3
open("/tmp/sortqKOVvG", O_RDONLY)       = 4

The file version doesn't use any temporary files for the same data set, and for bigger data sets it uses far fewer of them:

for i in {1..200000} ; do echo $i ; done >/tmp/TESTDATA ; strace sort -n /tmp/TESTDATA |& grep -e 'open.*/tmp/'
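
If the temporary files turn out to be the problem, GNU sort also lets you steer them away from /tmp and enlarge its in-memory buffer so fewer of them are needed; a minimal sketch (the directory and buffer size are just placeholder values):

# Sketch: put sort's temp files on a roomier filesystem and raise the
# in-memory buffer so fewer temporary files are created.
sort -n -T /var/tmp -S 512M /tmp/TESTDATA > result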
– H.-Dirk Schmitt
  • I think that's not the point: sort will use many temporary files if the sorted file is big, but each temporary file is not big enough on its own to cause this error. – yboren Sep 21 '12 at 04:03