Awk responds differently based on how an empty argument is specified


I seem to have stumbled on something which is probably a bug in awk, but it could also be a bug in my understanding of bash/awk.

I was trying to debug an issue where the output of a Python program was piped to awk, and I got the following exception regardless of what the awk command was doing.

close failed in file object destructor:
Error in sys.excepthook:

Original exception was:

As it turns out awk was getting passed an empty first argument, followed by -f awkfilename.awk. So the error can be reproduced by the following command line:

python -c 'print "hello"'  | awk '' 

But if I run awk without any arguments (which I would have considered equivalent to the command above), I get the awk help followed by the same exception:

 python -c 'print "hh"'  | awk 

Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:      GNU long options:
    -f progfile     --file=progfile
    -F fs           --field-separator=fs
    -v var=val      --assign=var=val
    -m[fr] val
    -W compat       --compat
    -W copyleft     --copyleft
    -W copyright        --copyright
    -W dump-variables[=file]    --dump-variables[=file]
    -W exec=file        --exec=file
    -W gen-po       --gen-po
    -W help         --help
    -W lint[=fatal]     --lint[=fatal]
    -W lint-old     --lint-old
    -W non-decimal-data --non-decimal-data
    -W profile[=file]   --profile[=file]
    -W posix        --posix
    -W re-interval      --re-interval
    -W source=program-text  --source=program-text
    -W traditional      --traditional
    -W usage        --usage
    -W use-lc-numeric   --use-lc-numeric
    -W version      --version

To report bugs, see node `Bugs' in `', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

    gawk '{ sum += $1 }; END { print sum }' file
    gawk -F: '{ print $1 }' /etc/passwd
close failed in file object destructor:
Error in sys.excepthook:

Original exception was:

Note: the message after "Original exception was:" really is empty; it's not something I have skipped.

Details about my system

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
[GCC 4.4.3] on linux2

$ awk --version
GNU Awk 3.1.6

$ cat /etc/lsb-release 

$ uname -a
Linux <hostname> 2.6.32-37-generic #81-Ubuntu SMP Fri Dec 2 20:32:42 UTC 2011 x86_64     GNU/Linux

I would be happy if someone could offer some insight. Of course, the immediate solution is to sanitize the argument that gets passed to awk as an empty string, which I have done, but this made me curious about the cause.


Based on the comments below, I understand that awk and awk '' differ in that the second invocation makes awk see 1 argument (the empty string) instead of 0.

What I still don't understand is what an empty string does as the awk program.

For example, the following works fine:

$ echo "" > /tmp/empty.awk
$ python -c 'print "hello"' | awk -f /tmp/empty.awk
$ echo $?
0


Posted 2012-02-04T02:16:26.240




There are two separate things going on here: the error messages (which actually come from Python, not awk) and awk's usage message. To isolate them, redirect stderr from each command to its own file:

$ python -c 'print "hello"' 2>pyerr | awk 2>awkerr
$ cat pyerr 
close failed in file object destructor:
Error in sys.excepthook:

Original exception was:
$ cat awkerr 
usage: awk [-F fs] [-v var=value] [-f progfile | 'prog'] [file ...]

As I understand it, Python is getting an error because the program its output is piped to exits (and closes its end of the pipe) before Python writes to it. Here's an example using sleep 0 as a program that does nothing at all, and hence exits very fast:

$ python -c 'print "hello"' | sleep 0
close failed in file object destructor:
Error in sys.excepthook:

Original exception was:

But if I use sleep 1, there's no error because sleep doesn't close its end of the pipe until after python has finished writing to it. Your results may differ, depending on the exact timings involved.
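The same failure can be reproduced without any timing dependence. The following is a minimal Python 3 sketch (the question uses Python 2, the helper name write_after_reader_closed is made up for illustration, and a POSIX system is assumed). It closes the read end of a pipe before writing to it, which is roughly the situation Python ends up in when awk or sleep 0 has already exited:

```python
import os


def write_after_reader_closed():
    """Write to a pipe whose read end is already closed.

    Returns the name of the exception raised by the write, or None
    if the write unexpectedly succeeded.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "reader" goes away before any data is written
    try:
        os.write(write_fd, b"hello\n")
    except BrokenPipeError as exc:
        # CPython ignores SIGPIPE by default, so the failed write surfaces
        # as an exception (EPIPE) instead of silently killing the process.
        return type(exc).__name__
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    print(write_after_reader_closed())
```

In the pipeline from the question, the write presumably happens when Python flushes stdout at exit, which would explain why the message mentions a file object destructor.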

Now, for the awk error. The difference is that awk without an argument is not valid because you must supply a program; since you ran it improperly, it tries to be helpful by printing a usage message to tell you how it should be run. On the other hand, awk '' is actually telling awk to run an empty script (''), which is perfectly valid (although not terribly useful), so no usage message is printed:

$ awk
usage: awk [-F fs] [-v var=value] [-f progfile | 'prog'] [file ...]
$ awk ''

Gordon Davisson


So if I understand you correctly, you are implying that when awk runs an empty script it doesn't read stdin, which is why Python hits the error when writing to stdout? If that were the case, I should see the same error when running awk with an empty file, correct? For example: python -c 'print "hello"' | awk -f /tmp/empty.awk. The empty awk script passed with -f doesn't reproduce the error. – Puneet – 2012-02-07T01:46:08.440

@Puneet it's a matter of timing (known as a race condition), depending on exactly how long the two commands take. Compare the results of { sleep 0; python -c 'print "hello"'; } | { sleep 1; awk -f /tmp/empty.awk; } vs. { sleep 1; python -c 'print "hello"'; } | { sleep 0; awk -f /tmp/empty.awk; }

– Gordon Davisson – 2012-02-07T03:13:54.603

Also, if your awk script contains some rules other than BEGIN {...}, then it will consume all stdin before exiting; so in that case too you should encounter no error. You could achieve this by using '{}' as your awk script; you don't achieve it by using the empty string ''. – dubiousjim – 2012-04-19T11:48:36.940


Calling a program with zero arguments (or parameters) is different from calling a program with one, empty, argument (or parameter).

To use some C code as an example:

#include <stdio.h>

int main(int argc, char **argv)
{
    printf("%d\n", argc); // print the number of arguments we've received
    return 0; // exit successfully
}

Running this program as example prints 1, because the program's name is always passed as the first argument and there are no additional arguments. Running it as example '' or example SomethingGoesHere prints 2, because there is the program's name plus either an empty argument or SomethingGoesHere.
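The same count can be observed from Python, which may be closer to the setup in the question. This is a rough sketch assuming Python 3; count_args is a hypothetical helper, and the child process is just the Python interpreter itself reporting len(sys.argv), the counterpart of argc:

```python
import subprocess
import sys

# Child program: report how many arguments it received, like argc in C.
# With -c, sys.argv[0] is the string "-c", which plays the role of the
# program name.
CHILD_CODE = "import sys; print(len(sys.argv))"


def count_args(extra_args):
    """Run the child with the given extra arguments and return its count."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, *extra_args],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print(count_args([]))    # no extra arguments
    print(count_args([""]))  # one extra argument that happens to be empty
```

Passing a list to subprocess.run bypasses the shell, so the empty string reaches the child as a real argument, just as it does for awk '' once the shell has removed the quotes.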

As awk expects at least 2 parameters (its name and something else), calling awk by itself without any arguments results in what you see above: the help being printed.

This distinction is also what lets you keep positional arguments aligned. If you had a program that always required 3 arguments, but you wanted the second one to be blank, you couldn't simply omit it. The shell wouldn't know that an argument had been left out, so it would pass only 2 arguments along to the program, and the program would fail.



Thanks, that clears up the confusion between awk and awk ''. But I still don't understand what an empty awk expression means. I tried running

$ echo "" > /tmp/empty.awk
$ python -c 'print "hello"' | awk -f /tmp/empty.awk
$ echo $?
0

and that works. – Puneet – 2012-02-07T02:53:26.753

@Puneet, the empty awk script is for these purposes like an awk script containing only 'BEGIN {}'; on most implementations (and I think POSIX requires this), such scripts won't consume stdin. See for example: echo test | { awk 'BEGIN {}'; echo done; cat; } – dubiousjim – 2012-04-19T11:49:42.883