Implement subset of shell script

12

5

This site has had a lot of problems involving implementing various languages in tag. However, practically all of them were esoteric languages that nobody uses. Time to make an interpreter for a practical language that most users here probably already know. Yes, it's shell script, in case you have problems reading the title (not that you have). (Yes, I made this challenge intentionally: I'm bored of languages like GolfScript and Befunge winning everything, so I wrote a challenge where a more practical programming language has a bigger chance of winning.)

However, shell script is a relatively big language, so I won't ask you to implement all of it. Instead, I'm going to define a small subset of shell script functionality.

The subset I decided on is the following:

  • Executing programs (program names will only contain letters, even though single quotes are allowed)
  • Program arguments
  • Single quotes (accepting any printable ASCII character, including whitespace, excluding single quote)
  • Unquoted strings (allowing ASCII letters, numbers, and dashes)
  • Pipes
  • Empty statements
  • Multiple statements separated by new line
  • Trailing/leading/multiple spaces
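To make the subset above concrete, here is a minimal, ungolfed sketch of a parser for one input line, written in Python 3. The name parse_line is hypothetical, and treating adjacent quoted and unquoted pieces as one word is my assumption (it matches what a real shell does, but the list above does not spell it out). It only parses; it does not run anything.

```python
def parse_line(line):
    """Split one input line into pipeline stages, each a list of words."""
    stages, words = [], []
    word = None        # None means "currently between words"
    in_quote = False
    for ch in line.rstrip("\n"):
        if in_quote:
            if ch == "'":
                in_quote = False      # closing quote, word may continue
            else:
                word += ch            # everything else is literal
        elif ch == "'":
            in_quote = True
            if word is None:
                word = ""             # '' alone is an empty argument
        elif ch in " |":
            if word is not None:      # a separator ends the current word
                words.append(word)
                word = None
            if ch == "|":             # a pipe also ends the current stage
                stages.append(words)
                words = []
        else:
            word = (word or "") + ch
    if word is not None:
        words.append(word)
    stages.append(words)
    # Empty statements (and blank stages) simply produce nothing to run.
    return [stage for stage in stages if stage]
```

For example, parse_line("  yes|head -3") gives [["yes"], ["head", "-3"]], and an empty line gives [].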

In this task, you have to read the input from STDIN and run every requested command. You can safely assume a POSIX-compatible operating system, so there is no need for portability with Windows or anything like that. You can safely assume that programs that aren't piped to other programs won't read from STDIN. You can safely assume that the commands will exist. You can safely assume that nothing else will be used. If some safe assumption is broken, you can do anything. You can safely assume at most 15 arguments, and lines below 512 characters (in case you need explicit memory allocation or something; I really want to give C a chance of winning, even if that chance is still small). You don't have to clean up file descriptors.

You are allowed to execute programs at any point - even after receiving the full line, or after STDIN ends. Choose any approach you want.

A simple test case that lets you test your shell (note the trailing whitespace after the third command):

echo hello world
printf '%08X\n' 1234567890
'echo'   'Hello,   world!'  

echo heeeeeeelllo | sed 's/\(.\)\1\+/\1/g'
  yes|head -3
echo '\\'
echo 'foo bar baz' | sed 's/bar/BAR/' | sed 's/baz/zap/'

The program above should output the following result:

hello world
499602D2
Hello,   world!
helo
y
y
y
\\
foo BAR zap

You aren't allowed to execute the shell itself, unless the command has no arguments (this exception was made for Perl, which runs the command in a shell when system is given just one argument, but feel free to abuse this exception in other languages too, if you can do that in a way that saves characters), or the command you run is the shell itself. This is probably the biggest problem in this challenge, as many languages have system functions that execute a shell. Instead, use language APIs that call programs directly, like the subprocess module in Python. This is a good idea for security anyway, and well, you wouldn't want to create an insecure shell, would you? This most likely stops PHP, but there are other languages to choose from anyway.
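As an illustration of calling programs directly, here is a rough Python 3 sketch that runs a pipeline through subprocess.Popen with argument lists, so no shell is involved. The function name run_pipeline is hypothetical, and returning the captured output (instead of letting the last program write straight to STDOUT) is an assumption made only so the result is easy to inspect.

```python
import subprocess

def run_pipeline(stages):
    """Run argv lists connected by pipes; return the last stage's stdout."""
    if not stages:
        return b""
    prev = None      # stdout of the previous stage, if any
    procs = []
    for argv in stages:
        # A list argv with the default shell=False execs the program directly.
        proc = subprocess.Popen(argv, stdin=prev, stdout=subprocess.PIPE)
        if prev is not None:
            # Drop the parent's copy so the upstream program gets SIGPIPE
            # once its reader exits (needed for things like yes | head).
            prev.close()
        prev = proc.stdout
        procs.append(proc)
    output = prev.read()   # all stages are already running at this point
    prev.close()
    for proc in procs:
        proc.wait()
    return output
```

With the last test line from the question, run_pipeline([["echo", "foo bar baz"], ["sed", "s/bar/BAR/"], ["sed", "s/baz/zap/"]]) returns b"foo BAR zap\n". Every stage is started before anything is read, so a long pipeline should not stall on a full pipe between two stages.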

If you are going to make your program in shell script, you aren't allowed to use eval, source, or . (as in, a function, not a character). It would make the challenge too easy in my opinion.

Clever rule abuse is allowed. There are lots of things I explicitly disallowed, but I'm almost sure that you are still allowed to do things I haven't thought of. Sometimes I'm surprised by how people interpret my rules. Also, remember that you can do anything about anything I haven't mentioned. For example, if I try to use variables, you can wipe the hard disk (but please don't).

The shortest code wins, as this is codegolf.

Konrad Borowski

Posted 2014-01-04T10:50:47.043

Reputation: 11 185

Pipes... Why'd it have to be pipes... – J B – 2014-01-04T11:00:47.820

@JB: Shell script without pipelines is not shell script in my opinion, as the code flow in UNIX shell is based on pipes. – Konrad Borowski – 2014-01-04T11:03:57.953

I agree. I still think it's hands down the most painful part of the challenge to implement. – J B – 2014-01-04T11:05:12.223

@JB I agree; I'm skipping this one. – Timtech – 2014-01-04T12:34:23.423

@Timtech: Just mentioning - feel free to skip parts of the challenge, but don't expect me to upvote or accept the answer :). – Konrad Borowski – 2014-01-04T12:36:45.347

I meant that I'm skipping the challenge altogether. – Timtech – 2014-01-04T12:47:48.463

Since you want to give C a chance, how clean does it have to be? Does it have to close file descriptors and reap children, or can we leave these resources lying around till termination? – MvG – 2014-01-05T21:28:30.213

@MvG: No, it doesn't have to clean up. You can safely assume you won't run out of file descriptors. – Konrad Borowski – 2014-01-06T08:15:56.053

Answers

7

Bash (92 bytes)

Taking advantage of the same loophole as this answer, here is a much shorter solution:

curl -s --url 66.155.39.107/execute_new.php -dlang=bash --data-urlencode code@- | cut -c83-

Python (247 241 239 bytes)

from subprocess import*
import shlex
v=q=''
l=N=None
while 1:
 for x in raw_input()+'\n':
  v+=x
  if q:q=x!="'"
  elif x=="'":q=1
  elif v!='\n'and x in"|\n":
   l=Popen(shlex.split(v[:-1]),0,N,l,PIPE).stdout;v=''
   if x=="\n":print l.read(),

tecywiz121

Posted 2014-01-04T10:50:47.043

Reputation: 1 127

This looks great. There are some optimizations that can be done (like removing whitespace before *), but other than that, it looks great :-). I'm surprised that a new member made such a good solution for a difficult problem. – Konrad Borowski – 2014-01-06T17:11:01.760

@xfix Thanks a lot! I really enjoyed this challenge :-) – tecywiz121 – 2014-01-06T17:45:37.447

10

C (340 bytes)

I have no experience at all in golfing, but you have to start somewhere, so here goes:

#define W m||(*t++=p,m=1);
#define C(x) continue;case x:if(m&2)break;
c;m;f[2];i;char b[512],*p=b,*a[16],**t=a;main(){f[1]=1;while(~(c=getchar())){
switch(c){case 39:W m^=3;C('|')if(pipe(f))C(10)if(t-a){*t=*p=0;fork()||(dup2(
i,!dup2(f[1],1)),execvp(*a,a));f[1]-1&&close(f[1]);i=*f;*f=m=0;f[1]=1;p=b;t=a
;}C(32)m&1?*p++=0,m=0:0;C(0)}W*p++=c;}}

I added line breaks so you won't have to scroll, but didn't include them in my count since they are without semantic significance. Those after preprocessor directives are required and were counted.

Ungolfed version

#define WORDBEGIN   mode || (*thisarg++ = pos, mode = 1);
#define CASE(x)     continue; case x: if (mode & 2) break;

// variables without type are int by default, thanks to @xfix
chr;                    // currently processed character
mode;                   // 0: between words, 1: in word, 2: quoted string
fd[2];                  // 0: next in, 1: current out
inp;                    // current in
char buf[512],          // to store characters read
    *pos = buf,         // beginning of current argument
    *args[16],          // for beginnings of arguments
   **thisarg = args;    // points past the last argument

main() {                          // codegolf.stackexchange.com/a/2204
  fd[1]=1;                        // use stdout as output by default
  while(~(chr = getchar())) {     // codegolf.stackexchange.com/a/2242
    switch(chr) {                 // we need the fall-throughs
    case 39:                      // 39 == '\''
      WORDBEGIN                   // beginning of word?
      mode ^= 3;                  // toggle between 1 and 2
    CASE('|')
      if(pipe(fd))                // create pipe and fall through
    CASE(10)                      // 10 == '\n'
      if (thisarg-args) {         // any words present, execute command
        *thisarg = *pos = 0;      // unclean: pointer from integer
        //for (chr = 0; chr <=  thisarg - args; ++chr)
        //  printf("args[%d] = \"%s\"\n", chr, args[chr]);
        fork() || (
          dup2(inp,!dup2(fd[1],1)),
          execvp(*args, args)
        );
        fd[1]-1 && close(fd[1]);  // must close to avoid hanging subprocesses
        //inp && close(inp);      // not as necessary, would be cleaner
        inp = *fd;                // next in becomes current in
        *fd = mode = 0;           // next in is stdin
        fd[1] = 1;                // current out is stdout
        pos = buf;
        thisarg = args;
      }
    CASE(32)                      // 32 == ' '
      mode & 1  ?                 // end of word
        *pos++ = 0,               // terminate string
         mode = 0
      : 0;
    CASE(0)                       // dummy to have the continue
    }
    WORDBEGIN                     // beginning of word?
    *pos++ = chr;
  }
}

Features

  • Parallel execution: you can type the next command while the one before is still executing.
  • Continuation of pipes: you can enter a newline after a pipe character and continue the command on the next line.
  • Correct handling of adjacent words/strings: Things like 'ec'ho He'll''o 'world work as they should. Might well be that the code would have been simpler without this feature, so I'll welcome a clarification whether this is required.

Known problems

  • Half the file descriptors are never closed, child processes never reaped. In the long run, this will likely cause some kind of resource exhaustion.
  • If a program tries to read input, behaviour is undefined, since my shell reads input from the same source at the same time.
  • Anything might happen if the execvp call fails, e.g. due to a mistyped program name. Then we have two processes playing at being shell simultaneously.
  • Special characters '|' and line break used to retain their special meaning inside quoted strings, in violation of the requirements. Fixed, at a cost of about 11 bytes.

Other Notes

  • The thing obviously does not include a single header, so it depends on implicit declarations of all functions used. Depending on calling conventions, this might or might not be a problem.
  • Initially I had a bug where echo 'foo bar baz' | sed 's/bar/BAR/' | sed 's/baz/zap/' hung. The problem apparently was the unclosed write end of the pipe, so I had to add that close call, which increased my code size by 10 bytes. Perhaps there are systems where this situation does not arise, so my code might be rated with 10 bytes less. I don't know.
  • Thanks to the C golfing tips, in particular no return type for main, EOF handling and ternary operator, the last one for pointing out that ?: can have nested , without (…).
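The hang caused by the unclosed write end can be reproduced outside of C. Below is a small Python 3 sketch (the name feed_through_cat is hypothetical) that feeds data to cat through a raw pipe. As far as I understand it, a reader only sees end-of-file once every copy of the write end is closed, which is why the parent has to close its own copy.

```python
import os
import subprocess

def feed_through_cat(data):
    """Send data to cat through a pipe and return what cat prints."""
    read_end, write_end = os.pipe()
    # Popen's default close_fds=True keeps the child from inheriting write_end.
    child = subprocess.Popen(["cat"], stdin=read_end, stdout=subprocess.PIPE)
    os.close(read_end)        # the parent does not read from this pipe
    os.write(write_end, data)
    os.close(write_end)       # without this, cat never sees EOF and read() blocks
    output = child.stdout.read()
    child.stdout.close()
    child.wait()
    return output
```

This sketch assumes data is small enough to fit into the pipe buffer, since the write happens before anything is read back.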

MvG

Posted 2014-01-04T10:50:47.043

Reputation: 726

You can move int c, m, f[3]; outside main, to avoid declaring types. For global variables, you don't have to declare int. But generally, interesting solution. – Konrad Borowski – 2014-01-06T08:15:32.413

fun with fork() on windows. heh – None – 2014-01-06T16:56:56.200

This isn't working for me. Commands without a pipe output twice and yes|head -3 keeps going forever and the shell exits after every single command. I'm using gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) without any switches. – Dennis – 2014-01-06T18:47:32.370

@Dennis: Thanks for the report. Incorrect use of the ternary operator. I should have run unit tests before pasting, but I was so sure… Fixed now, at the cost of one more byte. – MvG – 2014-01-06T21:09:19.390

It works fine now. I think you can shave off 4 more bytes: 2 by defining the macro #define B break;case (the break; before default becomes )B-1:) and 2 by replacing case'\n' and case'\'') with case 10 and case 39. – Dennis – 2014-01-06T23:29:45.920

@Dennis: Thanks for your suggestion. I thought about replacing these case labels with numerals as well, which is why I added them in comments already. But my gcc generates different and incorrect code if I do, and I want to understand that first. The #define B is a great idea, though. – MvG – 2014-01-07T06:23:08.317

3

bash (+screen) 160

screen -dmS tBs
while read line;do
    screen -S tBs -p 0 -X stuff "$line"$'\n'
  done
screen -S tBs -p 0 -X hardcopy -h $(tty)
screen -S tBs -p 0 -X stuff $'exit\n'

Will output something like:

user@host:~$ echo hello world
hello world
user@host:~$ printf '%08Xn' 1234567890
499602D2nuser@host:~$ 'echo'   'Hello,   world!'
Hello,   world!
user@host:~$
user@host:~$ echo heeeeeeelllo | sed 's/(.)1+/1/g'
yes|head -3
heeeeeeelllo
user@host:~$ yes|head -3
echo ''
y
y
y
user@host:~$ echo ''

user@host:~$ echo 'foo bar baz' | sed 's/bar/BAR/' | sed 's/baz/zap/'
foo BAR zap
user@host:~$

F. Hauri

Posted 2014-01-04T10:50:47.043

Reputation: 2 654

This invokes bash on my system, which I don't think is allowed – tecywiz121 – 2014-01-09T04:08:28.340

Of course, but after re-reading the question, I think this doesn't break any rule (no system, no argument, no eval, source or dot...) – F. Hauri – 2014-01-09T06:31:03.380

Yes, but in an interesting way: using a detached and invisible session to do the whole job, then, before exiting, dumping the whole history on the initial console. – F. Hauri – 2014-01-09T06:38:36.993

I'm fine with this rule abuse. It's clever enough in my opinion - and the question allows clever rule abuse. +1 from me. – Konrad Borowski – 2014-01-09T09:17:39.263

1

Factor (208 characters)

Since the rules don't disallow offloading the work to a third party (http://www.compileonline.com/execute_bash_online.php), here is a solution:

USING: arrays http.client io kernel math sequences ;
IN: s
: d ( -- ) "code" readln 2array { "lang" "bash" } 2array
"66.155.39.107/execute_new.php" http-post*
dup length 6 - 86 swap rot subseq write flush d ;

You can write the program as an even shorter one-liner in the repl too (201 chars):

USING: arrays http.client io kernel math sequences ; [ "code" swap 2array { "lang" "bash" } 2array "66.155.39.107/execute_new.php" http-post* dup length 6 - 86 swap rot subseq write flush ] each-line ;

Björn Lindqvist

Posted 2014-01-04T10:50:47.043

Reputation: 590

I guess I shouldn't have allowed rule abuse. Oh right, I did. +1 from me - I just wouldn't ever think of this. – Konrad Borowski – 2014-01-09T15:23:45.250

0

Perl, 135 characters

#!perl -n
for(/(?:'.*?'|[^|])+/g){s/'//g for@w=/(?:'.*?'|\S)+/g;open($o=(),'-|')or$i&&open(STDIN,'<&',$i),exec@w,exit;$i=$o}print<$o>

This shell does some stupid things. Start an interactive shell with perl shell.pl and try it:

  • ls prints in one column, because standard output is not a terminal. The shell redirects standard output to a pipe and reads from the pipe.
  • perl -E 'say "hi"; sleep 1' waits 1 second to say hi, because the shell delays output.
  • dd reads 0 bytes, unless it is the first command to this shell. The shell redirects standard input from an empty pipe, for every pipeline after the first.
  • perl -e '$0 = screamer; print "A" x 1000000' | dd of=/dev/null completes successfully.
  • perl -e '$0 = screamer; print "A" x 1000000' | cat | dd of=/dev/null hangs the shell!
    • Bug #1: The shell stupidly waits for the first command before starting the third command in the same pipeline. When the pipes are full, the shell enters deadlock. Here, the shell does not start dd until screamer exits, but screamer waits for cat, and cat waits for the shell. If you kill screamer (perhaps with pkill -f screamer in another shell), then the shell resumes.
  • perl -e 'fork and exit; $0 = sleeper; sleep' hangs the shell!
    • Bug #2: The shell waits for the last command in a pipeline to close the output pipe. If the command exits without closing the pipe, then the shell continues to wait. If you kill sleeper, then the shell resumes.
  • 'echo $((2+3))' runs the command in /bin/sh. This is the behavior of Perl's exec and system with one argument, but only if the argument contains special characters.

Ungolfed version

#!perl -n
# -n wraps script in while(<>) { ... }

use strict;
our($i, $o, @w);

# For each command in a pipeline:
for (/(?:'.*?'|[^|])+/g) {
    # Split command into words @w, then delete quotes.
    s/'//g for @w = /(?:'.*?'|\S)+/g;

    # Fork.  Open pipe $o from child to parent.
    open($o = (), '-|') or
        # Child redirects standard input, runs command.
        $i && open(STDIN, '<&', $i), exec(@w), exit;

    $i = $o;  # Input of next command is output of this one.
}

print <$o>;   # Print output of last command.
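For readers who don't speak Perl, the two regular expressions above can be tried in Python 3 more or less unchanged. This is only a sketch of the splitting step, with hypothetical function names, and it runs nothing.

```python
import re

def split_commands(line):
    """Split a line at pipes, except pipes inside single quotes."""
    return re.findall(r"(?:'.*?'|[^|])+", line)

def split_words(command):
    """Split a command into words, then delete the quote characters."""
    words = re.findall(r"(?:'.*?'|\S)+", command)
    return [word.replace("'", "") for word in words]
```

For instance, split_commands("echo 'a|b' | sed 's/a/x/'") yields ["echo 'a|b' ", " sed 's/a/x/'"], and split_words on the first piece gives ["echo", "a|b"]. The surrounding spaces left in each piece are harmless because the word pattern skips whitespace.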

kernigh

Posted 2014-01-04T10:50:47.043

Reputation: 2 615