Background

For my code-golf submissions in C, I need a processing tool. Like in many other languages, whitespace is mostly irrelevant in C source (but not always!) -- still makes the code much more comprehensible for humans. A fully golfed C program that doesn't contain a single redundant whitespace often is barely readable.

Therefore, I like to write my code in C for a code-golf submission including whitespace and sometimes comments, so the program keeps a comprehensible structure while writing. The last step is to remove all comments and redundant whitespace. This is a tedious and mindless task which really should be done by ~~an intern~~ a computer program.

Task

Write a program or function that eliminates comments and redundant whitespace from some "pre-golfed" C source according to the following rules:

A \ (backslash) as the very last character in a line is a line continuation. If you find this, you must treat the following line as part of the same logical line (you could for example remove the \ and the following \n (newline) completely before doing anything else)
Comments will only use the one-line format, starting with //. So to remove them, you ignore the rest of the logical line wherever you encounter // outside a string literal (see below).
Whitespace characters are (space), \t (tab) and \n (newline, so here the end of a logical line).
When you find a sequence of whitespace, examine the non-whitespace characters surrounding it. If
- both of them are alphanumeric or underscore (range [a-zA-Z0-9_]) or
- both are + or
- both are - or
- the preceeding one is / and the following one is *
then replace the sequence with a single space () character.

Otherwise, eliminate the sequence completely.

This rule has some exceptions:
- Preprocessor directives must appear on their own lines in your output. A preprocessor directive is a line starting with #.
- Inside a string literal or character literal, you shouldn't remove any whitespace. Any " (double-quote) / ' (single-quote) that isn't directly preceeded by an odd number of backslashes (\) starts or ends a string literal / character literal. You're guaranteed that string and character literals end on the same line they started. string literals and character literals cannot be nested, so a ' inside a string literal, as well as a " inside a character literal don't have any special meaning.

I/O specification

Input and output must be either character sequences (strings) including newline characters or arrays/lists of strings that don't contain newline characters. If you choose to use arrays/lists, each element represents a line, so the newlines are implicit after each element.

You may assume the input is a valid C program source code. This also means it only contains printable ASCII characters, tabs and newlines. Undefined behavior on malformed input is allowed.

Leading and trailing whitespace / empty lines are not allowed.

Test cases

input

main() {
    printf("Hello, World!"); // hi
}

output

main(){printf("Hello, World!");}

input

#define max(x, y) \
    x > y ? x : y
#define I(x) scanf("%d", &x)
a;
b; // just a needless comment, \
        because we can!
main()
{
    I(a);
    I(b);
    printf("\" max \": %d\n", max(a, b));
}

output

#define max(x,y)x>y?x:y
#define I(x)scanf("%d",&x)
a;b;main(){I(a);I(b);printf("\" max \": %d\n",max(a,b));}

input

x[10];*c;i;
main()
{
    int _e;
    for(; scanf("%d", &x) > 0 && ++_e;);
    for(c = x + _e; c --> x; i = 100 / *x, printf("%d ", i - --_e));
}

output

x[10];*c;i;main(){int _e;for(;scanf("%d",&x)>0&&++_e;);for(c=x+_e;c-->x;i=100/ *x,printf("%d ",i- --_e));}

input

x;
#include <stdio.h>
int main()
{
    puts("hello // there");
}

output

x;
#include<stdio.h>
int main(){puts("hello // there");}

input (a real-world example)

// often used functions/keywords:
#define P printf(
#define A case
#define B break

// loops for copying rows upwards/downwards are similar -> macro
#define L(i, e, t, f, s) \
        for (o=i; o e;){ strcpy(l[o t], l[o f]); c[o t]=c[s o]; }

// range check for rows/columns is similar -> macro
#define R(m,o) { return b<1|b>m ? m o : b; }

// checking for numerical input is needed twice (move and print command):
#define N(f) sscanf(f, "%d,%d", &i, &j) || sscanf(f, ",%d", &j)

// room for 999 rows with each 999 cols (not specified, should be enough)
// also declare "current line pointers" (*L for data, *C for line length),
// an input buffer (a) and scratch variables
r, i, j, o, z, c[999], *C, x=1, y=1;
char a[999], l[999][999], (*L)[999];

// move rows down from current cursor position
D()
{
    L(r, >y, , -1, --)
    r++ ? strcpy(l[o], l[o-1]+--x), c[o-1]=x, l[o-1][x]=0 : 0;
    c[y++] = strlen(l[o]);
    x=1;
}

// move rows up, appending uppermost to current line
U()
{
    strcat(*L, l[y]);
    *C = strlen(*L);
    L(y+1, <r, -1, , ++)
    --r;
    *l[r] = c[r] = 0;
}

// normalize positions, treat 0 as max
X(b) R(c[y-1], +1)
Y(b) R(r, )

main()
{
    for(;;) // forever
    {
        // initialize z as current line index, the current line pointers,
        // i and j for default values of positioning
        z = i = y;
        L = l + --z;
        C = c + z;
        j = x;

        // prompt:
        !r || y/r && x > *C
            ? P "end> ")
            : P "%d,%d> ", y, x);

        // read a line of input (using scanf so we don't need an include)
        scanf("%[^\n]%*c", a)

            // no command arguments -> make check easier:
            ? a[2] *= !!a[1],

            // numerical input -> have move command:
            // calculate new coordinates, checking for "relative"
            N(a)
                ? y = Y(i + (i<0 | *a=='+') * y)
                    , x = X(j + (j<0 || strchr(a+1, '+')) * x)
                :0

            // check for empty input, read single newline
            // and perform <return> command:
            : ( *a = D(), scanf("%*c") );

        switch(*a)
        {
            A 'e':
                y = r;
                x = c[r-1] + 1;
                B;

            A 'b':
                y = 1;
                x = 1;
                B;

            A 'L':
                for(o = y-4; ++o < y+2;)
                    o<0 ^ o<r && P "%c%s\n", o^z ? ' ' : '>', l[o]);
                for(o = x+1; --o;)
                    P " ");
                P "^\n");
                B;

            A 'l':
                puts(*L);
                B;

            A 'p':
                i = 1;
                j = 0;
                N(a+2);
                for(o = Y(i)-1; o<Y(j); ++o)
                    puts(l[o]);
                B;

            A 'A':
                y = r++;
                strcpy(l[y], a+2);
                x = c[y] = strlen(a+2);
                ++x;
                ++y;
                B;

            A 'i':
                D();
                --y;
                x=X(0);
                // Commands i and r are very similar -> fall through
                // from i to r after moving rows down and setting
                // position at end of line:

            A 'r':
                strcpy(*L+x-1, a+2);
                *C = strlen(*L);
                x = 1;
                ++y > r && ++r;
                B;

            A 'I':
                o = strlen(a+2);
                memmove(*L+x+o-1, *L+x-1, *C-x+1);
                *C += o;
                memcpy(*L+x-1, a+2, o);
                x += o;
                B;

            A 'd':
                **L ? **L = *C = 0, x = 1 : U();
                y = y>r ? r : y;
                B;

            A 'j':
                y<r && U();
        }
    }
}

output

#define P printf(
#define A case
#define B break
#define L(i,e,t,f,s)for(o=i;o e;){strcpy(l[o t],l[o f]);c[o t]=c[s o];}
#define R(m,o){return b<1|b>m?m o:b;}
#define N(f)sscanf(f,"%d,%d",&i,&j)||sscanf(f,",%d",&j)
r,i,j,o,z,c[999],*C,x=1,y=1;char a[999],l[999][999],(*L)[999];D(){L(r,>y,,-1,--)r++?strcpy(l[o],l[o-1]+--x),c[o-1]=x,l[o-1][x]=0:0;c[y++]=strlen(l[o]);x=1;}U(){strcat(*L,l[y]);*C=strlen(*L);L(y+1,<r,-1,,++)--r;*l[r]=c[r]=0;}X(b)R(c[y-1],+1)Y(b)R(r,)main(){for(;;){z=i=y;L=l+--z;C=c+z;j=x;!r||y/r&&x>*C?P"end> "):P"%d,%d> ",y,x);scanf("%[^\n]%*c",a)?a[2]*=!!a[1],N(a)?y=Y(i+(i<0|*a=='+')*y),x=X(j+(j<0||strchr(a+1,'+'))*x):0:(*a=D(),scanf("%*c"));switch(*a){A'e':y=r;x=c[r-1]+1;B;A'b':y=1;x=1;B;A'L':for(o=y-4;++o<y+2;)o<0^o<r&&P"%c%s\n",o^z?' ':'>',l[o]);for(o=x+1;--o;)P" ");P"^\n");B;A'l':puts(*L);B;A'p':i=1;j=0;N(a+2);for(o=Y(i)-1;o<Y(j);++o)puts(l[o]);B;A'A':y=r++;strcpy(l[y],a+2);x=c[y]=strlen(a+2);++x;++y;B;A'i':D();--y;x=X(0);A'r':strcpy(*L+x-1,a+2);*C=strlen(*L);x=1;++y>r&&++r;B;A'I':o=strlen(a+2);memmove(*L+x+o-1,*L+x-1,*C-x+1);*C+=o;memcpy(*L+x-1,a+2,o);x+=o;B;A'd':**L?**L=*C=0,x=1:U();y=y>r?r:y;B;A'j':y<r&&U();}}}

This is code-golf, so shortest (in bytes) valid answer wins.

Felix Palmen

Posted 2017-11-09T13:47:52.110

Reputation: 3 866

related, related – HyperNeutrino – 2017-11-09T14:01:10.060

Let us continue this discussion in chat.

– DLosc – 2017-11-09T18:18:20.747

Answers

Pip, 148 135 133 138 bytes

aRM"\
"R`("|').*?(?<!\\)(\\\\)*\1`{lPBaC:++i+191}R[`//.*``#.*`{X*aJw.`(?=`}.')M[A`\w`RL2"++""--""/*"]w`¶+`'·C(192+,#l)][x_WR'¶{aRw'·}xnsl]

Bytes are counted in CP-1252, so ¶ and · are one byte each. Note that this expects the C code as a single command-line argument, which (on an actual command line) would require the use of copious escape sequences. It's much easier at Try it online!

Explanation of slightly ungolfed version

The code does a bunch of replacement operations, with a couple tricks.

Backslash continuations

We RM all occurrences of the literal string

"\
"

that is, backslash followed by newline.

String and character literals

We use a regex replacement with a callback function:

`("|').*?(?<!\\)(\\\\)*\1`

{
 lPBa
 C(++i + 191)
}

The regex matches a single or double quote, followed by a non-greedy .*? that matches 0 or more characters, as few as possible. We have a negative lookbehind to ensure that the previous character was not a backslash; then we match an even number of backslashes followed by the opening delimiter again.

The callback function takes the string/character literal and pushes it to the back of the list l. It then returns a character starting with character code 192 (À) and increasing with each literal replaced. Thus, code is transformed like so:

printf("%c", '\'');

printf(À, Á);

These replacement characters are guaranteed not to occur in the source code, which means we can unambiguously back-substitute them later.

Comments

`//.*`

x

The regex matches // plus everything up to the newline and replaces with x (preset to the empty string).

Preprocessor directives

`#.*`

_WR'¶

Wraps runs of non-newline characters starting with a pound sign in ¶.

Spaces that should not be eliminated

{
 (
  X*a J w.`(?=`
 ) . ')
}
M
[
 A`\w` RL 2
 "++"
 "--"
 "/*"
]

{
 a R w '·
}

There's a lot going on here. The first part generates this list of regexes to replace:

[
 `(?a)\w\s+(?=(?a)\w)`  Whitespace surrounded by [a-zA-Z_]
 `\+\s+(?=\+)`          Whitespace surrounded by +
 `\-\s+(?=\-)`          Whitespace surrounded by -
 `\/\s+(?=\*)`          Whitespace surrounded by / *
]

Note the use of lookaheads to match, for example, just the e in define P printf. That way this match doesn't consume the P, which means the next match can use it.

We generate this list of regexes by mapping a function to a list, where the list contains

[
 [`(?a)\w` `(?a)\w`]
 "++"
 "--"
 "/*"
]

and the function does this to each element:

(X*aJw.`(?=`).')
 X*a              Map unary X to elements/chars a: converts to regex, escaping as needed
                  Regexes like `\w` stay unchanged; strings like "+" become `\+`
    J             Join the resulting list on:
     w             Preset variable for `\s+`
      .`(?=`       plus the beginning of the lookahead syntax
(           ).')  Concatenate the closing paren of the lookahead

Once we have our regexes, we replace their occurrences with this callback function:

{aRw'·}

which replaces the run of whitespace in each match with ·.

Whitespace elimination and cleanup

[w `¶+` '·]

[x n s]

Three successive replacements substitute remaining runs of whitespace (w) for empty string (x), runs of ¶ for newline, and · for space.

Back-substitution of string and character literals

C(192+,#l)

l

We construct a list of all the characters we used as substitutions for literals by taking 192 + range(len(l)) and converting to characters. We then can replace each of these with its associated literal in l.

And that's it! The resulting string is autoprinted.

DLosc

Posted 2017-11-09T13:47:52.110

Reputation: 21 213

Great, I'm impressed (+1)! Including a // inside a string literal is definitely a good idea for a test case, I'll add one tomorrow. – Felix Palmen – 2017-11-10T01:01:43.773

Uhm ... now I found a subtle bug here as well...

– Felix Palmen – 2017-11-10T09:37:16.520

I'm going to pick a winner after 14 days (end of next week) and your solution would be the first candidate if you find time to fix this bug. Right now, you have the lowest score :) – Felix Palmen – 2017-11-16T10:39:48.827

1@FelixPalmen Fixed! – DLosc – 2017-11-16T19:51:57.437

Haskell, 327 360 418 394 bytes

g.(m.w.r.r=<<).lines.f
n:c:z="\n#_0123456789"++['A'..'Z']++['a'..'z']
(!)x=elem x
f('\\':'\n':a)=f a
f(a:b)=a:f b
f a=a
m('#':a)=c:a++[n]
m a=a
g(a:'#':b)=a:[n|a/=n]++c:g b
g(a:b)=a:g b
g a=a
s=span(!" \t")
r=reverse.snd.s
l n(a:b)d|a==d,n=a:w(snd$s b)|1>0=a:l(not$n&&a=='\\')b d
w('/':'/':_)=[]
w(a:b)|a!"\"'"=a:l(1>0)b a|(p,q:u)<-s b=a:[' '|p>"",a!z&&q!z||[a,q]!words"++ -- /*"]++w(q:u)
w a=a

Try it online!

This was a lot of fun to write! First the f function comes through and removes all the backslashes at the end of lines then lines breaks it into a list of strings at the newlines. Then we map a bunch of functions onto the lines and concatenate them all back together. Those functions: strip whitespace from the left (t) and from the right (r.t.r where r is reverse); remove whitespace from the middle, ignoring string and character literals as well as removing comments (w); and finally adds a newline character to the end if the line starts with a #. After all the lines are concatenated back together g looks for # characters and ensures they are preceded by a newline.

w is a little complex so I'll explain it further. First I check for "//" since in w I know I'm not in a string literal I know this is a comment so I drop the rest of the line. Next I check if the head is a delimiter for a string or character literal. If it is I prepend it and pass the baton to l which runs through the characters, tracking the "escape" state with n which will be true if there have been an even number of consecutive slashes. When l detects a delimiter and isn't in the escape state it passes the baton back to w, trimming to eliminate whitespace after the literal because w expects the first character to not be whitespace. When w doesn't find a delimiter it uses span to look for whitespace in the tail. If there's any it checks if the characters around it are cannot be brought into contact and inserts a space if so. Then it recurs after the whitespace is ended. If there was no whitespace no space is inserted and it moves on anyway.

EDIT: Thanks a lot to @DLosc for pointing out a bug in my program that actually led to a way for me to shorten it as well! Hooray for pattern matching!

EDIT2: I'm an idiot who didn't finish reading the spec! Thanks again DLosc for pointing that out!

EDIT3: Just noticed some annoying type reduction thing that turned e=elem into Char->[Char]->Bool for some reason, thus breaking on e[a,q]. I had to add a type signature to force it to be correct. Does anyone know how I could fix that? I've never had this problem in Haskell before. TIO

EDIT4: quick fix for bug @FelixPalmen showed me. I might try to golf it down later when I have some time.

EDIT5: -24 bytes thanks to @Lynn! Thank you! I didn't know you could assign things on the global scope using pattern matching like n:c:z=... that's really cool! Also good idea making an operator for elem wish I'd thought of that.

user1472751

Posted 2017-11-09T13:47:52.110

Reputation: 1 511

I squeezed out 24 bytes.

– Lynn – 2017-11-10T20:20:00.480

You are running into the dreaded monomorphism restriction; defining e x y=elem x y (or even e x=elem x) solves your problem. (I renamed e to an operator, (!).)

– Lynn – 2017-11-10T20:22:06.493

C, 497 494 490 489 bytes

Since we're processing C, let's do it using C! Function f() takes input from char pointer p and outputs to pointer q, and assumes that the input is in ASCII:

#define O*q++
#define R (r=*p++)
#define V(c)(isalnum(c)||c==95)
char*p,*q,r,s,t;d(){isspace(r)?g():r==47&&*p==r?c(),g():r==92?e():(O=s=r)==34?b():r==39?O=R,a():r?a():(O=r);}a(){R;d();}b(){((O=R)==34?a:r==92?O=R,b:b)();}c(){while(R-10)p+=r==92;}e(){R-10?s=O=92,O=r,a():h();}j(){(!isspace(R)?r==47&&*p==r?c(),j:(t=r==35,d):j)();}f(){t=*p==35;j();}i(){V(s)&&V(r)||s==47&&r==42||(s==43||s==45)&&r==s&&*p==s?O=32:0;d();}h(){isspace(R)?g():i();}g(){(r==10?t?O=r,j:*p==35?s-10?s=O=r,j:0:h:h)();}

We assume that the file is well-formed - string and character literals are closed, and if there's a comment on the final line, there must be a newline to close it.

Explanation

The pre-golfed version is only slightly more legible, I'm afraid:

#define O *q++=
#define R (r=*p++)
#define V(c)(isalnum(c)||c=='_')
char*p,*q,r,s,t;
d(){isspace(r)?g():r=='/'&&*p==r?c(),g():r=='\\'?e():(O s=r)=='"'?b():r=='\''?O R,a():r?a():(O r);}
a(){R;d();}
b(){((O R)=='"'?a:r=='\\'?O R,b:b)();}
c(){while(R!='\n')p+=r=='\\';}
e(){R!='\n'?s=O'\\',O r,a():h();}
j(){(!isspace(R)?r=='/'&&*p==r?c(),j:(t=r=='#',d):j)();}
f(){t=*p=='#';j();}
i(){V(s)&&V(r)||s=='/'&&r=='*'||(s=='+'||s=='-')&&r==s&&*p==s?O' ':0;d();}
h(){isspace(R)?g():i();}
g(){(r=='\n'?t?O r,j:*p=='#'?s!='\n'?s=O r,j:0:h:h)();}

It implements a state machine by tail recursion. The helper macros and variables are

O for output
R to read input into r
V to determine valid identifier characters (since !isalnum('_'))
p and q - I/O pointers as described
r - last character to be read
s - saved recent non-whitespace character
t - tag when working on a preprocessor directive

Our states are

a() - normal C code
b() - string literal
c() - comment
d() - normal C code, after reading r
e() - escape sequence
f() - initial state (main function)
g() - in whitespace
h() - in whitespace - dispatch to g() or i()
i() - immediately after whitespace - do we need to insert a space character?
j() - initial whitespace - never insert a space character

Test program

#define DEMO(code)                              \
    do {                                        \
        char in[] = code;                       \
        char out[sizeof in];                    \
        p=in;q=out;f();                         \
        puts("vvvvvvvvvv");                     \
        puts(out);                              \
        puts("^^^^^^^^^^");                     \
    } while (0)

#include<stdio.h>
#include<stdlib.h>
int main()
{
    DEMO(
         "main() {\n"
         "    printf(\"Hello, World!\"); // hi\n"
         "}\n"
         );
    DEMO(
         "#define max(x, y)                               \\\n"
         "    x > y ? x : y\n"
         "#define I(x) scanf(\"%d\", &x)\n"
         "a;\n"
         "b; // just a needless comment, \\\n"
         "        because we can!\n"
         "main()\n"
         "{\n"
         "    I(a);\n"
         "    I(b);\n"
         "    printf(\"\\\" max \\\": %d\\n\", max(a, b));\n"
         "}\n"
         );
    DEMO(
         "x[10];*c;i;\n"
         "main()\n"
         "{\n"
         "    int _e;\n"
         "    for(; scanf(\"%d\", &x) > 0 && ++_e;);\n"
         "    for(c = x + _e; c --> x; i = 100 / *x, printf(\"%d \", i - --_e));\n"
         "}\n"
         );
    DEMO(
         "// often used functions/keywords:\n"
         "#define P printf(\n"
         "#define A case\n"
         "#define B break\n"
         "\n"
         "// loops for copying rows upwards/downwards are similar -> macro\n"
         "#define L(i, e, t, f, s) \\\n"
         "        for (o=i; o e;){ strcpy(l[o t], l[o f]); c[o t]=c[s o]; }\n"
         "\n"
         "// range check for rows/columns is similar -> macro\n"
         "#define R(m,o) { return b<1|b>m ? m o : b; }\n"
         "\n"
         "// checking for numerical input is needed twice (move and print command):\n"
         "#define N(f) sscanf(f, \"%d,%d\", &i, &j) || sscanf(f, \",%d\", &j)\n"
         "\n"
         "// room for 999 rows with each 999 cols (not specified, should be enough)\n"
         "// also declare \"current line pointers\" (*L for data, *C for line length),\n"
         "// an input buffer (a) and scratch variables\n"
         "r, i, j, o, z, c[999], *C, x=1, y=1;\n"
         "char a[999], l[999][999], (*L)[999];\n"
         "\n"
         "// move rows down from current cursor position\n"
         "D()\n"
         "{\n"
         "    L(r, >y, , -1, --)\n"
         "    r++ ? strcpy(l[o], l[o-1]+--x), c[o-1]=x, l[o-1][x]=0 : 0;\n"
         "    c[y++] = strlen(l[o]);\n"
         "    x=1;\n"
         "}\n"
         "\n"
         "// move rows up, appending uppermost to current line\n"
         "U()\n"
         "{\n"
         "    strcat(*L, l[y]);\n"
         "    *C = strlen(*L);\n"
         "    L(y+1, <r, -1, , ++)\n"
         "    --r;\n"
         "    *l[r] = c[r] = 0;\n"
         "}\n"
         "\n"
         "// normalize positions, treat 0 as max\n"
         "X(b) R(c[y-1], +1)\n"
         "Y(b) R(r, )\n"
         "\n"
         "main()\n"
         "{\n"
         "    for(;;) // forever\n"
         "    {\n"
         "        // initialize z as current line index, the current line pointers,\n"
         "        // i and j for default values of positioning\n"
         "        z = i = y;\n"
         "        L = l + --z;\n"
         "        C = c + z;\n"
         "        j = x;\n"
         "\n"
         "        // prompt:\n"
         "        !r || y/r && x > *C\n"
         "            ? P \"end> \")\n"
         "            : P \"%d,%d> \", y, x);\n"
         "\n"
         "        // read a line of input (using scanf so we don't need an include)\n"
         "        scanf(\"%[^\\n]%*c\", a)\n"
         "\n"
         "            // no command arguments -> make check easier:\n"
         "            ? a[2] *= !!a[1],\n"
         "\n"
         "            // numerical input -> have move command:\n"
         "            // calculate new coordinates, checking for \"relative\"\n"
         "            N(a)\n"
         "                ? y = Y(i + (i<0 | *a=='+') * y)\n"
         "                    , x = X(j + (j<0 || strchr(a+1, '+')) * x)\n"
         "                :0\n"
         "\n"
         "            // check for empty input, read single newline\n"
         "            // and perform <return> command:\n"
         "            : ( *a = D(), scanf(\"%*c\") );\n"
         "\n"
         "        switch(*a)\n"
         "        {\n"
         "            A 'e':\n"
         "                y = r;\n"
         "                x = c[r-1] + 1;\n"
         "                B;\n"
         "\n"
         "            A 'b':\n"
         "                y = 1;\n"
         "                x = 1;\n"
         "                B;\n"
         "\n"
         "            A 'L':\n"
         "                for(o = y-4; ++o < y+2;)\n"
         "                    o<0 ^ o<r && P \"%c%s\\n\", o^z ? ' ' : '>', l[o]);\n"
         "                for(o = x+1; --o;)\n"
         "                    P \" \");\n"
         "                P \"^\\n\");\n"
         "                B;\n"
         "\n"
         "            A 'l':\n"
         "                puts(*L);\n"
         "                B;\n"
         "\n"
         "            A 'p':\n"
         "                i = 1;\n"
         "                j = 0;\n"
         "                N(a+2);\n"
         "                for(o = Y(i)-1; o<Y(j); ++o)\n"
         "                    puts(l[o]);\n"
         "                B;\n"
         "\n"
         "            A 'A':\n"
         "                y = r++;\n"
         "                strcpy(l[y], a+2);\n"
         "                x = c[y] = strlen(a+2);\n"
         "                ++x;\n"
         "                ++y;\n"
         "                B;\n"
         "\n"
         "            A 'i':\n"
         "                D();\n"
         "                --y;\n"
         "                x=X(0);\n"
         "                // Commands i and r are very similar -> fall through\n"
         "                // from i to r after moving rows down and setting\n"
         "                // position at end of line:\n"
         "\n"
         "            A 'r':\n"
         "                strcpy(*L+x-1, a+2);\n"
         "                *C = strlen(*L);\n"
         "                x = 1;\n"
         "                ++y > r && ++r;\n"
         "                B;\n"
         "\n"
         "            A 'I':\n"
         "                o = strlen(a+2);\n"
         "                memmove(*L+x+o-1, *L+x-1, *C-x+1);\n"
         "                *C += o;\n"
         "                memcpy(*L+x-1, a+2, o);\n"
         "                x += o;\n"
         "                B;\n"
         "\n"
         "            A 'd':\n"
         "                **L ? **L = *C = 0, x = 1 : U();\n"
         "                y = y>r ? r : y;\n"
         "                B;\n"
         "\n"
         "            A 'j':\n"
         "                y<r && U();\n"
         "        }\n"
         "    }\n"
         "}\n";);
}

This produces

main(){printf("Hello, World!");}

#define max(x,y)x>y?x:y
#define I(x)scanf("%d",&x)
a;b;main(){I(a);I(b);printf("\" max \": %d\n",max(a,b));}

x[10];*c;i;main(){int _e;for(;scanf("%d",&x)>0&&++_e;);for(c=x+_e;c-->x;i=100/ *x,printf("%d ",i- --_e));}

#define P printf(
#define A case
#define B break
#define L(i,e,t,f,s)for(o=i;o e;){strcpy(l[o t],l[o f]);c[o t]=c[s o];}
#define R(m,o){return b<1|b>m?m o:b;}
#define N(f)sscanf(f,"%d,%d",&i,&j)||sscanf(f,",%d",&j)
r,i,j,o,z,c[999],*C,x=1,y=1;char a[999],l[999][999],(*L)[999];D(){L(r,>y,,-1,--)r++?strcpy(l[o],l[o-1]+--x),c[o-1]=x,l[o-1][x]=0:0;c[y++]=strlen(l[o]);x=1;}U(){strcat(*L,l[y]);*C=strlen(*L);L(y+1,<r,-1,,++)--r;*l[r]=c[r]=0;}X(b)R(c[y-1],+1)Y(b)R(r,)main(){for(;;){z=i=y;L=l+--z;C=c+z;j=x;!r||y/r&&x>*C?P"end> "):P"%d,%d> ",y,x);scanf("%[^\n]%*c",a)?a[2]*=!!a[1],N(a)?y=Y(i+(i<0|*a=='+')*y),x=X(j+(j<0||strchr(a+1,'+'))*x):0:(*a=D(),scanf("%*c"));switch(*a){A'e':y=r;x=c[r-1]+1;B;A'b':y=1;x=1;B;A'L':for(o=y-4;++o<y+2;)o<0^o<r&&P"%c%s\n",o^z?' ' :'>',l[o]);for(o=x+1;--o;)P" ");P"^\n");B;A'l':puts(*L);B;A'p':i=1;j=0;N(a+2);for(o=Y(i)-1;o<Y(j);++o)puts(l[o]);B;A'A':y=r++;strcpy(l[y],a+2);x=c[y]=strlen(a+2);++x;++y;B;A'i':D();--y;x=X(0);A'r':strcpy(*L+x-1,a+2);*C=strlen(*L);x=1;++y>r&&++r;B;A'I':o=strlen(a+2);memmove(*L+x+o-1,*L+x-1,*C-x+1);*C+=o;memcpy(*L+x-1,a+2,o);x+=o;B;A'd':**L?**L=*C=0,x=1:U();y=y>r?r:y;B;A'j':y<r&&U();}}}

Limitation

This breaks definitions such as

#define A (x)

by removing the space that separates the name from expansion, giving

#define A(x)

with an entirely different meaning. This case is absent from the test sets, so I'm not going to address it.

I suspect I might be able to produce a shorter version with a multi-pass in-place conversion - I might try that next week.

Toby Speight

Posted 2017-11-09T13:47:52.110

Reputation: 5 058

You can save one byte by removing the = at the end of the definition of O and change the space that follows each call to O into a =. – Zacharý – 2017-11-10T16:40:00.777

This is great ;) About the "limitation", see also my comment on the question itself -- detecting this would add just too much complexity. – Felix Palmen – 2017-11-12T11:19:52.527

@Zachary - thanks for that - I forgot when I changed the general code to ASCII-specific that O'\\' and O' ' both acquired a space. – Toby Speight – 2017-11-13T09:43:51.923

464 bytes – ceilingcat – 2018-10-04T01:03:17.627

Perl 5, 250+3 (-00n), 167+1 (-p) bytes

$_.=<>while s/\\
//;s,(//.*)|(("|')(\\.|.)*?\3)|/?[^"'/]+,$1|$2?$2:$&=~s@(\S?)\K\s+(?=(.?))@"$1$2"=~/\w\w|\+\+|--|\/\*/&&$"@ger,ge;$d++&&$l+/^#/&&s/^/
/,$l=/^#/m if/./

Try it online

Nahuel Fouilleul

Posted 2017-11-09T13:47:52.110

Reputation: 5 582

Yes I just put a non-optimal solution. I've just added the tio link, I will look to golf it when i have time. – Nahuel Fouilleul – 2017-11-10T08:01:50.833

preprocessor directive are on their own line when are placed before code, as in the test cases however if it's required i will add change – Nahuel Fouilleul – 2017-11-10T09:47:07.603

1fixed see update – Nahuel Fouilleul – 2017-11-10T10:08:47.503

C, 705 663 640 bytes

Thanks to @Zacharý for golfing 40 bytes and thanks to @Nahuel Fouilleul for golfing 23 bytes!

#define A(x)(x>47&x<58|x>64&x<91|x>96&x<123)
#define K if(*C==47&(C[1]==47|p==47)){if(p==47)--G;for(c=1;c;*C++-92||c++)*C-10||--c;if(d)p=*G++=10,--d;
#define D if(!d&*C==35){d=1;if(p&p-10)p=*G++=10;}
#define S K}if((A(p)&A(*C))|(p==*C&l==43|p==45)|p==47&*C==42|p==95&(A(*C)|*C==95)|*C==95&(A(p)|p==95))p=*G++=32;}
#define W*C<33|*C==92
#define F{for(;W;C++)
c,d,e,p,l;g(char*C,char*G)F;for(;*C;*C>32&&*C-34&&*C-39&&(p=*G++=*C),*C-34&&*C-39&&C++){l=e=0;if(*C==34)l=34;if(*C==39)l=39;if(l)for(*G++=l,p=*G++=*++C;*C++-l|e%2;e=*(C-1)-92?0:e+1)p=*G++=*C;K}D if(d){if(W)F{*C-92||++d;*C-10||--d;if(!d){p=*G++=10;goto E;}}S}else{if(W)F;S}E:D}*G=0;}

Try it online!

Steadybox

Posted 2017-11-09T13:47:52.110

Reputation: 15 798

Can for(;W;C++){} become for(;W;C++);? – Zacharý – 2017-11-10T15:34:58.903

@Zacharý that was never asked for. It's a toool for the very last step: remove redundant whitespace and comments. – Felix Palmen – 2017-11-10T15:59:39.660

I was referring to his code, not the challenge. – Zacharý – 2017-11-10T16:01:53.647

@Zacharý haha I see ... weird when code and input are the same language ;) – Felix Palmen – 2017-11-10T16:07:58.160

Would this work for 665 bytes? https://goo.gl/E6tk8V

– Zacharý – 2017-11-10T16:34:22.897

Already been hopelessly outgolfed in C though – Zacharý – 2017-11-10T16:40:32.527

-23 bytes, && and || by & and | where possible,when no side effect tio

– Nahuel Fouilleul – 2017-11-10T16:53:45.623

Python 2, 479 456 445 434 502 497 bytes

e=enumerate
import re
u=re.sub
def f(s):
 r=()
 for l in u(r'\\\n','',s).split('\n'):
	s=p=w=0;L=[]
	for i,c in e(l):
	 if(p<1)*'//'==l[i:i+2]:l=l[:i]
	 if c in"'\""and w%2<1:
		if p in(c,0):L+=[l[s:i+1]];s=i+1
		p=[0,c][p<1]
	 w=[0,w+1]['\\'==c]
	r+=L+[l[s:]],
 S=''
 for l in r:s=''.join([u('. .',R,u('. .',R,u('\s+',' ',x))).strip(),x][i%2]for i,x in e(l));S+=['%s','\n%s\n'][s[:1]=='#']%s
 print u('\n\n','\n',S).strip()
def R(m):g=m.group(0);f=g[::2];return[f,g][f.isalnum()or f in'++ -- /*']

Try it online!

Edit: Fixed to include - -,+ +, and / *

TFeld

Posted 2017-11-09T13:47:52.110

Reputation: 19 246

Golf my "pre-golfed" C

Background

Task

I/O specification

Test cases

Answers

Pip, 148 135 133 138 bytes

Explanation of slightly ungolfed version

Backslash continuations

String and character literals

Comments

Preprocessor directives

Spaces that should not be eliminated

Whitespace elimination and cleanup

Back-substitution of string and character literals

Haskell, 327 360 418 394 bytes

C, 497 494 490 489 bytes

Explanation

Test program

Limitation

Perl 5, 250+3 (-00n), 167+1 (-p) bytes

C, 705 663 640 bytes

Python 2, 479 456 445 434 502 497 bytes