C code indenter

12

2

Situation: You are a high school teacher, teaching your computing class how to write C programs. However, since it is just the beginning of the term, you haven't taught them about the importance of indentation and spacing. As you are marking their work, your eyes hurt so much you scream in agony, and realise that this can't go on.

Task: You have decided to write a program, in any language, that takes valid C source code as input and outputs it nicely formatted. You decide what counts as nicely formatted code, since this is a popularity contest. You are encouraged to implement as many features as you can; the following are some examples:

  • Add proper indentation at the front of each line
  • Add spaces after , and other operators, e.g. converting int a[]={1,2,3}; to int a[] = {1, 2, 3};. Remember not to process operators within string literals though.
  • Remove trailing spaces after each line
  • Separate statements onto several lines, e.g. the student may write tmp=a;a=b;b=tmp; or int f(int n){if(n==1||n==2)return 1;else return f(n-1)+f(n-2);} all on one line, and you can split them onto different lines. Be careful with for loops, though: they contain semicolons, but I really don't think you should split them up.
  • Add a new line after defining each function
  • Any other features you can come up with that help you comprehend your students' code.
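
As an illustration of the "spaces after commas, but not inside string literals" feature, here is a minimal Python sketch. The function name `space_after_commas` is made up for this example, and the sketch assumes a single line of code with no comments in it; it only tracks string and character literals.

```python
def space_after_commas(line: str) -> str:
    """Insert one space after each comma that lies outside string/char literals."""
    out = []
    quote = None  # the quote character of the literal we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        out.append(ch)
        if quote:
            if ch == "\\" and i + 1 < len(line):
                # Copy the escaped character verbatim so \" does not end the literal.
                i += 1
                out.append(line[i])
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ",":
            nxt = line[i + 1] if i + 1 < len(line) else ""
            if nxt and not nxt.isspace():
                out.append(" ")
        i += 1
    return "".join(out)


print(space_after_commas('int a[]={1,2,3};'))   # int a[]={1, 2, 3};
print(space_after_commas('scanf("%s,%s",s,t);'))  # scanf("%s,%s", s, t);
```

Handling the other operators works the same way, but needs care with multi-character operators such as `>=` and `&&`.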

Winning criteria: This is a popularity contest, so the answer with most upvotes wins. In case of a tie, the answer with the most features implemented wins. If that is a tie again then shortest code wins.

I suggest you include in your answer a list of the features you have implemented, as well as sample input and output.

Edit: As requested in the comments, here is a sample input. Keep in mind that it is only for reference, and I recommend implementing as many features as possible.

Input:

#include <stdio.h>
#include<string.h>
int main() {
int i;
char s[99];
     printf("----------------------\n;;What is your name?;;\n----------------------\n"); //Semicolon added in the string just to annoy you
             /* Now we take the input: */
    scanf("%s",s);
    for(i=0;i<strlen(s);i++){if(s[i]>='a'&&s[i]<='z'){
        s[i]-=('a'-'A'); //this is same as s[i]=s[i]-'a'+'A'
}}printf("Your name in upper case is:\n%s\n",s);
   return 0;}

This is how I would normally format this code: (I'm a lazy person)

#include <stdio.h>
#include <string.h>
int main() {
    int i;
    char s[99];
    printf("----------------------\n;;What is your name?;;\n----------------------\n"); //Semicolon added in the string just to annoy you
    /* Now we take the input: */
    scanf("%s",s);
    for(i=0;i<strlen(s);i++) {
        if(s[i]>='a'&&s[i]<='z') {
            s[i]-=('a'-'A'); //this is same as s[i]=s[i]-'a'+'A'
        }
    }
    printf("Your name in upper case is:\n%s\n",s);
    return 0;
}

This is what I think is easier to read:

#include <stdio.h>
#include <string.h>
int main() {
    int i;
    char s[99];
    printf("----------------------\n;;What is your name?;;\n----------------------\n"); //Semicolon added in the string just to annoy you
    /* Now we take the input: */
    scanf("%s", s);
    for(i = 0; i < strlen(s); i++) {
        if(s[i] >= 'a' && s[i] <= 'z') {
            s[i] -= ('a' - 'A'); //this is same as s[i]=s[i]-'a'+'A'
        }
    }
    printf("Your name in upper case is:\n%s\n", s);
    return 0;
}




Also, now that answers are coming in: the answer with the highest vote count will be accepted 5 days after the last answer, i.e. if there are no new answers within 5 days, this contest will end.

user12205

Posted 2014-02-07T18:51:57.420

Reputation: 8 752

2 Any reasons for the down votes? – user12205 – 2014-02-07T20:46:43.963

1 I could remove all unnecessary whitespace (s/\s+/ /) and call it a day – ratchet freak – 2014-02-10T10:06:48.873

1 @ratchet freak you could, but I don't think this would be an answer with a lot of upvotes. – user12205 – 2014-02-10T16:52:58.580

I think you should provide an example of input. – None – 2014-02-13T08:49:44.967

@BenH You mean you haven't heard of C?! Well, here is a sample program: (I think the #include<stdio.h> needs to be on its own line for this to work) /* Hai Worldz. This code is to prevent formatting: if(this_code_is_touched){,then+your*program_"doesn't({work)}correctl"y.} */ #include<stdio.h> main(){printf("Hello World");} – Justin – 2014-02-13T09:13:00.193

No, I mean that if we have to write an indenter, at least provide in the main post a good example of C code containing every kind of cases we could see. Like a for loop, while, do while, if else, function, pointers maybe (but if students can't even indent, I guess they don't know what's a pointer), single and multi lines comments etc... I hope it's more clear. – None – 2014-02-13T09:33:53.193

Can you give some example C code? I do not program C but I would like to join the contest. – CousinCocaine – 2014-02-13T10:17:46.333

1 @Quincunx thanks for setting up the bounty. I thought my question sucked. – user12205 – 2014-02-13T11:58:57.467

@BenH and CousinCocaine there you go. Also, just to clarify, your program should be able to implement the features in ALL valid C code, and you are recommended to implement as many features as you can. But keep in mind that this is a popularity contest, i.e. the answer with most votes wins, so essentially there is only one requirement: impress your fellow code-golfers. – user12205 – 2014-02-13T12:28:15.050

Seriously? I'd download gnu-indent if not on a unix box – Tom Tanner – 2014-02-13T13:23:08.513

@TomTanner, I posted your solution as an answer, but seriously I think this is not a 'problem' at all, tons of solutions out there that do not need a Bounty or a program to be written. We are reinventing the wheel... ;) – CousinCocaine – 2014-02-13T14:18:38.903

@ace No problem; I thought this question was great. I was sad to see it stay in the dark for so long (or were you hoping to get tumbleweed?). – Justin – 2014-02-13T18:30:56.037

1 @CousinCocaine: this is a popularity contest. They are not designed to solve the problem in an elegant, or standard way. They are designed to solve the problem in the coolest way you can possibly imagine. – SztupY – 2014-02-13T19:36:11.900

@SztupY, you are right and my sincere excuses for the demotivating words! – CousinCocaine – 2014-02-14T08:48:29.177

Answers

21

Because we are talking about indentation and whitespace, we just have to write the code in a programming language that is actually designed around whitespace, as that has to be easiest, right?

So the solution is:

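[Whitespace program: the source consists only of spaces, tabs, and newlines, so it is not visible here.]
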
Here it is in base64:

ICAgCQogICAgCgkJICAgIAkgCiAgICAKCQkgICAgCQkKICAgIAoJCSAgICAJICAKICAgIAoJCSAgICAJIAkKICAgIAoJCSAKIAogCQoKICAgCSAKICAgCQoJCQkgICAJCQkgCQkKCSAgCQoJICAJICAKICAgCQoJCQkgICAJIAkgCgkgIAkKCSAgCSAJCiAgIAkKCQkJICAgCSAgCQoJICAJCgkgIAkgCQogICAJCgkJCSAgIAkgICAgIAoJICAJCgkgIAkgCQogICAJCgkJCSAgIAkJCQkJIAkKCSAgCQoJICAJCSAKICAgCQoJCQkgICAJICAgCSAKCSAgCQoJICAJICAgIAogICAJCgkJCSAgIAkgCSAgIAoJICAJCgkgIAkgICAJCQogICAJCgkJCSAgIAkgCSAgCQoJICAJCgkgIAkgIAkgIAogICAJIAogICAJCgkJIAogCSAJCQkKICAgCQoJCQkJCiAgCgkKCiAgIAkgIAogICAJCgkJCQkKICAKCQoKICAgCSAJCgoJCgogICAJCSAKICAgCQkKICAgCQkKCQkJICAgCQoJICAJCQkgICAgCSAKICAgCQoJCSAKIAkgCQkJCiAgIAkKCQkJCQogIAoJCgogICAJICAgIAogICAJIAogICAJIAoJCSAKIAkgCQkJCiAgIAkKCQkJCQogIAoJCgogICAJICAgCQkKICAgCSAJCiAgIAkgCQoJCQkgICAJCgkgICAJCSAgICAJIAogICAJCgkJIAogCSAJCQkKICAgCQoJCQkJCiAgCgkKCiAgIAkgIAkgIAogICAJIAkKICAgCSAJCgkJCSAgIAkKCSAgCQkJICAgIAkgCiAgIAkKCQkgCiAJIAkJCQogICAJCgkJCQkKICAKCQoKICAgCQkKICAgCQoJCQkgICAJIAkgCgkgIAkKCSAgCQkgIAogICAJCgkJCSAgIAkJCQkgCQkKCSAgCQoJICAJCSAJCiAgIAkKCQkJICAgCQkJCQkgCQoJICAJCgkgIAkJCSAKICAgCQoJCQkgICAJCQkgCQkKCSAgCQoJICAJCQkJCiAgIAkKCQkJICAgCSAgIAkgCgkgIAkKCSAgCQkgCQkKICAgCQoJCQkgICAJIAkgCSAKCSAgCQoJICAJCQkgIAogICAJCgkJCSAgIAkgCQkJCQoJICAJCgkgIAkJCSAJCiAgIAkKCQkJICAgCSAJICAgCgkgIAkKCSAgCSAgCSAJCiAgIAkKCQkJICAgCSAJICAJCgkgIAkKCSAgCSAgCQkgCiAgIAkKCQkJCQogIAoJCgogICAJCSAgCiAgIAkgCiAgICAKCQkgCgkKCiAgIAkJIAkKICAgCQoJCQkJCiAgICAgCSAKICAgIAoJCSAgICAJCQogICAJCQoJCQkgICAJCgkgICAJCSAKCQoKICAgCQkJIAogICAJCQogICAJCQoJCQkgICAJCgkgIAkJCSAKIAkgCQkJCiAgIAkKCQkJCQogICAgIAkgCiAgICAKCQkgCgkKCiAgIAkJCQkKICAgCQoJCQkJCiAgICAgCSAJCgkJCQoJICAJIAkgCgoJCgogICAJICAJCQkKICAgCSAKICAgIAoJCSAKCQoKICAgCQkgCQkKICAgCQoJCQkJCiAgICAgCSAKICAgCSAKCQkgCgkKCiAgIAkJCSAgCiAgIAkgIAoJCQkgICAJIAkJCQkKCSAgCQoJICAJCQkJIAogICAJCgkJCQkKICAKCQoKICAgCQkJIAkKICAgCSAgCgkJCSAgIAkgCQkJCQoJICAJCgkgIAkJCQkJCiAgIAkKCQkJCQogIAoJCgogICAJCQkJIAogICAJCgkJCQkKICAgICAJIAogICAJCQoJCSAKCQoKICAgCQkJCQkKICAgCQoJCQkJCiAgICAgCSAKICAgCSAgCgkJIAoJCgogICAJICAJIAkKICAgCSAJCiAgIAkgCQoJCQkgICAJCgkgICAJCSAgICAJ
CgkJCQkKICAKCQoKICAgCSAgCQkgCiAgIAkgCQogICAJIAkKCQkJICAgCQoJICAJCQkgICAgCQoJCQkJCiAgCgkKCiAgIAkJCQogICAJIAkgCgkKICAgICAJCQoJCQkKICAgCSAgCQogCiAKCSAgCSAgIAogCiAKCQkgCSAgIAogICAJICAgICAKCQogICAgIAkgICAgIAoJCiAgICAgCQoJICAJCiAKIAkgIAkKCiAgIAkgICAKCgkKCiAgIAkgCSAJCiAgIAkKCQkJICAgCSAJIAoJICAJCgkgIAkJICAgCiAgIAkKCQkJICAgCSAgIAkgCgkgIAkKCSAgCQkgIAkKICAgCQoJCQkJCiAgCgkKCiAgIAkJICAgCiAgIAkgCiAgICAKCQkgCgkKCiAgIAkJICAJCiAgIAkgIAoJCQkgICAJIAkJCSAgCgkgIAkKCSAgCQkgCSAKICAgCSAKICAgCQoJCSAgICAJCgkJCQkKICAKCQoKICAgCQkgCSAKICAgCQoJCQkJCiAgCgkKCiAgIAkgCQkgCiAgIAkKCQkJICAgCSAJCQkJCgkgIAkKCSAgCSAgICAgCiAgIAkKCQkJCQogIAoJCgogICAJICAgICAKICAgCSAgCgkJCSAgIAkgCSAJIAoJICAJCgkgIAkgICAgCQogICAJCgkJCQkKICAKCQoKICAgCSAgICAJCiAgIAkKCQkJCQogICAgIAkgCiAgIAkKCQkgCgkKCiAgIAkgCQkJCiAgIAkKCQkJICAgCSAJIAoJICAJCgkgIAkgICAJIAogICAJCgkJCQkKICAKCQoKICAgCSAgIAkgCiAgIAkgCiAgICAKCQkgCgkKCiAgIAkKICAgCSAgCiAgIAkKCQkJCQkgICAgCQoJCgkgICAgCSAKCQkJCgkgIAkgCSAKICAgCSAKCQkJICAgCQoJICAJCgkgIAkgICAJCiAgIAkgCgkJCSAgIAkgCgkgIAkKCSAgCSAgCSAKICAgCSAKCQkJICAgCQkKCSAgCQoJICAJICAJCQogICAJIAoJCQkgICAJICAKCSAgCQoJICAJIAkgIAoKICAgCSAgIAkKCiAJIAkJCgogCiAJIAkJCgogICAJIAkgCgogCSAJIAoKIAogCSAJCQoKICAgCSAgCSAKCiAJIAkgCSAJCgogCiAJIAkJCgogICAJICAJCQoKIAkgCSAJCSAKCiAKIAkgCQkKCiAgIAkgCSAgCgogCSAJIAkJCQoKICAgCSAJCQoKIAogCQoKCgo=

For those who have issues printing out the code on paper, here is the annotated version (you can find a compiler for this at the end of the answer):

# heap structure:
# 1: read buffer
# 2: parser state
#   0: before indentation
#   1: after indentation
#   2: inside a string literal
#   3: inside a multiline comment
#   4: inside a single line comment
# 3: indentation
# 4: old read buffer
# 5: parenthesis nesting amount

# -------------------
# initialize heap
# -------------------
SS 1 | SS 0 | TTS # [1] := 0
SS 2 | SS 0 | TTS # [2] := 0
SS 3 | SS 0 | TTS # [3] := 0
SS 4 | SS 0 | TTS # [4] := 0
SS 5 | SS 0 | TTS # [5] := 0
LSL 1 # goto L1

# -------------------
# sub: determine what to do in state 0
# -------------------
LSS 2 # LABEL L2
SS 1 | TTT | SS  59 | TSST | LTS 4 # if [1] == ; GOTO L4
SS 1 | TTT | SS  10 | TSST | LTS 5 # if [1] == \n GOTO L5
SS 1 | TTT | SS   9 | TSST | LTS 5 # if [1] == \t GOTO L5
SS 1 | TTT | SS  32 | TSST | LTS 5 # if [1] == ' ' GOTO L5
SS 1 | TTT | SS 125 | TSST | LTS 6 # if [1] == } GOTO L6
SS 1 | TTT | SS  34 | TSST | LTS 16 # if [1] == " GOTO L16
SS 1 | TTT | SS  40 | TSST | LTS 35 # if [1] == ( GOTO L35
SS 1 | TTT | SS  41 | TSST | LTS 36 # if [1] == ) GOTO L36

SS 2 | SS 1 | TTS # [2] := 1
LST 7 # call L7
SS 1 | TTT | TLSS # print [1]
LTL # return

LSS 4 # label L4 - ; handler
SS 1 | TTT | TLSS # print [1]
LTL # return

LSS 5 # label L5 - WS handler
LTL # return

LSS 6 # label L6 - } handler
# decrease indentation by one
SS 3 | SS 3 | TTT | SS 1 | TSST | TTS # [3] := [3] - 1
SS 2 | SS 1 | TTS # [2] := 1
LST 7 # call L7
SS 1 | TTT | TLSS # print [1]
LTL # return

LSS 16 # label L16 - " handler
SS2 | SS 2 | TTS # [2] := 2
LST 7 # call L7
SS1 | TTT | TLSS # print [1]
LTL

LSS 35
SS 5 | SS 5 | TTT | SS 1 | TSSS | TTS # [5] := [5] + 1
SS 2 | SS 1 | TTS # [2] := 1
LST 7 # call L7
SS1 | TTT | TLSS # print [1]
LTL

LSS 36
SS 5 | SS 5 | TTT | SS 1 | TSST | TTS # [5] := [5] - 1
SS 2 | SS 1 | TTS # [2] := 1
LST 7 # call L7
SS1 | TTT | TLSS # print [1]
LTL

# -------------------
# sub: determine what to do in state 1
# -------------------
LSS 3 # LABEL L3
SS 1 | TTT | SS  10 | TSST | LTS 12 # if [1] == \n GOTO L12
SS 1 | TTT | SS 123 | TSST | LTS 13 # if [1] == { GOTO L13
SS 1 | TTT | SS 125 | TSST | LTS 14 # if [1] == } GOTO L14
SS 1 | TTT | SS  59 | TSST | LTS 15 # if [1] == ; GOTO L15
SS 1 | TTT | SS  34 | TSST | LTS 27 # if [1] == " GOTO L27
SS 1 | TTT | SS  42 | TSST | LTS 28 # if [1] == * GOTO L28
SS 1 | TTT | SS  47 | TSST | LTS 29 # if [1] == / GOTO L29
SS 1 | TTT | SS  40 | TSST | LTS 37 # if [1] == ( GOTO L37
SS 1 | TTT | SS  41 | TSST | LTS 38 # if [1] == ) GOTO L38
SS 1 | TTT | TLSS # print [1]
LTL # return

LSS 12 # LABEL L12 - \n handler
SS 2 | SS 0 | TTS # [2] := 0
LTL # return

LSS 13 # LABEL L13 - { handler
SS 1 | TTT | TLSS # print [1]
SS 2 | SS 0 | TTS # [2] := 0
SS 3 | SS 3 | TTT | SS 1 | TSSS | TTS # [3] := [3] + 1
LTL # return

LSS 14 # LABEL L14 - } handler
SS 3 | SS 3 | TTT | SS 1 | TSST | TTS # [3] := [3] - 1
LST 7 # call L7
SS 1 | TTT | TLSS # print [1]
SS 2 | SS 0 | TTS # [2] := 0
LTL # return

LSS 15 # LABEL L15 - ; handler
SS 1 | TTT | TLSS # print [1]
SS 5 | TTT | LTS 10 # if [5] == 0 GOTO L10 (as written this targets L10, not L39, so L2 prints the ';' a second time)
LTL

LSS 39
SS 2 | SS 0 | TTS # [2] := 0
LTL # return

LSS 27 # label L27 - " handler
SS1 | TTT | TLSS # print [1]
SS2 | SS 2 | TTS # [2] := 2
LTL

LSS 28 # label L28 - * handler - this might start a comment
SS 4 | TTT | SS  47 | TSST | LTS 30 # if [4] == / GOTO L30
SS1 | TTT | TLSS # print [1]
LTL

LSS 29 # label L29 - / handler - this might start a comment
SS 4 | TTT | SS  47 | TSST | LTS 31 # if [4] == / GOTO L31
SS1 | TTT | TLSS # print [1]
LTL

LSS 30 # label L30 - /* handler
SS1 | TTT | TLSS # print [1]
SS2 | SS 3 | TTS # [2] := 3
LTL

LSS 31 # label L31 - // handler
SS1 | TTT | TLSS # print [1]
SS2 | SS 4 | TTS # [2] := 4
LTL

LSS 37
SS 5 | SS 5 | TTT | SS 1 | TSSS | TTS # [5] := [5] + 1
SS1 | TTT | TLSS # print [1]
LTL

LSS 38
SS 5 | SS 5 | TTT | SS 1 | TSST | TTS # [5] := [5] - 1
SS1 | TTT | TLSS # print [1]
LTL

# -------------------
# sub: print indentation
# -------------------
LSS 7 # label L7 - print indentation
SS 10 | TLSS # print \n
SS 3 | TTT # push [3]
LSS 9 # label L9 - start loop
SLS | LTS 8 # if [3] == 0 GOTO L8
SLS | LTT 8 # if [3] < 0 GOTO L8 - for safety
SS 32 | TLSS # print ' '
SS 32 | TLSS # print ' '
SS 1  | TSST # i := i - 1
LSL 9 # GOTO L9
LSS 8 # label L8 - end loop
LTL #

# -------------------
# sub: L21 - string literal handler
# -------------------
LSS 21
SS 1 | TTT | SS  10 | TSST | LTS 24 # if [1] == \n GOTO L24
SS 1 | TTT | SS  34 | TSST | LTS 25 # if [1] == " GOTO L25
SS 1 | TTT | TLSS # print [1]
LTL

LSS 24 # \n handler - this should never happen, but let's be prepared and reset the parser
SS 2 | SS 0 | TTS # [2] := 0
LTL # return

LSS 25 # " handler - this might be escaped, so be prepared
SS 4 | TTT | SS  92 | TSST | LTS 26 # if [4] == \ GOTO L26
SS 2 | SS 1 | TTS # [2] := 1
SS 1 | TTT | TLSS # print [1]
LTL

LSS 26 # \" handler - escaped quotes don't finish the literal
SS 1 | TTT | TLSS # print [1]
LTL

# -------------------
# sub: L22 - multiline comment handler
# -------------------
LSS 22
SS 1 | TTT | SS  47 | TSST | LTS 32 # if [1] == / GOTO L32
SS 1 | TTT | TLSS # print [1]
LTL

LSS 32
SS 4 | TTT | SS  42 | TSST | LTS 33 # if [4] == * GOTO L33
SS 1 | TTT | TLSS # print [1]
LTL

LSS 33
SS 1 | TTT | TLSS # print [1]
SS 2 | SS 1 | TTS # [2] := 1
LTL
# -------------------
# sub: L23 - singleline comment handler
# -------------------
LSS 23
SS 1 | TTT | SS  10 | TSST | LTS 34 # if [1] == \n GOTO L34
SS 1 | TTT | TLSS # print [1]
LTL

LSS 34
SS 2 | SS 0 | TTS # [2] := 0
LTL

# -------------------
# main loop
# -------------------
LSS 1 # LABEL L1
SS 4 | SS 1 | TTT | TTS # [4] := [1]
SS 1 | TLTS # [1] := read

SS 2 | TTT | LTS 10 # if [2] == 0 GOTO L10
SS 2 | TTT | SS 1 | TSST | LTS 17 # if [2] == 1 GOTO L17
SS 2 | TTT | SS 2 | TSST | LTS 18 # if [2] == 2 GOTO L18
SS 2 | TTT | SS 3 | TSST | LTS 19 # if [2] == 3 GOTO L19
SS 2 | TTT | SS 4 | TSST | LTS 20 # if [2] == 4 GOTO L20

LSS 17
LST 3  # call L3
LSL 11 # GOTO L11

LSS 10 # label L10
LST 2  # call L2
LSL 11

LSS 18
LST 21
LSL 11

LSS 19
LST 22
LSL 11

LSS 20
LST 23

LSS 11 # label L11
LSL 1  # goto L1
LLL # END

This is still a work in progress, although hopefully it meets most of the criteria!

Currently supported features:

  • fix indentation based on the { and } characters.
  • add a newline after ;
  • handle indentation characters inside string literals (including the fact that string literals are not closed when encountering a \")
  • handle indentation characters inside single and multiline comments
  • doesn't add newline characters if inside parentheses (like a for block)
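
The features listed above amount to a small character-level state machine. Here is a minimal Python sketch of that idea. It is not a port of the Whitespace program: the state names and the `reindent` function are my own, and unlike the program it also treats character literals as literals and moves a comment that trails a `;` onto its own line.

```python
START, CODE, LITERAL, BLOCK, LINE = range(5)


def reindent(src: str, unit: str = "  ") -> str:
    out = []
    depth = 0        # brace nesting, i.e. the indentation level
    parens = 0       # parenthesis nesting, so ';' inside for(...) stays inline
    state = START    # START means "at the beginning of an output line"
    quote = ""       # which quote character opened the current literal
    escaped = False  # True right after a backslash inside a literal
    prev = ""        # previous character, used to spot /*, // and */

    for ch in src:
        if state == START:
            if ch.isspace():
                continue                     # drop the old indentation
            if ch == "}":
                depth = max(depth - 1, 0)
            out.append("\n" + unit * depth)  # start a fresh, indented line
            if ch == "}":
                out.append(ch)               # a closing brace gets its own line
                continue
            state = CODE                     # fall through: ch is ordinary code

        if state == CODE:
            if ch == "\n":
                state = START
            elif ch == "}":
                depth = max(depth - 1, 0)
                out.append("\n" + unit * depth + ch)
                state = START
            else:
                out.append(ch)
                if ch == "{":
                    depth += 1
                    state = START
                elif ch == ";" and parens == 0:
                    state = START
                elif ch in "\"'":
                    state, quote = LITERAL, ch
                elif ch == "(":
                    parens += 1
                elif ch == ")":
                    parens -= 1
                elif prev == "/" and ch == "*":
                    state = BLOCK
                    ch = ""                  # so "/*/" is not read as open + close
                elif prev == "/" and ch == "/":
                    state = LINE
        elif state == LITERAL:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                state = CODE
        elif state == BLOCK:
            out.append(ch)
            if prev == "*" and ch == "/":
                state = CODE
                ch = ""                      # so "*//" does not start a line comment
        elif state == LINE:
            if ch == "\n":
                state = START
            else:
                out.append(ch)
        prev = ch

    return "".join(out).lstrip("\n") + "\n"


print(reindent('int main(){int i;for(i=0;i<3;i++){puts("a;b{");}return 0;}'), end="")
```

The parenthesis counter is what keeps the three parts of a for header on one line, and the literal and comment states are what keep `;`, `{` and `}` inside strings or comments from affecting the layout.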

Example input (I added some edge cases based on Quincunx's comment, so you can check that it behaves properly):

    /* Hai Worldz. This code is to prevent formatting: if(this_code_is_touched){,then+your*program_"doesn't({work)}correctl"y.} */
#include<stdio.h>
#include<string.h>
int main() {
int i;
char s[99];
     printf("----------------------\n;;What is your name?;;\n----------------------\n\""); //Semicolon added in the {;} string just to annoy you
             /* Now we take the {;} input: */
    scanf("%s",s);
    for(i=0;i<strlen(s);i++){if(s[i]>='a'&&s[i]<='z'){
        s[i]-=('a'-'A'); //this is same as s[i]=s[i]-'a'+'A'
}}printf("Your \"name\" in upper case is:\n%s\n",s);
   return 0;}

Example output:

[~/projects/indent]$ cat example.c | ./wspace indent.ws 2>/dev/null

/* Hai Worldz. This code is to prevent formatting: if(this_code_is_touched){,then+your*program_"doesn't({work)}correctl"y.} */
#include<stdio.h>
#include<string.h>
int main() {
  int i;;
  char s[99];;
  printf("----------------------\n;;What is your name?;;\n----------------------\n\"");; //Semicolon added in the {;} string just to annoy you
  /* Now we take the {;} input: */
  scanf("%s",s);;
  for(i=0;i<strlen(s);i++){
    if(s[i]>='a'&&s[i]<='z'){
      s[i]-=('a'-'A');; //this is same as s[i]=s[i]-'a'+'A'
    }
  }
  printf("Your \"name\" in upper case is:\n%s\n",s);;
  return 0;;
}

Note that because Whitespace doesn't support EOF checking, the interpreter throws an exception, which we need to suppress. As far as I know (this is my first Whitespace program), there is no way to check for EOF in Whitespace, so this is unavoidable. I hope the solution still counts.

This is the script I used to compile the annotated version into proper whitespace:

#!/usr/bin/env ruby
ARGF.each_line do |line|
  data = line.gsub(/'.'/) { |match| match[1].ord }
  data = data.gsub(/[^-LST0-9#]/,'').split('#').first
  if data
    data.tr!('LST',"\n \t")
    data.gsub!(/[-0-9]+/){|m| "#{m.to_i<0?"\t":" "}#{m.to_i.abs.to_s(2).tr('01'," \t")}\n" }
    print data
  end
end

To run:

./wscompiler.rb annotated.ws > indent.ws

Note that this script, apart from converting the S, L and T characters, also allows single-line comments with #, and automatically converts numbers and simple character literals into their Whitespace representation. Feel free to use it for other Whitespace projects if you want.
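
To make the number conversion concrete: a Whitespace number is a sign (space for positive, tab for negative), followed by the binary digits with space as 0 and tab as 1, terminated by a newline. The following Python sketch mirrors what the Ruby script above does for one line of annotated source; the function names `ws_number` and `compile_line` are made up for this illustration.

```python
import re


def ws_number(n: int) -> str:
    """Encode an integer as sign + binary digits (space=0, tab=1) + newline."""
    sign = "\t" if n < 0 else " "
    bits = format(abs(n), "b").translate(str.maketrans("01", " \t"))
    return sign + bits + "\n"


def compile_line(line: str) -> str:
    """Translate one annotated line (S/T/L mnemonics, numbers, # comments)."""
    # 'x' character literals become their character code, e.g. ';' -> 59
    line = re.sub(r"'(.)'", lambda m: str(ord(m.group(1))), line)
    # keep only the meaningful characters, then cut off the # comment
    line = re.sub(r"[^-LST0-9#]", "", line).split("#")[0]
    # mnemonics: L = line feed, S = space, T = tab
    line = line.translate(str.maketrans("LST", "\n \t"))
    # finally expand every number into its Whitespace encoding
    return re.sub(r"-?[0-9]+", lambda m: ws_number(int(m.group())), line)


print(repr(ws_number(59)))
print(repr(compile_line("SS 1 | SS 0 | TTS # [1] := 0")))
```

The second call compiles the first instruction of the annotated program, and its result matches the beginning of the base64 dump once decoded (three spaces, tab, line feed, four spaces, line feed, tab, tab, space).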

SztupY

Posted 2014-02-07T18:51:57.420

Reputation: 3 639

Not bad! :) but it would be nicer if it doesn't split up a for loop into three lines (in my opinion at least... it's still up to the voters to decide). As a reminder, the for loop in C has the syntax for(i=0;i<10;i++) – user12205 – 2014-02-13T11:56:31.333

2 @ace: as Whitespace is not exactly a high-level language, adding these kinds of exceptions is not that easy. That said, I will still try to fix the two issues with comments and literals, and try to fix the loops as well (I think ignoring ; inside (/) blocks would be enough). I think those should be enough to consider the solution "usable". – SztupY – 2014-02-13T11:58:46.157

1 @ace: I think I managed to add the exceptions, so now the generated code seems to be indented properly. Changed the example to your one – SztupY – 2014-02-13T19:14:18.120

9

Vim the easy way, technically using only one character: =

I am not a Vim guru, but I never underestimate its power, and some consider it a programming language. For me this solution is a winner anyway.

Open the file in vim:

vim file.c

Within vim press the following keys

gg=G

Explanation:

gg goes to the top of the file

= is a command to fix the indentation

G tells it to perform the operation to the end of the file.

You can save and quit with :wq

It is possible to let Vim run commands from the command line, so this can also be done in a one-liner, but I leave that to people who know Vim better than I do.
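
One possible form of that one-liner, wrapped in a short Python sketch so it can be scripted over a batch of student files. This assumes `vim` is on the PATH; the `-c` option runs an Ex command after the file has been loaded. The function names are hypothetical and not part of the original answer.

```python
import shutil
import subprocess


def vim_reindent_command(path: str) -> list[str]:
    """Build the argv for: vim -c "normal! gg=G" -c "wq" <path>"""
    return ["vim", "-c", "normal! gg=G", "-c", "wq", path]


def reindent_with_vim(path: str) -> None:
    """Re-indent the file in place by driving Vim non-interactively."""
    if shutil.which("vim") is None:
        raise RuntimeError("vim was not found on PATH")
    subprocess.run(vim_reindent_command(path), check=True)


print(" ".join(vim_reindent_command("file.c")))
```

The resulting indentation depends on the Vim configuration in use, so the output may differ from the example in this answer.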


Vim example: a valid input file (fibonacci.c) with bad indentation.

/* Fibonacci Series c language */
#include<stdio.h>

int main()
{
int n, first = 0, second = 1, next, c;

            printf("Enter the number of terms\n");
scanf("%d",&n);
  printf("First %d terms of Fibonacci series are :-\n",n);

          for ( c = 0 ; c < n ; c++ )
   {
if ( c <= 1 )
         next = c;
 else
                                     {
next = first +    second;
              first = second;
        second = next;
      }
      printf("%d\n",next);
   }
 return 0;
}

Open it in Vim with vim fibonacci.c and press gg=G:

/* Fibonacci Series c language */
#include<stdio.h>

int main()
{
  int n, first = 0, second = 1, next, c;

  printf("Enter the number of terms\n");
  scanf("%d",&n);
  printf("First %d terms of Fibonacci series are :-\n",n);

  for ( c = 0 ; c < n ; c++ )
  {
    if ( c <= 1 )
      next = c;
    else
    {
      next = first +    second;
      first = second;
      second = next;
    }
    printf("%d\n",next);
  }
  return 0;
}

CousinCocaine

Posted 2014-02-07T18:51:57.420

Reputation: 1 572

Source: http://stackoverflow.com/a/2355848/1919382

– CousinCocaine – 2014-02-13T10:33:56.507

This can be shortened to =GZZ. (Vim golf ftw!) – Doorknob – 2014-12-06T17:13:24.047

7

Since this will be used to help the teacher understand the student's code better, it is important to sanitize the input first. Pre-processor directives are unneeded, as they just introduce clutter, and macros can also introduce malicious code into the file. We don't want that! Also, it is completely unnecessary to retain the original comments the student wrote, as they are probably completely useless anyway.

Instead, as everyone knows, good code needs good comments. So apart from fixing the indentation and the structure, why not add some highly useful comments around the main points of the code to make the result even more understandable? This will definitely help the teacher in assessing the work the student has done!

So from this:

    /* Hai Worldz. This code is to prevent formatting: if(this_code_is_touched){,then+your*program_"doesn't({work)}correctl"y.} */
#include<stdio.h>
#include<string.h>
int main() {
int i;
char s[99];
     printf("----------------------\n;;What is your name?;;\n----------------------\n\""); //Semicolon added in the {;} string just to annoy you
             /* Now we take the {;} input: */
    scanf("%s",s);
    for(i=0;i<strlen(s);i++){if(s[i]>='a'&&s[i]<='z'){
        s[i]-=('a'-'A'); //this is same as s[i]=s[i]-'a'+'A'
}}printf("Your \"name\" in upper case is:\n%s\n",s);
   return 0;}

Let's produce this:

int main() {
    /* This will declare i. */
    int i;
    /* This will declare s[99]. */
    char s[99];
    /* This will call the function called printf with 1 parameters */
    printf("----------------------\n;;What is your name?;;\n----------------------\n\"");
    /* This will call the function called scanf with 2 parameters */
    scanf("%s", s);
    /* This will start a for loop, with initializator i = 0. It will loop until i < strlen(s), and will i++ at each iteration */
    for (i = 0; i < strlen(s); i++) {
        /* This will check, whether s[i] >= 'a' && s[i] <= 'z' is true or not. */
        if (s[i] >= 'a' && s[i] <= 'z') {
            s[i] -= 'a' - 'A';
        }
    }
    /* This will call the function called printf with 2 parameters */
    printf("Your \"name\" in upper case is:\n%s\n", s);
    /* This will return from the function. */
    return 0;
}

Isn't it much better, with all of the useful comments around the expressions?


So this is a Ruby solution utilizing the cast gem, which is a C parser (yes, I'm cheating). As this will parse the code and re-print it from scratch, the result will be perfectly indented and also consistent, e.g.:

  • Proper indentation across the code based on block level
  • Consistent whitespacing around expressions, statements, conditions, etc.
  • Complete re-indenting based on the structure of the code
  • etc.

It will also contain highly useful comments about how the code works, which will be super useful for both the students and the teacher!

indent.rb

#!/usr/bin/env ruby
require 'cast'

code = ''
ARGF.each_line do |line|
  if line=~/\A#/
    code << "// #{line.strip}\n"
  else
    code << line
  end
end

class Comment < C::Literal
  field :val
  initializer :val
  def to_s
    "/* #{val} */"
  end
end

tree = C.parse(code)
tree.preorder do |n|
  break if n.kind_of?(Comment)
  if n.kind_of?(C::Declaration)
    dd = []
    n.declarators.each do |d|
      dd << "declare #{d.indirect_type ? d.indirect_type.to_s(d.name) : d.name}"
      dd.last << " and set it to #{d.init}" if d.init
    end
    unless dd.empty?
      n.insert_prev(Comment.new("This will #{dd.join(", ")}."))
    end
  end rescue nil
  n.parent.insert_prev(Comment.new("This will call the function called #{n.expr} with #{n.args.length} parameters")) if n.kind_of?(C::Call) rescue nil
  n.insert_prev(Comment.new("This will start a for loop, with initializator #{n.init}. It will loop until #{n.cond}, and will #{n.iter} at each iteration")) if n.kind_of?(C::For) rescue nil
  n.insert_prev(Comment.new("This will check, whether #{n.cond} is true or not.")) if n.kind_of?(C::If) rescue nil
  n.insert_prev(Comment.new("This will return from the function.")) if n.kind_of?(C::Return) rescue nil
end

puts tree

Gemfile

source "http://rubygems.org"
gem 'cast', '0.2.1'

SztupY

Posted 2014-02-07T18:51:57.420

Reputation: 3 639

3 +1 for "it is completely unnecessary to retain the original comments the student wrote, as they are probably completely useless anyway" – SeinopSys – 2014-02-19T19:22:56.950

Those comments are undescriptive to me. They introduce clutter because they simply restate the following line. – Justin – 2014-02-20T06:54:32.667

@Quincunx I think you missed the sarcasm tag. This is a popularity contest. – SztupY – 2014-02-20T09:01:12.427

5

Bash, 35 characters

Input file must be named "input.c" and placed in the current working directory.

sh <(wget -q -O- http://x.co/3snpk)

Example output, having been fed the input in the original question: http://i.imgur.com/JEI8wa9.png

It may take a few seconds to run depending on your hardware, so be patient :)

Riot

Posted 2014-02-07T18:51:57.420

Reputation: 4 639

You know that this is not code-golf, right? – Justin – 2014-02-14T08:12:39.277

1 +1 for downloading and compiling AStyle on the fly O.o Wouldn't you just leave AStyle there for the convenience of the user though? So remove the rm? – tomsmeding – 2014-02-14T14:41:09.330

Since this is not code-golf I think it would be better to just put the contents of your pastebin here, as from a quick look I didn't even notice you are actually formatting the code properly with an external command – SztupY – 2014-02-14T15:39:26.373

@Quincunx: it's meant to be a bit of a troll answer :P – Riot – 2014-02-14T16:14:28.223

@tomsmeding: I wouldn't want my code to have side effects, such as taking up the user's disk space unexpectedly... – Riot – 2014-02-14T16:16:13.213

@Riot then you would also need to rm input.c.orig... – tomsmeding – 2014-02-14T19:57:41.917

Backing up your original file is not unusual behaviour though, it's not the same as leaving an entire cloned svn repository hanging about :) – Riot – 2014-02-14T23:46:01.037

3

Ruby

code = DATA.read

# first, we need to replace strings and comments with tilde escapes to avoid parsing them
code.gsub! '~', '~T'
lineComments = []
code.gsub!(/\/\/.*$/) { lineComments.push $&; '~L' }
multilineComments = []
code.gsub!(/\/\*.*?\*\//m) { multilineComments.push $&; '~M' }
strs = []
code.gsub!(/"(\\.|[^"])*"|'.'/) { strs.push $&; '~S' } # character literals are considered strings

# also, chop out preprocessor stuffs
preprocessor = ''
code.gsub!(/(^#.*\n)+/) { preprocessor = $&; '' }

# clean up newlines and excess whitespace
code.gsub! "\n", ' '
code.gsub! /\s+/, ' '
code.gsub!(/[;{}]/) { "#{$&}\n" }
code.gsub!(/[}]/) { "\n#{$&}" }
code.gsub! /^\s*/, ''
code.gsub! /\s+$/, ''

# fix for loops (with semicolons)
code.gsub!(/^for.*\n.*\n.*/) { $&.gsub ";\n", '; ' }

# now it's time for indenting; add indent according to {}
indentedCode = ''
code.each_line { |l|
    indentedCode += ('    ' * [indentedCode.count('{') - indentedCode.count('}') - (l =~ /^\}/ ? 1 : 0), 0].max) + l
}
code = indentedCode

# finally we're adding whitespace for more readability. first get a list of all operators
opsWithEq = '= + - * / % ! > < & | ^ >> <<'
opsNoEq = '++ -- && ||'
ops = opsWithEq.split + opsWithEq.split.map{|o| o + '=' } + opsNoEq.split
ops = ops.sort_by(&:length).reverse
# now whitespace-ize them
code.gsub!(/(.)(#{ops.map{|o| Regexp.escape o }.join '|'})(.)/m) { "#{$1 == ' ' ? ' ' : ($1 + ' ')}#{$2}#{$3 == ' ' ? ' ' : (' ' + $3)}" }

# special-cases: only add whitespace to the right
ops = ','.split
code.gsub!(/(#{ops.map{|o| Regexp.escape o }.join '|'})(.)/m) { "#{$1}#{$2 == ' ' ? ' ' : (' ' + $2)}" }
# special-cases: only add whitespace to the left
ops = '{'.split
code.gsub!(/(.)(#{ops.map{|o| Regexp.escape o }.join '|'})/m) { "#{$1 == ' ' ? ' ' : ($1 + ' ')}#{$2}" }

# replace the tilde escapes and preprocessor stuffs
stri = lci = mci = -1
code.gsub!(/~(.)/) {
    case $1
    when 'T'
        '~'
    when 'S'
        strs[stri += 1]
    when 'L'
        lineComments[lci += 1] + "\n#{code[0, $~.begin(0)].split("\n").last}"
    when 'M'
        multilineComments[mci += 1]
    end
}
code = (preprocessor + "\n" + code).gsub /^ +\n/, ''

puts code
__END__
    /* Hai Worldz. This code is to prevent formatting: if(this_code_is_touched){,then+your*program_"doesn't({work)}correctl"y.} */
#include<stdio.h>
#include<string.h>
int main() {
int i;
char s[99];
     printf("----------------------\n;;What is your name?;;\n----------------------\n\""); //Semicolon added in the {;} string just to annoy you
             /* Now we take the {;} input: */
    scanf("%s",s);
    for(i=0;i<strlen(s);i++){if(s[i]>='a'&&s[i]<='z'){
        s[i]-=('a'-'A'); //this is same as s[i]=s[i]-'a'+'A'
}}printf("Your \"name\" in upper case is:\n%s\n",s);
   return 0;}

Output:

#include <stdio.h>
#include<string.h>

int main() {
    int i;
    char s[99];
    printf("----------------------\n;;What is your name?;;\n----------------------\n");
    //Semicolon added in the string just to annoy you
     /* Now we take the input: */ scanf("%s", s);
    for(i = 0; i < strlen(s); i ++ ) {
        if(s[i] >= 'a' && s[i] <= 'z') {
            s[i] -= ('a' - 'A');
            //this is same as s[i]=s[i]-'a'+'A'
        }
    }
    printf("Your name in upper case is:\n%s\n", s);
    return 0;
}

Output for @SztupY's edge case input:

#include<stdio.h>
#include<string.h>

/* Hai Worldz. This code is to prevent formatting: if(this_code_is_touched){,then+your*program_"doesn't({work)}correctl"y.} */ int main() {
    int i;
    char s[99];
    printf("----------------------\n;;What is your name?;;\n----------------------\n\"");
    //Semicolon added in the {;} string just to annoy you
     /* Now we take the {;} input: */ scanf("%s", s);
    for(i = 0; i < strlen(s); i ++ ) {
        if(s[i] >= 'a' && s[i] <= 'z') {
            s[i] -= ('a' - 'A');
            //this is same as s[i]=s[i]-'a'+'A'
        }
    }
    printf("Your \"name\" in upper case is:\n%s\n", s);
    return 0;
}

Features so far:

  • [x] Add proper indentation at the front of each line
  • [x] Add spaces after , and other operators, e.g. converting int a[]={1,2,3}; to int a[] = {1, 2, 3};. Remember not to process operators within string literals though.
  • [x] Remove trailing spaces after each line
  • [x] Separating statements into several lines, e.g. the student may write tmp=a;a=b;b=tmp; or int f(int n){if(n==1||n==2)return 1;else return f(n-1)+f(n-2);} all in one line, you can separate them into different lines. Be aware of for loops though, they have semicolons in them but I really don't think you should split them up.
  • [ ] Add a new line after defining each function
  • [ ] Any other features you can come up with that help you comprehend your students' code.
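
The tilde escapes restored in the Ruby code above suggest a common way to honour the "don't process operators within string literals" rule: swap every string and comment for a placeholder, format what is left, then put the originals back. Here is a minimal Python sketch of that idea, not a port of the answer. The names `mask`, `unmask` and `add_spaces`, the single `~P` placeholder, and the comma-only spacing rule are all assumptions made for illustration (the Ruby code uses separate ~S, ~L and ~M markers), and character literals are not handled.

```python
import re

# A double-quoted string (with backslash escapes), a // comment,
# a /* ... */ comment, or a literal tilde.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/|~', re.DOTALL)


def mask(code):
    """Swap strings and comments for ~P, and literal tildes for ~T."""
    stash = []

    def hide(match):
        text = match.group(0)
        if text == "~":
            return "~T"
        stash.append(text)
        return "~P"

    return TOKEN.sub(hide, code), stash


def unmask(code, stash):
    """Undo mask(): restore stashed pieces in order and unescape tildes."""
    pieces = iter(stash)

    def restore(match):
        return "~" if match.group(1) == "T" else next(pieces)

    return re.sub(r"~([TP])", restore, code)


def add_spaces(code):
    """Hypothetical formatting pass: exactly one space after each comma."""
    masked, stash = mask(code)
    masked = re.sub(r"\s*,\s*", ", ", masked)
    return unmask(masked, stash)


print(add_spaces('printf("a,b;%d",x,~y); // c,d'))
# prints: printf("a,b;%d", x, ~y); // c,d
```

Because the formatting pass only ever sees the placeholders, the commas and semicolons inside the string and the comment come back untouched.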

Doorknob

Posted 2014-02-07T18:51:57.420

Reputation: 68 138

3

This is written in Python and based on the GNU Coding Standards.

Features so far:

  • Indenting blocks
  • Splitting long lines (may also split lines that shouldn't be split)
  • GNU-style function definitions

Code:

import sys

file_in = sys.argv[1]

# Functions, for, if, while, and switch statements
def func_def(string):
    ret = ["", "", ""]
    func_name = ""
    paren_level = -1
    consume_id = False

    for x in string[::-1]:
        if x == "{":
            ret[2] = "{"
        elif ret[1] == "":
            if x == "(":
                paren_level -= 1
                func_name += x
            elif x == ")":
                paren_level += 1
                func_name += x
            elif paren_level == -1 and x.isspace():
                if consume_id:
                    ret[1] = func_name[::-1]
            elif paren_level == -1:
                consume_id = True
                func_name += x
            else:
                func_name += x
        else:
            # Return Type
            ret[0] += x
    else:
        ret[1] = func_name[::-1]

    ret[0] = ret[0][::-1]

    # Handle the case in which this is just a statement
    if ret[1].split("(")[0].strip() in ["for", "if", "while", "switch"]:
        ret = [ret[1], ret[2]] # Don't print an extra line

    return ret

with open(file_in) as file_obj_in:
    line_len = 0
    buffer = ""
    in_str = False
    no_newline = False
    indent_level = 0
    tab = " " * 4
    no_tab = False
    brace_stack = [-1]

    while True:
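        char = file_obj_in.read(1)
        buffer += char
        if char == "":
            # End of file: flush any text still buffered, then stop
            if buffer.strip():
                print(indent_level * tab + buffer)
            break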
        elif "\n" in buffer:
            if not no_newline:
                print(("" if no_tab else indent_level * tab) + buffer, end="")
                buffer = ""
                line_len = indent_level * 4
                no_tab = False
                continue
            else:
                buffer = ""
                line_len = indent_level * 4
                no_newline = False
                continue
        elif buffer[-1] == '"':
            in_str = not in_str
        elif buffer.isspace():
            buffer = ""
            continue

        if "){" in "".join(buffer.split()) and not in_str:
            for x in func_def(buffer):
                print(indent_level * tab + x)
            buffer = ""
            line_len = indent_level * 4
            no_newline = True
            indent_level += 1
            brace_stack[0] += 1
            brace_stack.append((brace_stack[0], True))
            continue
        elif buffer[-1] == "}" and not in_str:
            brace_stack[0] -= 1
            if brace_stack[-1][1]: # If we must print newline and indent
                if not buffer == "}":
                    print(indent_level * tab + buffer[:-1].rstrip("\n"))
                indent_level -= 1
                print(indent_level * tab + "}")
                buffer = ""
                line_len = indent_level * 4
            else:
                pass
            brace_stack.pop()
        line_len += 1

        if line_len == 79 and not in_str:
            print(indent_level * tab + buffer)
            buffer = ""
            line_len = indent_level * 4
            continue
        elif line_len == 78 and in_str:
            print(indent_level * tab + buffer + "\\")
            buffer = ""
            line_len = indent_level * 4
            no_tab = True
            continue

Example input (pass a filename as argument):

#include <stdio.h>
#include<string.h>
int main() {
int i;
char s[99];
     printf("----------------------\n;;What is your name?;;\n----------------------\n"); //Semicolon added in the string just to annoy you
             /* Now we take the input: */
    scanf("%s",s);
    for(i=0;i<strlen(s);i++){if(s[i]>='a'&&s[i]<='z'){
        s[i]-=('a'-'A'); //this is same as s[i]=s[i]-'a'+'A'
}}printf("Your name in upper case is:\n%s\n",s);
   return 0;}

Example output:

#include <stdio.h>
#include<string.h>
int
main()
{
    int i;
    char s[99];
    printf("----------------------\n;;What is your name?;;\n------------------\
----\n"); //Semicolon added in the string just to annoy you
    /* Now we take the input: */
    scanf("%s",s);
    for(i=0;i<strlen(s);i++)
    {
        if(s[i]>='a'&&s[i]<='z')
        {
            s[i]-=('a'-'A'); //this is same as s[i]=s[i]-'a'+'A'
        }
    }
    printf("Your name in upper case is:\n%s\n",s);
    return 0;
}

This will have bugs.
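
One bug that is easy to spot in the code above: `in_str` flips on every `"`, so an escaped quote (`\"`) or a `'"'` character literal can leave the state inverted until the next quote. Below is a hedged sketch of a scanner that tracks the open quote and backslash escapes instead. `outside_literals` is a hypothetical helper, not part of the program above, and it does not handle comments.

```python
def outside_literals(source):
    """Return one boolean per character: True when that character lies
    outside every string and character literal."""
    flags = []
    quote = None     # the quote character that opened the literal, or None
    escaped = False  # True right after a backslash inside a literal
    for ch in source:
        if quote is None:
            if ch in "\"'":
                quote = ch           # opening quote belongs to the literal
                flags.append(False)
            else:
                flags.append(True)
        else:
            flags.append(False)
            if escaped:
                escaped = False      # this character was escaped, skip it
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None         # closing quote ends the literal


    return flags


sample = r'''f("a\"}", '"', x);}'''
flags = outside_literals(sample)
print([i for i, ch in enumerate(sample) if ch == "}" and flags[i]])
# prints: [18]
```

Only the final `}` is reported; the one inside the string is ignored even though an escaped quote sits right before it.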

golfer9338

Posted 2014-02-07T18:51:57.420

Reputation: 411

0

.NET

Open the file in Visual Studio.

Input:

#include<stdio.h>
#include<string.h>
int main() {
int i;
char s[99];
     printf("----------------------\n;;What is your name?;;\n----------------------\n\""); //Semicolon added in the {;} string just to annoy you
             /* Now we take the {;} input: */
    scanf("%s",s);
    for(i=0;i<strlen(s);i++){if(s[i]>='a'&&s[i]<='z'){
            s[i]-=('a'-'A'); //this is same as s[i]=s[i]-'a'+'A'
    }
}printf("Your \"name\" in upper case is:\n%s\n",s);
       return 0;}

Output:

[screenshot of the output; image not preserved]

Venkatesh K

Posted 2014-02-07T18:51:57.420

Reputation: 219

2 This is a good idea, but the challenge is a programming contest. Can you write a program that opens this file in Visual Studio, then saves it pretty-printed? – Justin – 2014-02-19T18:49:37.113