Fully Modular C: Grading

8

1

You are a Computer Science professor teaching the C programming language. One principle you seek to impart to the students is modularity. Unfortunately, past classes have tended not to get the message, submitting assignments with the entire program inside main(). Therefore, for this semester you have issued strict modularity guidelines upon which students will be graded.

A subset of C grammar and rules for "well-formed" translation unit are defined below. Code that follows these rules should be valid C89, UNLESS an identifier that is a keyword is used.

Task

You will receive as input a string purportedly containing C code. You may assume that this string contains only spaces, newlines, and the characters abcdefghijklmnopqrstuvwxyz123456789(){},+-*/%;=.

Your code must output the number of points the student's assignment should receive according to the following rubric:

  • Input is not a valid translation-unit according to the grammar: 0 points
  • Input follows the grammar but is not "well-formed" according to the rules below: 1 point
  • Input is a well-formed translation unit but not fully modular: 2 points
  • Input is a fully modular well-formed translation unit: 3 points

Token definitions

  • identifier: Any sequence of 1 or more lowercase English letters. If an identifier is a C89 reserved word1, you may optionally return 0 instead of whatever the result would have been ignoring reserved words. You do not have to be consistent about detecting the use of reserved words as identifiers; you may flag them in some instances and let them pass in others.

  • integer-literal: A sequence of 1 or more of the digits 1-9 (recall that the character 0 is guaranteed not to appear in the input)

  • Other valid tokens are defined literally in the grammar.
  • A character must belong to a token if and only if it is not whitespace.
  • Two consecutive alphanumeric characters must be part of the same token.

EBNF grammar

var-expr = identifier
literal-expr = integer-literal
binary-op = "+" | "-" | "*" | "/" | "%"
binary-expr = expr binary-op expr
paren-expr = "(" expr ")"
call-expr = identifier "(" [ expr ( "," expr )* ] ")"
expr = var-expr | literal-expr | binary-expr | paren-expr | call-expr
assign-stmt = var-expr "=" expr ";"
if-stmt = "if" "(" expr ")" assign-stmt
return-stmt = "return" expr ";"
function-body = ( assign-stmt | if-stmt )* return-stmt
argument-list = [ identifier ( "," identifier )* ]
function-definition = identifier "(" argument-list ")" "{" function-body "}"
translation-unit = function-definition*

Well-formed program requirements

  • No two function definitions may have the same function name.
  • No two identifiers in an argument-list may be identical.
  • No identifier in an argument-list may be identical to a function name (whether from a function-definition or a call-expr).
  • The identifier in a var-expr must be included in the enclosing function's argument-list.
  • For a given function, all call-exprs and the function-definition (if any) must agree in number of arguments.

Fully modular

  • No more than 1 binary operator per function
  • No more than 1 assignment statement per function
  • No more than 1 function call per function

Examples (one per line)

Score 0

}}}}}
return 2;
f() { return -1; }
f() {}
f(x,) { return 1; }
f(x) { return 1 }
f(x) { returnx; }
f(x) { return1; }
f() { g(); return 1;}
f() { if(1) return 5; }
f(x) { if(1) if(1) x = 2; return x; }
f(x, y) { x = y = 2; return x; }

Score 1

f(){ return 1; } f(){ return 1; }
g(x, x) { return 1; }
g(f) { return 1; } f() { return 1; }
f(x) { x = write(); x = write(1); return 1; }
f() { return f(f); }
f() { return 1; } g() { return f(234567); }
f() { return(x); }
f() { j = 7; return 5; }

Score 2

f(x,y,zzzzz) { return x + y + zzzzz; }
f(x,a,b) { if(a) x = foo(); if(b) x = bar(); return x; }
f(j) { return g(h( i() / j, i() ), 1) ; }

Score 3

mod(x, y) { return ((x % y)); }
f() { return f(); }
f(c) { if(c) c = g(c) + 2; return c; }
fib(i){return bb(i,0,1);}aa(i,a,b){return bb(i,b,a+b);}bb(i,a,b){if(i)a=aa(i-1,a,b);return a;}

Score 0 or 1

h(auto, auto) { return 1; }

Score 0 or 3

if() { return 1; }

1 Reserved word list: auto break case char const continue default do double else enum extern float for goto if int long register return short signed sizeof static struct switch typedef union unsigned void volatile while

feersum

Posted 2018-12-01T23:28:03.563

Reputation: 29 566

May the input string be empty? (If so, I think it should score 3.) – Arnauld – 2018-12-02T02:05:58.167

@Arnauld Yes, that's correct. – feersum – 2018-12-02T02:47:13.120

How are we supposed to check the number of arguments of undeclared functions? Is the 4th example in "Score 1" not well-formed because write was called once with no argument and once with one argument? – Arnauld – 2018-12-02T11:24:10.013

@Arnauld Yes, you check that each call to a function which is not defined has the same number of arguments. – feersum – 2018-12-02T17:23:37.550

I see. Well, this is currently not compatible with my parser, so I'm deleting my answer for now. I may give it another try later. – Arnauld – 2018-12-02T17:30:10.030

Should it be well-formed to assign to variables not found in the argument list? The test cases don't cover that, and assign-stmt is currently defined using identifier, not var-expr which is required to be in the argument list. – PurkkaKoodari – 2018-12-05T14:35:45.537

@Pietu1998 Thanks, fixed. I changed the left side of an assignment to var-expr so this is no longer allowed by the spec. – feersum – 2018-12-06T04:18:04.280

What about the binary operators &&, ||, &, | and ^ – None – 2018-12-06T05:05:00.370

@Rogem The characters &|^ are guaranteed not to appear in the input. – feersum – 2018-12-06T05:13:59.437

Also, suggest adding the empty program as a test case. – None – 2018-12-06T06:13:34.450

Answers

3

JavaScript (ES6), 599 bytes

T=_=>P=I.shift()
B=v=>I.unshift(P)&&v
J=_=>/^[a-z]+$/.test(T())*7
M=p=>T()==p&&7
C=(n,a)=>n in U?U[n]==a?7:1:(U[n]=a,7)
E=(m,O=n=>E()&(M`,`?O(n+1):P==")"&&C(m,n)))=>(M`(`?E()&M`)`:P==+P?7:J(m=B(P))&(M`(`?++z&&M`)`?C(m,0):O(B(1)):B(A[m]|1)))&(/[+*/%-]/.test(T())?E(++y):B(7))
S=_=>I[0]?M`return`?E()&M`;`&M`}`&C(F,N)&((x|y|z)>1?3:7):(P=="if"?M`(`&E()&M`)`:B(7))&J(++x)&(A[P]|1)&M`=`&E()&M`;`&S():0
G=_=>J(++N)&(A[P]|L[P]&8?1:A[P]=L[P]=7)&(M`,`?G():P==")"&&M`{`&S())
Q=_=>I[N=x=y=z=0]?J(A={})&(L[F=P]?1:L[F]=15)&M`(`&(M`)`?M`{`&S():G(B()))&Q():7
f=c=>Math.log2(Q(U={},L={},I=c.match(/\w+|\S/g)||[])+1)

Try it online! (comes with test cases)

Defines a function f that takes the code as argument.

(Some) explanation

This monstrosity is essentially a huge recursive bitwise AND, which somehow happens to work as a parser for this C subset. Scores are stored as 0137 for 0123 respectively and converted in the end as log2(score+1); if any phase detects a problem in the code, it returns a score lower than 7, which in turn unsets bits from the final output and results in a lower score. All functions except for T and B return a score.

Used identifiers are tracked in L and function argument counts in U. The current function's argument count is N and the names are in A. Assignments, binary operators and calls are tracked in x, y and z respectively.

If the code runs out of input, it will just continue happily popping undefineds off the token deque. This is fine, since the only function that matches undefined is J, and anything will eventually finish recursing and return 0 due to not matching a closing }. (Except for S and Q which have explicit checks for end of input.) Even pushing undefined back on the deque is harmless, since popping undefined is the same as popping an empty array.

PurkkaKoodari

Posted 2018-12-01T23:28:03.563

Reputation: 16 699