Van Wijngaarden grammar
In computer science, a Van Wijngaarden grammar (also vW-grammar or W-grammar) is a two-level grammar which provides a technique to define potentially infinite context-free grammars in a finite number of rules. The formalism was invented by Adriaan van Wijngaarden[1] to define rigorously some syntactic restrictions which previously had to be formulated in natural language, despite their essentially syntactical content.
Typical applications are the treatment of gender and number in natural language syntax and the well-definedness of identifiers in programming languages. For example, "there is one person" and "there are two people" are both grammatically correct but "there are one person" is incorrect for context-sensitive reasons that a W-grammar could represent.
The technique was used and developed in the definition of the programming language ALGOL 68. It is an example of the larger class of affix grammars.
Overview
A W-grammar consists of a finite set of meta-rules, which are used to derive (possibly infinitely many) production rules from a finite set of hyper-rules. Meta-rules are restricted to those defined by a context-free grammar. Hyper-rules restrict the admissible contexts at the upper level. Essentially, the consistent substitution used in the derivation process is equivalent to unification, as in Prolog, as was noted by Alain Colmerauer.
For example, the assignment x:=1
is only valid if the variable x can contain an integer. Therefore, the context-free syntax variable := value
is incomplete. In a two-level grammar, this might be specified in a context-sensitive manner as REF TYPE variable := TYPE value
. Then ref integer variable := integer value
could be a production rule but ref Boolean variable := integer value
is not a possible production rule. This also means that assigning with incompatible types becomes a syntax error which can be caught at compile-time. Similarly,
STYLE begin token, new LAYER1 preludes, parallel token, new LAYER1 tasks PACK, STYLE end token
allows begin ... end
and { ... }
but not begin ... }
.
ALGOL 68 examples
Prior to ALGOL 68 the language ALGOL 60 was formalised using the context-free Backus–Naur form. The appearance of new context-sensitive two-level grammar presented a challenge to some readers of the 1968 ALGOL 68 "Final Report". Subsequently, the final report was revised by Wijngaarden and his colleagues and published as the 1973 ALGOL 68 "Revised Report".
The grammar for ALGOL 68 is officially defined in the two-level Van Wijngaarden grammar, but a subset has been done in the one-level Backus–Naur form, compare:
ALGOL 68 as in the 1968 Final Report §2.1
a) program : open symbol, standard prelude, library prelude option, particular program, exit, library postlude option, standard postlude, close symbol. b) standard prelude : declaration prelude sequence. c) library prelude : declaration prelude sequence. d) particular program : label sequence option, strong CLOSED void clause. e) exit : go on symbol, letter e letter x letter i letter t, label symbol. f) library postlude : statement interlude. g) standard postlude : strong void clause train
ALGOL 68 as in the 1973 Revised Report §2.2.1, §10.1.1
program : strong void new closed clause
A) EXTERNAL :: standard ; library ; system ; particular. B) STOP :: label letter s letter t letter o letter p.
a) program text : STYLE begin token, new LAYER1 preludes, parallel token, new LAYER1 tasks PACK, STYLE end token. b) NEST1 preludes : NEST1 standard prelude with DECS1, NEST1 library prelude with DECSETY2, NEST1 system prelude with DECSETY3, where (NEST1) is (new EMPTY new DECS1 DECSETY2 DECSETY3). c) NEST1 EXTERNAL prelude with DECSETY1 : strong void NEST1 series with DECSETY1, go on token ; where (DECSETY1) is (EMPTY), EMPTY. d) NEST1 tasks : NEST1 system task list, and also token, NEST1 user task PACK list. e) NEST1 system task : strong void NEST1 unit. f) NEST1 user task : NEST2 particular prelude with DECS, NEST2 particular program PACK, go on token, NEST2 particular postlude, where (NEST2) is (NEST1 new DECS STOP). g) NEST2 particular program : NEST2 new LABSETY3 joined label definition of LABSETY3, strong void NEST2 new LABSETY3 ENCLOSED clause. h) NEST joined label definition of LABSETY : where (LABSETY) is (EMPTY), EMPTY ; where (LABSETY) is (LAB1 LABSETY1), NEST label definition of LAB1, NEST joined label definition of$ LABSETY1. i) NEST2 particular postlude : strong void NEST2 series with STOP.
Implementations
yo-yo[4] parser for van Wijngaarden grammars with example grammars for expressions, eva, sal and Pascal (the actual ISO 7185 standard for Pascal uses extended Backus–Naur form).
History
W-grammars are based on the idea of providing the nonterminal symbols of context-free grammars with attributes (or affixes) that pass information between the nodes of the parse tree, used to constrain the syntax and to specify the semantics. This idea was well known at the time; e.g. Donald Knuth visited the ALGOL 68 design committee while developing his own version of it, attribute grammars.[5] Quite peculiar to W-grammars was their strict treatment of attributes as strings, defined by a context-free grammar, on which concatenation is the only possible operation; in attribute grammars, attributes can be of any data type, and any kind of operation can be applied to them.
After their introduction in the Algol 68 report, W-grammars were widely considered as too powerful and unconstrained to be practical. This was partly a consequence of the way in which they had been applied; the revised Algol 68 report contained a much more readable grammar, without modifying the W-grammar formalism itself.
Meanwhile, it became clear that W-grammars are indeed too powerful. They describe precisely all recursively enumerable languages,[6] which makes parsing impossible in general: it is an undecidable problem to decide whether a given string can be generated by a given W-grammar. Their use must be seriously constrained when used for automatic parsing or translation. Restricted and modified variants of W-grammars were developed to address this, e.g.
- Extended Affix Grammars, applied to describe the grammars of natural language such as English and Spanish)
- Q-systems, also applied to natural language processing
- the CDL series of languages, applied as compiler construction languages for programming languages
Applications outside of ALGOL 68
Anthony Fisher has written a parser for a large class of W-grammars.[4]
Dick Grune created a C program that would generate all possible productions of a 2-level grammar.[7]
The applications of EAGs mentioned above can effectively be regarded as applications of W-grammars, since EAGs are so close to W-grammars.
W-grammars have also been proposed for the description of complex human actions in ergonomics.
- A W-Grammar Description for Ada [8] – "W-grammar description of Ada is still useful to computer scientists who need more than a simple understanding of the syntax and rudimentary description of the semantics"
See also
- Affix grammar
- Extended Affix Grammar
- Attribute grammar
References
- van Wijngaarden, Adriaan (1965), MR 76: Orthogonal design and description of a formal language (PDF), Amsterdam, The Netherlands: CWI.
- Kleine, "Algol 68", Languages history (PDF) (report attachment), DE: FH Jena.
- "Syntax", Algol 68, FR: Univ Poitiers.
- Fisher, Anthony, "yo-yo", Software, UK: York.
- Knuth, Donald E (1990), "The genesis of attribute grammars" (gZiped plain text), Proceedings of the international conference on Attribute grammars and their applications, US: Stanford: 1–12.
- Sintzoff, M. (1967). "Existence of a van Wijngaarden syntax for every recursively enumerable set". Annales de la Société scientifique de Bruxelles. 81: 115–118.
- Grune, Dick, A Two-Level Sentence Generator, NL: VU.
- ADA177802: A W-Grammar Description for Ada, US: DTIC.
External links
- "Use in Algol68", Computer science, Book rags.
- Pemberton, Steven, VW (tutorial introduction), NL: CWI.