Parikh's theorem

Parikh's theorem in theoretical computer science says that if one looks only at the number of occurrences of each terminal symbol in a context-free language, without regard to their order, then the language is indistinguishable from a regular language.[1] It is useful for deciding that strings with a given number of terminals are not accepted by a context-free grammar.[2] It was first proved by Rohit Parikh in 1961[3] and republished in 1966.[4]

Definitions and formal statement

Let $\Sigma =\{a_{1},a_{2},\ldots ,a_{k}\}$ be an alphabet. The Parikh vector of a word is defined as the function ${\textstyle p:\Sigma ^{*}\to \mathbb {N} ^{k}}$ , given by[1]

p(w)=(|w|_{a_{1}},|w|_{a_{2}},\ldots ,|w|_{a_{k}})

where $|w|_{a_{i}}$ denotes the number of occurrences of the letter $a_{i}$ in the word $w$ .

A subset of $\mathbb {N} ^{k}$ is said to be linear if it is of the form

u_{0}+\mathbb {N} u_{1}+\dots +\mathbb {N} u_{m}=\{u_{0}+t_{1}u_{1}+\dots +t_{m}u_{m}\mid t_{1},\ldots ,t_{m}\in \mathbb {N} \}

for some vectors ${\textstyle u_{0},\ldots ,u_{m}}$ . A subset of $\mathbb {N} ^{k}$ is said to be semi-linear if it is a union of finitely many linear subsets.

Statement 1: Let $L$ be a context-free language. Let $P(L)$ be the set of Parikh vectors of words in $L$ , that is, ${\textstyle P(L)=\{p(w)\mid w\in L\}}$ . Then $P(L)$ is a semi-linear set.

Two languages are said to be commutatively equivalent if they have the same set of Parikh vectors.

Statement 2: If $S$ is any semi-linear set, the language of words whose Parikh vectors are in $S$ is commutatively equivalent to some regular language. Thus, every context-free language is commutatively equivalent to some regular language.

These two equivalent statements can be summed up by saying that the image under $p$ of context-free languages and of regular languages is the same, and it is equal to the set of semilinear sets.

Strengthening for bounded languages

A language $L$ is bounded if $L\subset w_{1}^{*}\ldots w_{k}^{*}$ for some fixed words $w_{1},\ldots ,w_{k}$ . Ginsburg and Spanier [5] gave a necessary and sufficient condition, similar to Parikh's theorem, for bounded languages.

Call a linear set stratified, if in its definition for each $i\geq 1$ the vector $u_{i}$ has the property that it has at most two non-zero coordinates, and for each $i,j\geq 1$ if each of the vectors $u_{i},u_{j}$ has two non-zero coordinates, $i_{1}<i_{2}$ and $j_{1}<j_{2}$ , respectively, then their order is not $i_{1}<j_{1}<i_{2}<j_{2}$ . A semi-linear set is stratified if it is a union of finitely many stratified linear subsets.

The Ginsburg-Spanier theorem says that a bounded language $L$ is context-free if and only if $\{(n_{1},\ldots ,n_{k})\mid w_{1}^{n_{1}}\ldots w_{k}^{n_{k}}\in L\}$ is a stratified semi-linear set.

Significance

The theorem has multiple interpretations. It shows that a context-free language over a singleton alphabet must be a regular language and that some context-free languages can only have ambiguous grammars. Such languages are called inherently ambiguous languages. From a formal grammar perspective, this means that some ambiguous context-free grammars cannot be converted to equivalent unambiguous context-free grammars.

gollark: Apart from sorting it the right way round.

gollark: I've made this very barebones RSS reader; how can I make it look nice?

gollark: Markov chain.

gollark: Has tjwld become less coherent or is it a Markov chain?:

gollark: Piephon has a good websockets library.

References

Kozen, Dexter (1997). Automata and Computability. New York: Springer-Verlag. ISBN 3-540-78105-6.
Håkan Lindqvist. "Parikh's theorem" (PDF). Umeå Universitet.
Parikh, Rohit (1961). "Language Generating Devices". Quartly Progress Report, Research Laboratory of Electronics, MIT.
Parikh, Rohit (1966). "On Context-Free Languages". Journal of the Association for Computing Machinery. 13 (4).
Ginsburg, Seymour; Spanier, Edwin H. (1966). "Presburger formulas, and languages". Pacific Journal of Mathematics. 16 (2): 285–296.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[kozen-1] Kozen, Dexter (1997). Automata and Computability. New York: Springer-Verlag. ISBN 3-540-78105-6.

[2] Håkan Lindqvist. "Parikh's theorem" (PDF). Umeå Universitet.

[3] Parikh, Rohit (1961). "Language Generating Devices". Quartly Progress Report, Research Laboratory of Electronics, MIT.

[4] Parikh, Rohit (1966). "On Context-Free Languages". Journal of the Association for Computing Machinery. 13 (4).

[5] Ginsburg, Seymour; Spanier, Edwin H. (1966). "Presburger formulas, and languages". Pacific Journal of Mathematics. 16 (2): 285–296.