chomski

pep, pep virtual machine
Paradigm	scripting language
Designed by	M.J. Bishop
First appeared	2007
Typing discipline	none; all data is treated as a string
OS	Cross-platform
Website	bumble.sourceforge.net/books/pars/
Major implementations
	bumble.sourceforge.net/books/pars/
Influenced by
	Sed, Awk

The pattern parsing virtual machine (previously called 'chomski' after Noam Chomsky), or pep, is both a command-line computer language and the utility that interprets it. It is used to parse and transform text patterns and formal (mathematical) languages. The utility reads input files sequentially, one character at a time, applies the operations specified on the command line or in a pep script, and writes the transformed text to its output. It has been developed in the C language since 2006. Pep derives a number of ideas and syntax elements from sed, a command-line text stream editor.

Features

The pattern-parser language uses many ideas taken from sed, the Unix stream editor. For example, sed includes two virtual variables or data buffers, known as the "pattern space" and the "hold space". These two variables constitute an extremely simple virtual machine. In the pep language this virtual machine has been augmented with several new buffers or registers along with a number of commands to manipulate these buffers.

The parsing virtual machine includes a tape data structure and a stack, along with a "workspace" (the equivalent of the sed "pattern space") and a number of other buffers of lesser importance. The machine is designed specifically for parsing formal languages. Parsing traditionally involves two phases: the lexical analysis phase and the formal grammar phase. During lexical analysis a series of tokens is generated. These tokens are then used as the input for a set of formal grammar rules. The pep virtual machine uses the stack to hold these tokens and the tape to hold the attributes of the parse tokens. In a pep script the two phases, lexing and parsing, are combined in one script file. A series of command words is used to manipulate the different data structures of the virtual machine.
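The division of labour between workspace, stack and tape can be illustrated with a minimal Python sketch. This is not the pep implementation: the class name `Machine`, the toy grammar (`word = word ;` reduces to `pair`, and `pair pair` reduces to `pair`) and the use of two parallel lists for the stack and tape are all assumptions made for illustration only.

```python
class Machine:
    """Toy model: workspace buffer, token stack, attribute tape (hypothetical)."""

    def __init__(self):
        self.workspace = ""   # characters read so far for the current token
        self.stack = []       # grammar token names
        self.tape = []        # tape[i] holds the attribute of stack[i]

    def push(self, token, attribute):
        self.stack.append(token)
        self.tape.append(attribute)

    def reduce(self):
        """Apply the toy grammar rules to the top of the stack until none match."""
        changed = True
        while changed:
            changed = False
            if self.stack[-4:] == ["word", "=", "word", ";"]:
                name, _, value, _ = self.tape[-4:]
                del self.stack[-4:]
                del self.tape[-4:]
                self.push("pair", f"{name} -> {value}\n")
                changed = True
            elif self.stack[-2:] == ["pair", "pair"]:
                merged = self.tape[-2] + self.tape[-1]
                del self.stack[-2:]
                del self.tape[-2:]
                self.push("pair", merged)
                changed = True


def parse(text):
    """Lex and parse in a single pass, one character at a time."""
    m = Machine()
    for ch in text:
        if ch.isalnum():
            m.workspace += ch          # lexing: grow the current word
            continue
        if m.workspace:                # a non-word character ends the word
            m.push("word", m.workspace)
            m.workspace = ""
        if ch in "=;":
            m.push(ch, ch)             # punctuation is its own token
        m.reduce()                     # parsing: try the grammar rules
    if m.workspace:                    # flush a trailing word at end of input
        m.push("word", m.workspace)
        m.workspace = ""
        m.reduce()
    return m.stack, m.tape


stack, tape = parse("a=1; b=22;")
print(stack)    # a single 'pair' token remains when the input matches the grammar
print(tape[0])
```

When the whole input matches the toy grammar, one token is left on the stack and its tape cell holds the transformed text, which is the general shape of a parse-and-translate script.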

Purpose and motivation

The purpose of the pep tool is to parse and transform text patterns. These patterns conform to the rules of a formal language, and the languages that can be handled include many context-free languages. Whereas traditional Unix tools (such as awk, sed and grep) process text one line at a time and use regular expressions to search or transform text, the pep tool processes text one character at a time and can use context-free grammars to transform (or compile) the text. However, in keeping with the Unix philosophy, the pep tool works upon plain text streams, encoded according to the locale of the local computer, and produces another plain text stream as output, allowing the pep tool to be used as part of a standard pipeline.

The motivation for creating the pep tool and its virtual machine was to allow parsers to be written as scripts, rather than having to resort to traditional parsing tools such as Lex and Yacc or their many variants and improvements, such as ANTLR.

Usage

The following example shows a typical use of the pep pattern parser, where the -e option indicates that the pattern parse expression follows:

$ pep -e 'read; "/"{ read; "*"{ until "*/"; clear; }} print; clear;' input.c > output.c

In the above script, C multiline comments (/* ... */) are deleted from the input stream.
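For comparison, the same character-at-a-time idea can be sketched in Python. This is only an illustration of the technique, not a translation of the pep script and not a model of pep's command semantics; the function name `strip_c_comments` is hypothetical, and the sketch ignores string literals that happen to contain `/*`.

```python
import io


def strip_c_comments(text):
    """Remove /* ... */ comments, reading the input one character at a time."""
    src = io.StringIO(text)
    out = []
    ch = src.read(1)
    while ch:
        if ch == "/":
            nxt = src.read(1)
            if nxt == "*":
                # Inside a comment: read until the text read so far ends in "*/".
                tail = ""
                while not tail.endswith("*/"):
                    c = src.read(1)
                    if not c:          # unterminated comment: stop at end of input
                        break
                    tail += c
                ch = src.read(1)       # continue after the comment
                continue
            out.append(ch)             # a lone "/" is ordinary text
            ch = nxt                   # re-examine: it may start a comment itself
            continue
        out.append(ch)
        ch = src.read(1)
    return "".join(out)


print(strip_c_comments("int a; /* first */ int b; /* second\nline */ int c;"))
```

Only the comment text is removed; the whitespace surrounding each comment is kept as it was in the input.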

The pattern parser tool was designed to be used as a filter in a pipeline: for example,

$ generate.data | pep -e '"x"{clear;add "y";}print;clear;'

That is, generate the data, and then make the small change of replacing x with y. However, this is not currently possible: because the pep tool also includes a comprehensive script viewer and debugger, it cannot read from piped standard input.

Several commands can be put together in a file called, for example, substitute.pss and then be applied using the -f option to read the commands from the file:

$ pep -f substitute.pss file > output

Besides substitution, other forms of simple processing are possible. For example, the following uses the a+ (accumulator increment) and count commands to count the number of lines in a file:

$ pep -e '"\n" { a+;} clear; (eof) {count;print;}' textfile
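The counting technique amounts to incrementing an accumulator each time a newline character is read and reporting the total at end of input. A rough Python equivalent, with the hypothetical function name `count_newlines` and an ordinary integer standing in for the machine's accumulator, might look like this:

```python
import io


def count_newlines(stream):
    """Count newline characters, reading one character at a time."""
    accumulator = 0
    while True:
        ch = stream.read(1)
        if ch == "":           # end of input: report the total
            return accumulator
        if ch == "\n":
            accumulator += 1   # one more line seen


print(count_newlines(io.StringIO("first\nsecond\nthird\n")))
```

As with any newline count, a final line that lacks a trailing newline is not counted.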

Complex pep constructs are possible, allowing it to serve as a simple but highly specialised programming language. Apart from the test structures ((eof), [class], == etc.), pep has two flow control statements, the .reparse and .restart commands, which jump back to the parse> label (no other labels are permitted).

History

The idea for the pep machine and language arose from the limitations of regular expression engines and of sed, which works line by line, and in particular from the difficulty of parsing nested text patterns with regular expressions. Pep evolved as a natural progression from the grep and sed commands. Development began in approximately 2006 and continues.[1]

Limitations

The pattern parsing script language is not a general-purpose programming language. Like sed, it is designed for a limited type of usage. The interpreter executable does not currently support Unicode strings, since the implementation uses standard C character arrays. However, scripts can also be translated into other languages (such as Java and JavaScript) which do support Unicode text. Since the virtual machine behind the pattern parser language is considerably more complex than that of sed, it is necessary to be able to debug scripts. This facility is currently provided within the 'pep' executable.

References

[1] Developer’s (M.J. Bishop) personal recollection

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
