Tips for golfing in Postscript?

14

1

As one of the less popular languages, it's difficult to find literature on the avant garde of postscript hackery. So what discoveries have the golfers here made to exploit the stack model (or other features) to overcome Postscript's inherent verbosity?

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535

Found some external pages: https://sites.google.com/site/codegolfingtips/Postscript

– luser droog – 2012-11-20T06:14:50.860

Answers

3

Embedded Decoder

A Postscript program has a unique(?) ability to read it's own program text as data. This is normally used by the image operator which receives a data-acquisition-procedure as input, and this procedure often uses currentfile followed by readline, readstring, or readhexstring. But seen another way, image is just another looping operator, so any loop can read-ahead. An example is the line-printer emulator from the Green Book.

Using the token operator invokes the scanner on a file or string, pulling off a number or space- (or otherwise-: see other answer) -delimited name.

A simple PS interpreter in PS:

{currentfile token not {exit} if dup type /arraytype ne {exec} if }loop

Binary Operator String Decoder

Since I can't seem to get raw binary tokens to work for me (see other answer), I've made use of the "embedded decoding" idea to exploit the binary token mechanism to pack code into 8-bit strings, and then manipulate and parse the commands from the string on the fly.

/.{
    <920>  % two-byte binary-encoded name template with 0x92 prefix
    dup 1 4 3 roll put  % insert number into string
    cvx exec  % and execute it
}def
/${
    //.   %the /. procedure body defined above
    73 .  %"forall" (by code number)
}def

The . procedure takes a number from the stack and inserts it as the second byte in a two-byte string, the first byte being the prefix-byte for a binary token, specifying an executable system name. We save a byte in the hexstring by using a rule of the scanner that an odd number of nibbles in the hexstring is padded with an extra 0 nibble, so 3 hex nibbles produces a 2-byte string. The string is then marked executable and called with exec which invokes the scanner, produces the desired executable system name, and then loads the name and executes the operator. The $ does this on each byte of a string on the stack, using the . procedure twice, once as the loop body, and then to execute the looping operator forall by number.

More compactly, these procedures look like this:

 /.{<920>dup 1 4 3 roll put cvx exec}def/${//. 73 .}def
%123457890123456789012345678901234567890123456789012345
%        1         2         3         4         5

So, 55 chars buys binary token strings. Or, for 6 (maybe 7, if you terminate it with a space) chars, you can load the G library with (G)run which defines . and $ as above (+ a few others to extend the range of ascii-reachable codes).

Further illustrated in my crossword puzzle answer.

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535

1Actually, if the postscript is in encoded form, then this requires you be extra careful about comparing stuff. Specifically, if you are looking at operators, you must parse it as a token and then compare the token's name to the target value. – AJMansfield – 2013-07-12T15:48:54.620

2

When generating graphical output and console output does not matter, use = instead of pop.

Thomas W.

Posted 2012-10-26T23:34:14.730

Reputation: 1 081

2

While most postscript operators are syntactically identifiers (and therefore must be space- (or otherwise-) delimited), the names [, ], <<, and >> are self-delimiting and scanner will detect them without intervening space. For the same reason, you cannot refer to these names with the usual /literal syntax (eg. /[ is two tokens: an empty literal name equivalent to ()cvn cvlit, and the executable name [ equivalent to ([)cvn cvx exec).

In order to redefine these names, which cannot be mentioned by name, we can use strings which are implicitly converted to names when used as keys in a dictionary (convenient!).

This example illustrates abusing these operators to perform arithmetic.

%!
([)0 def
(])1 def
(<<){add}def
(>>){mul}def
 ]]<<]]]<<<<>> =
%1 1 add 1 1 1 add add mul = %prints 6

Also << and [ (and mark) all mean the same thing.


My own postscript interpreter, xpost, also makes the right-curly brace available with some restrictions. discussion

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535

2Also, / ends the previous token so you don't need a space before it. – Geoff Reedy – 2012-10-27T02:32:12.563

2

Here's a quickie: wrap multiple definitions in [...>>begin to eliminate the keyword def (nb. [ is the same as <<).

 def def
[>>begin

So remember: more than three two ... flock together! ;)

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535

1This might not apply if you are naming something that terminates on a self-delimiting token. In /a{pop 2 mul}def or \b[2 3]def, the def only costs 3 characters, not 4. – AJMansfield – 2015-02-16T16:09:07.230

Shouldn't the rule be "more than two"? Compare /a 1 def/b 2 def/c 3 def with <</a 1/b 2/c 3>>begin. We need more spaces for def. – Thomas W. – 2014-03-12T08:18:30.700

Wow. I hadn't though of that. Yes, the calculation needs improvement. – luser droog – 2014-03-12T08:21:10.977

Actually, this should be [/a 1/b 2/c 3>>begin – Thomas W. – 2014-03-12T08:34:35.857

face-palm. . . . – luser droog – 2014-03-12T08:42:16.113

I've got some new ideas described in the chatroom

– luser droog – 2014-03-12T08:49:16.130

Sorry, I don't see anything there. Am I doing something wrong? – Thomas W. – 2014-03-14T12:58:47.920

I guess it scrolled away. I've got a start on a golfing library and a suite for APL-like calculations.

– luser droog – 2014-03-14T18:27:18.707

2

Replace hexstrings with ASCII85

Probably old news, but I just learned it. :)

You can do it using the postscript interpreter interactively with an encoding filter and cut-and-paste. But I'm going to show how to use dc to do it "by hand".

So, here's a hex string. We split it into 4-byte chunks.

95 20 6e d8   d0 59 49 35   50 74 ba c5   08 2d

Firing up dc, we input these as 32-bit (unsigned) big-endian-byte-order numbers. Then mod-off base-85 digits (there should be 5 until you get to 0).

0> dc
16i
95206ED8
Ai
d85%n85/
82
d85%n85/
83
d85%n85/
82
d85%n85/
78
d85%n85/
47
d85%n85/
0                    

Padding the last chunk with 00 00, yields (decimal), omitting the same number of bytes that we padded.

47 78 82 83 82   66 81 72 79 83   25 72 82 25 69  2 53 30 [2 53]

Add 33 to shift into the printable range of ASCII and poof! ASCII85.

80 111 115 116 115 99 114 105 112 116 58 105 115 58 102 35 86 63
which decodes to: Postscript:is:f#V? %%%Oops! should say 'fun'! I screwed up somewhere. :)

Wrap it in <~ ... ~>, and Level-2 Postscript can access 8-bit data, cheaper than hex.

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535

1

Binary Tokens

For the ultimate zip-up of a PostScript program that final frontier is binary tokens which allows you to remove long operator names completely, at the cost of no longer having an ASCII-clean program.

So starting with a compacted block of postscript code

[/T[70{R 0 rlineto}48{}49{}43{A rotate}45{A neg rotate}91{currentdict
end[/.[currentpoint matrix currentmatrix]cvx>>begin begin}93{. setmatrix
moveto currentdict end end begin}>>/S{dup B eq{T begin exch{load exec}forall
end}{exch{load exch 1 add S}forall}ifelse 1 sub }>>begin moveto 0 S stroke

We look up all the names in the back of the PLRM (Appendix F, pp. 795-797)

appearance
in
vim    dec  meaning

<92>   146  'executable system name' binary token prefix
^A     1    add
^M     13   begin
^^     30   currentdict
'      39   currentmatrix
(      40   currentpoint
2      50   cvx
8      56   dup
9      57   end
=      61   eq  !)
>      62   exch
?      63   exec
I      73   forall
U      85   ifelse
d      100  load
h      104  matrix
k      107  moveto
n      110  neg
<85>   133  rlineto
<88>   136  rotate
§      167 stroke
©      169 sub

And then type them in prefixed by a 146 (decimal) byte. vim help for entering arbitrary bytes

Then in vim, the condensed file can be typed directly, so:

[/T[70{R 0^V146^V133}48{}49{}43{A^V146^V136} 45{A^V146^V110^V146^V136} 91{^V146^V30^V146^V57 [/.[^V146^V40^V146^V104 ^V146^V39]^V146^V50>> ^V146^V13^V146^V13}93{. ^V146^V156 ^V146^V107^V146^V30 ^V146^V57^V146^V57 ^V146^V13}>>/S{^V146^V56 B^V146^V61{T^V146^V13 ^V146^V62{^V146^V100 ^V146^V63}^V146^V73 ^V146^V57}{^V146^V62{ ^V146^V100^V146^V62

... you have to enter a space here to terminate the ^V-62 and start the 1, but you can back up and delete it later ...

1^V146^V1S}^V146^V73} ^V146^V85

... have to enter a space here to terminate the ^V-85 and start the 1, but you can back up and delete it later ...

1^V146^V169}>>^V146^V13 ^V146^V107

... 3rd digit of 3-digit code terminates the byte-entry so the following 0 here is normal, conveniently ...

0 S^V146^V167

Which will look like this on screen (in vim):

[/T[70{R 0<92><85>}48{}49{}43{A<92><88>}45{A<92>n<92><88>}
91{<92>^^<92>9[/.[<92>(<92>h<92>']<92>2>>
<92>^M<92>^M}93{.<92><9c><92>k<92>^^<92>9<92>9<92>^M}
>>/S{<92>8B<92>={T<92>^M<92>>{<92>d<92>?}<92>I<92>9}{<92>>
{<92>d<92>>1<92>^AS}<92>I}<92>U1<92>©}>><92>^M
<92>k0 S<92>§

This one can often be omitted entirely if the aim is just to show a picture. Ghostscript paints most things to the screen without needing showpage.

¡    161   showpage

[This actually isn't working. Ghostscript's giving me undefined and syntaxerror for these tokens. Maybe there's some mode I need to enable.]

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535

Maybe it's something about this program. The online compactor doesn't like it, either.

– luser droog – 2013-07-20T06:46:13.370

1

Change negative rolls to positive

Negative rolls can always be changed to positive rolls.

3 -1 roll
3 2 roll

5 -2 roll
5 3 roll

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535

1

The source file has moved since my last comment. Here's my implementation of the roll operator.

– luser droog – 2015-02-26T07:13:48.010

Which is more efficient 3 -1 roll or 3 2 roll? In my mental model, the former should be more efficient because it just takes one move. Is my mental model correct? – kiss my armpit – 2014-02-16T16:09:56.977

Honestly, I'm not sure. Here's my implementation, where all negative rolls are converted to positive as the first step. I think it would still require at least 2 moves (move 3rd value up, move 3 values down) with a flat-array implementation of the stack. But my stack is segmented so access goes through function calls to manage the segments; so there are certainly more efficient ways to implement than I have done. ... One thing I see now: I should check for stackunderflow outside of the loops.

– luser droog – 2014-02-17T05:16:02.453

1

Factor-out repeated uses of long operator names

If you're already using a <<>>begin dictionary, there is a constant overhead of /?{} 4 characters per redefinition. So an operator of length n repeated N times will yield a character-count change of
(4 + n) - (N * (n - 1)).

Setting this formula equal to 0 gives the equation of the break-even point. From this we can solve for each variable in terms of the other, yielding
n = - (N - 4) / (1 - N) and
N = (4 + n) / (n - 1) .

No we can answer questions like, "For how many uses of 'print' is it worthwhile to abbreviate?" n = 5, so N = 9/4. Take the ceiling, since you can't effectively call print 1/4 times. So, 3. 3 uses. And indeed,

print print print
/P{print}p p p

(assuming you've already paid the overhead of <<>>begin to activate the definition, of course).

Of course, binary tokens makes this kind of moot, giving you the first 255 names from the system name table as 2-bytes: 0x92, 0x??. And binary tokens are also self-delimiting, requiring no whitespace before or after, since the high-bit of the first byte is outside of the ascii range.

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535

0

Use my G library

https://github.com/luser-dr00g/G

It's a text file. No extension, for the shortest possible syntax to load it.

It permits this 203-char Sierpinksi Triangle program

[48(0-1+0+1-0)49(11)43(+)45(-)/s{dup
0 eq{exch{[48{1 0 rlineto}49 1 index
43{240 rotate}45{120 rotate}>>exch
get exec}forall}{exch{load
exch 1 sub s}forall}ifelse 1 add}>>begin
9 9 moveto(0-1-1)9 s fill

to be rewritten in 151 bytes as

3(G)run $
{A - B + A + B - A}
{B B}

{A - B - B}7{[ex{du w{(>K?\2u)$}if}fora]}rep
cvx[/A{3 0 rl}/B 1 in/-{120 rot}/+{-120 rot}>>b
100 200(k?B9)$ showp

workfile with comments

Using the abbreviated system names feature, 1(G)run completely removes the burden of long operator names. An operator name need only be long enough to distinguish it from the others.

So

  • add becomes ad
  • mul becomes mu
  • index becomes i
  • etc, etc.

Use the PLRM Appendix F for the standard table of operator names.

And the feature of Operator Strings is available even if the abbreviated names is not selected. The bare library has a "base-level" selected by adding simply (G)run with no further decorations.

The base level includes a new function . which accepts the integer code for an operator (the same Appendix F mentioned above) and executes it.

The new function $ iterates through a string and calls . upon each one. So the ascii code directly selects the operator by number.

A new function @ lets you reach down to the bottom of the table in Appendix F by treating the space character (Ascii 0x20) as 0.

A new function # lets you reach further up into the table by first adding 95 (0x5F) so the space char 0x20 is treated as 127 (0x7F), the very next code after the last printable ascii character ~ 126 (0x7E).

Two new functions ! lets you access a deeply nested structure of arrays and/or dicts with an index array of indices/keys, rather than tedious expressions of many get (and put) operators.

(G)run 7 chars buys the base-level.

1(G)run 8 chars buys that AND abbreviated system names.

3(G)run $ 9 chars immediately begins an implicit-procedure block scanning sources lines until the next blank line, and defining the first line as a procedure called A, the next line is defined as a procedure called B, etc. This should remove most of the defs needed for defining lots of stuff, without needing to wrap them in a dictionary, nor even explicitly giving them names.

luser droog

Posted 2012-10-26T23:34:14.730

Reputation: 4 535