Optimize Compiler for simple Reverse Polish Notation Programming Language

24

5

Description

Imaginary programming language (IPL) uses Polish Reverse Notation. It has the following commands:

  • i -- input number and push it to the stack
  • o -- non-destructive output top of the stack (number stays on the stack)
  • d -- discard top of stack
  • integer number -- push this number to the stack
  • +-* -- pop two numbers from the stack, perform corresponding operation and push back the result. There is no division in IPL.

IPL works only with integers and is used for simple calculations. An IPL program is written on one line and separated by spaces. Empty string is a valid IPL program.

IPL Program:

i i + o 

Inputs two numbers, adds them together and outputs the result.

Input numbers and integers that can be pushed to stack are in range [-999, 999], however output can be any. If your language do not support big numbers it is okay though.

Input/output format

You may choose any input/output format as long as it clear to understand and read/write: string, list, tokens etc.

Task

You are given some IPL program, you need to optimize it (reduce length):

i 12 + 3 + o d 2 3 + d

After optimization will become

i 15 + o

You do not have to preserve stack state, but amount of inputs and outputs and their order should match for the original and optimized program.

So IPL program:

-40 i * 2 * o i + 3 1 + o i 2 *

After optimisation will become

i -80 * o i 4 o i

or

-80 i * o i 4 o i

(note that you have to save all inputs, even if they are irrelevant).

There should be no hardcoding for test cases, code should work on any arbitrary IPL program and produce shortest possible IPL program that meets the requirements.

Scoring

Default code-golf scoring.

UPDATE: changed scoring to pure code golf scoring, as per @Sanchises suggestion.

Test cases:

Input:

(empty string)

Possible output:

(empty string)

Input:

i 4 * 2 + 3 * 6 - o

Possible output:

i 12 * o

Input:

1 1 + o

Possible output:

2 o

Input:

i 2 + 3 + o d 2 3 + d

Possible output:

i 5 + o

Input:

-40 i * 2 * o i + 3 1 + o i 2 *

Possible output:

-80 i * o i 4 o i

Input:

i i 1 + i 1 + i 1 + i 1 + d d d d o 

Possible output:

i i i i i d d d d o 

Input:

i i i 0 * * * o

Possible output:

i i i 0 o

Input:

i i i 1 * * * o

Possible output:

i i i * * o

Input:

i 222 + i 222 - + o

Possible output:

i i + o

Input:

i 2 + 3 * 2 + 3 * 2 + 3 * i * d i 2 + 3 * i + d i o 2 + 2 - 0 * 1 o

Possible output:

i i i i i o 1 o

Input:

i 1 + 2 * 1 + o 

Possible output:

i 2 * 3 + o

Input:

1 1 + o i 2 + 3 + o d 2 3 + d 4 i * 2 * o i + 3 1 + o i 2 * i i 1 + i 1 + i 1 + i 1 + d d d d o i i i 0 * * * o i i i 1 * * * o i 2 + i 2 - + o i 2 + 3 * 2 + 3 * 2 + 3 * i * d i 2 + 3 * i + d i o 2 + 2 - 0 * 1 o

Possible output:

2 o i 5 + o 8 i * o i 4 o i i i i i i d d d d o i i i 0 o i i i * * * o i i + o i i i i i o 1 o

Андрей Ломакин

Posted 2018-08-01T11:42:06.633

Reputation: 309

1A question: can you simplify i i d o to i o i (the input is in order and the output is in order) or should you not simplify it? (the set of input and outputs should be in order) – Sanchises – 2018-08-01T12:01:03.207

1@Sanchises no, inputs and outputs should be in order. If original program inputs 2 numbers before outputing anything optimised should do the same. – Андрей Ломакин – 2018-08-01T12:03:02.973

1Welcome to PPCG! Nice first Challenge! – Luis felipe De jesus Munoz – 2018-08-01T13:49:49.993

1May there be d or o when stack is empty? – user202729 – 2018-08-02T02:03:56.113

1Should we reduce number of tokens or number of bytes in the IPL program? – user202729 – 2018-08-02T02:09:56.377

1@user202729 no, input code is always valid – Андрей Ломакин – 2018-08-02T07:13:38.583

1@user202729 number of tokens – Андрей Ломакин – 2018-08-02T07:14:01.987

i i - o cannot be optimized to i i 0 o cause inputs can be different, so there is could be any output not only zero – Андрей Ломакин – 2018-08-02T07:15:06.703

@АндрейЛомакин, yes, sorry, that optimization would be questionable. Still, if the code cannot be shortened, is it valid to alter it without altering its functionality, i.e. optimizing i 2 * o to 2 i * o? – Jonathan Frech – 2018-08-02T12:29:26.887

@JonathanFrech Ithink it is fine to reorder as long as it remains optimal. And I added test case, thanks. – Андрей Ломакин – 2018-08-02T13:34:37.527

6

From review queue, I do not think this challenge is unclear. If you do, please comment about why.

– mbomb007 – 2018-08-02T14:17:11.323

@mbomb007 I'm certainly not sure what is meant by "no hardcoding cases". – Post Rock Garf Hunter – 2018-08-02T19:27:29.930

2@WW I think the OP means that you should not hardcode only the test cases listed in the question. You have to support arbitrary input. There should be no hardcoding for test cases, code should work *on any arbitrary IPL program* – mbomb007 – 2018-08-02T20:51:30.190

Answers

5

Wolfram Language (Mathematica), 733 728 690 564 516 506 513 548 bytes

j=Integer;f=Flatten;s=SequenceReplace;A=FixedPoint[f@s[#,{{x_j,p,y_j,t}->{y,t,x*y,p},{x_j,y_j,p}->x+y,{x_j,y_j,t}->x*y,{x_j,p,y_j,p}->{x+y,p},{x_j,t,y_j,t}->{x*y,t},{0,p}|{1,t}->{},{0,t}->{d,0}}]//.{a___,Except[i|o]}->{a}&,#]&;B=Expand@Check[f@FoldPairList[f/@Switch[#2,i,{{i},{#,i@c++}},o,{{Last@#},#},d,{{},Most@#},p,{{},{#[[;;-3]],Tr@#[[-2;;]]}},t,{{},{#[[;;-3]],#[[-2]]*Last@#}},_,{{},{##}}]&,c=0;{},#],x]&;F=MinimalBy[w=A@f[#/.m->{-1,t,p}];z=B@w;s[#,{-1,t,p}->m]&/@A/@Select[Permutations@Join[w,Cases[z /.i@_->i,_j,∞]],B@#==z&],Length][[1]]&

Try it online!

This is a four-step tour-de-force that (1) replaces "-" with "-1 * +" so that we don't have to deal with subtractions, (2) simplifies the list of commands a bit, (3) makes a list of all permutations of this list of commands and picks out those that give the same result when parsed (executed), and (4) simplifies these lists of commands a bit and picks the shortest, after converting certain operations back to subtractions.

This code is terribly inefficient because it goes through the list of all permutations of the input code. For long input codes I don't recommend running this code; but as I read it there are no runtime or memory restrictions in this challenge.

This code does the optimization step after converting all "-" operations to "+" operations with signs flipped, and only at the end re-introduces the "-" operator when converting code back to strings. This implies for example that "i -1 i * + o" is correctly optimized to "i i - o".

As the i/o format requirement is quite loose, this code takes and returns code as lists, where the symbols "+", "-", "*" are represented by p, m, t, tokens respectively. The conversion from and to strings is done in the wrapper function given on TIO:

G[S_] := StringReplace[{"p" -> "+", "m" -> "-", "t" -> "*"}]@StringRiffle@
         Quiet@F@
         ToExpression[StringSplit[S] /. {"+" -> p, "-" -> m, "*" -> t}]

Un-golfed version, including the string-format wrapper and minimizing the final code string length instead of the number of tokens, and including a few more transformation niceties:

(* convert code string to list of operators *)
inputfilter[s_] := ToExpression[Flatten[StringSplit[s] /.
  {"i" -> i, "o" -> o, "d" -> d, "+" -> p, "-" -> {-1, t, p}, "*" -> t}]]

(* convert list of operators to code string *)
outputfilter[s_] := StringReplace[StringRiffle@Flatten@SequenceReplace[s,
  {{-1, t, p} -> m,                         (* convert "-1 t p" back to "-"             *)
   {x_ /; x < 0, p} -> {-x, m},             (* convert "y x +" to "y -x -" when x<0     *)
   {x_ /; x < 0, t, p} -> {-x, t, m}}],     (* convert "y x * +" to "y -x * -" when x<0 *)
  {"m" -> "-", "p" -> "+", "t" -> "*"}]     (* backsubstitution of symbols              *)

(* simplify a list of operators somewhat *)
simplifier[s_] := FixedPoint[Flatten@SequenceReplace[#,
  {{x_Integer, p, y_Integer, t} -> {y, t, x*y, p},  (*  "x + y *" -> "y * (xy) +"       *)
   {x_Integer, y_Integer, p} -> x + y,              (*  "x y +" -> "(x+y)"              *)
   {x_Integer, y_Integer, t} -> x*y,                (*  "x y *" -> "(xy)"               *)
   {x_Integer, p, y_Integer, p} -> {x + y, p},      (*  "x + y +" -> "(x+y) +"          *)
   {x_Integer, t, y_Integer, t} -> {x*y, t},        (*  "x * y *" -> "(xy) *            *)
   {0, p} | {1, t} -> {},                           (*  "0 +" and "1 *" are deleted     *)
   {x_Integer, i, p} -> {i, x, p},                  (*  "x i +" -> "i x +"              *)
   {x_Integer, i, t} -> {i, x, t},                  (*  "x i *" -> "i x *"              *)
   {0, t} -> {d, 0}}] //.                           (*  "0 *" -> "d 0"                  *)
  {a___, Except[i | o]} -> {a} &, s]                (* delete trailing useless code     *)

(* execute a list of operators and return the list of generated outputs *)
parse[s_] := Expand@Quiet@Check[Flatten@FoldPairList[  (* stack faults are caught here     *)
  Function[{stack, command},                        (* function called for every command*)
    Flatten /@ Switch[command,                      (* code interpretation:             *)
    i, {{i}, {stack, i[inputcounter++]}},           (* output "i" and add input to stack*)
    o, {{stack[[-1]]}, stack},                      (* output top of stack              *)
    d, {{}, Most[stack]},                           (* delete top of stack              *)
    p, {{}, {stack[[;; -3]], stack[[-2]] + stack[[-1]]}},  (* add two stack elements    *)
    t, {{}, {stack[[;; -3]], stack[[-2]]*stack[[-1]]}},    (* multiply two stack elements*)
    _, {{}, {stack, command}}]],                    (* put number onto stack            *)
    inputcounter = 0; {},                           (* start with zero input counter and empty stack*)
    s],                                             (* loop over code list              *)
  x]                                                (* return "x" if an error occurred  *)

(* the main function that takes a code string and returns an optimized code string *)
F[s_] := Module[{w, q},
  w = simplifier@inputfilter@s;      (* convert input to useful form *)
  q = parse[w];                      (* execute input code *)
  MinimalBy[
    outputfilter@*simplifier /@      (* simplify and stringify selected codes          *)
      Select[Permutations[w],        (* all permutations of code list                  *)
             parse[#] == q &],       (* select only those that give the correct output *)
    StringLength] // Union]          (* pick shortest solution by length               *)

Thanks to @redundancy for catching a bug: The parser needs a Expand applied to the output in order to handle distributive equivalence. 506→513

update

Now also optimizes 1 o 1 + o to 1 o 2 o. This was a surprisingly difficult case and made the code much slower. 513→548

Roman

Posted 2018-08-01T11:42:06.633

Reputation: 1 190

Seems like this gives an error on test case i i 1 + i 1 + i 1 + i 1 + d d d d o. – Grimmy – 2019-08-27T15:27:58.257

@Grimy as I said, this code doesn't run for large problems because it goes through an exhaustive combinatorical search of code space. Your error is an out-of-memory fault on TIO, and not due to my code. – Roman – 2019-08-27T15:57:26.877

@Grimy for "i i 1 + d o" my code gives "i i d o", which I consider optimized. For "i i 1 + i 1 + d d o" it gives "i i i + d o", which has the same number of tokens as the more obvious "i i i d d o" optimization. I have not tried longer inputs. – Roman – 2019-08-27T18:24:32.733

I believe input i 2 * i 2 * + o should produce optimized output i i + 2 * o, but this code returns the (unoptimized) input. – redundancy – 2019-08-28T21:09:35.173

Thanks @redundancy, it's fixed and your example is now one of the included test cases. – Roman – 2019-08-29T09:45:37.493

Thanks a lot for the bounty! This was a tough problem and I think I've presented a crummy solution. – Roman – 2019-08-30T07:07:35.470