Tips for creating polyglots

48

8

A polyglot is a program that can be run in 2 or more different programming languages.

What general tips do you have for making polyglots, or for choosing languages in which it is easy to write a polyglot for a specific task?

Please post tips that can be applied in most situations, i.e. they shouldn't work only in polyglots of two specific languages. (If your tip is too specific, you could simply post an answer to a polyglot question instead.) But you can introduce features of a language that make it easy to work with many languages, or easy to add to any existing polyglot.

Please post one tip per answer, and feel free to suggest an edit if a language-specific tip also applies to another language.

jimmy23013

Posted 2016-11-26T00:59:02.080

Reputation: 34 042

Answers

25

Exploit comment symbols

A simple way to create a two-language polyglot is to have the code divided in two parts as follows:

  1. The first part does the actual work in language A, is harmless in language B (no errors), and ends in a language-A comment symbol, which hides the second part from language A.
  2. The second part does the actual work in language B.

Thus

  • Language A sees the first part, which does the job, and then a comment.
  • Language B sees a useless first part and then the second part, which does the job.

The only difficult part here is finding a set of statements (first part) that do the job in language A while not giving errors in language B. Some suggestions for this:

  • Most stack-based languages allow displaying only the top of the stack at the end of the program (sometimes this is even the default, as in 05AB1E).
  • Some languages ignore undefined statements (for example Golfscript).

A simple example that uses these guidelines can be found here. Languages A and B are MATL and 05AB1E respectively.

Luis Mendo

Posted 2016-11-26T00:59:02.080

Reputation: 87 464

24

Use two-dimensional languages

Unlike one-dimensional languages, which generally parse the entire source code and will produce syntax errors or unwanted runtime effects on things they don't understand (thus forcing you to hide other languages' code from them), two-dimensional languages tend to only parse code in the path of execution, meaning that the entire rest of the program is ignored. There's also a lot more room to split execution paths away from each other in two dimensions; you can send the instruction pointer turning in an unusual direction, like down or even left (wrapping round to the right hand side of the program), to get it out of your way very quickly. The techniques useful in one-dimensional languages also generalise to two-dimensional languages (e.g. you can skip over code with ;; in Befunge-98, in addition to just sending the IP in a weird direction), making this mostly just a strict gain compared to a one-dimensional solution.

As a bonus, several two-dimensional languages have an entry point other than the top-left of the program, meaning that you don't need to go to any effort to split them away from other languages; they'll split themselves off from the group naturally.

user62131

Posted 2016-11-26T00:59:02.080

Reputation:

20

Know your Trues and Falses

Each language sees "true" and "false" in a slightly different way. If they have similar syntax, you can exploit this by adding a decision that the languages will handle differently.

One example from the Trick or Treat thread uses '', an empty string. In Lua this is truthy, but in Python it is falsy, so the following:

print(''and'trick'or'treat')

...will print a different string in each language.

All it takes is finding a value like this. For example, you could use '0', which evaluates to false in PHP but true in Python.
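
The way `''and'trick'or'treat'` resolves can be traced with a small Python sketch. The helper `and_or` and the three truthiness rules are hypothetical names made up for illustration; the Lua and PHP rules are simplified models (only the values discussed in this tip are covered), not calls into those languages.

```python
def and_or(truthy, cond, a, b):
    """Evaluate `cond and a or b` under the given truthiness rule."""
    first = a if truthy(cond) else cond
    return first if truthy(first) else b


def python_truthy(value):
    return bool(value)


def lua_truthy(value):
    # In Lua only nil and false are falsy; None stands in for nil here.
    return value is not None and value is not False


def php_truthy(value):
    # Simplified model: only the string cases mentioned in this tip.
    return value not in ("", "0")


print(and_or(python_truthy, "", "trick", "treat"))  # treat
print(and_or(lua_truthy, "", "trick", "treat"))     # trick
```

The same helper shows why '0' separates PHP from Python: `php_truthy("0")` is false under this model, while `python_truthy("0")` is true.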

FlipTack

Posted 2016-11-26T00:59:02.080

Reputation: 13 242

17

Blockquotes in at least one language

Here's an example that works in both Python and C++:

#include <iostream> /*
""" */
int main() {
    std::cout << "Hello World!\n";
}

/* """
print("Hello World!")
# */

Luis Mendo pitched what I think is by far the easiest solution, which is to use comments.

Look for one language that has block comments, and another in which something that is regular syntax in the first language starts a comment.

Even easier is two languages with different block commenting styles that are interchangeably correct syntax, but I couldn't be bothered to check.

Check it out in Python 3.5 and C++
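
To see what the Python half of the example actually executes, the source can be fed to `exec` and its output captured. This is only a sketch of the Python side (the C++ side is not checked here); `POLYGLOT` and `run_as_python` are names introduced for illustration.

```python
import contextlib
import io

# The polyglot from this answer, kept verbatim in a raw string so that
# the \n inside the C++ string literal is not expanded too early.
POLYGLOT = r'''#include <iostream> /*
""" */
int main() {
    std::cout << "Hello World!\n";
}

/* """
print("Hello World!")
# */
'''


def run_as_python(source):
    """Execute the source as Python and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<polyglot>", "exec"), {})
    return buffer.getvalue()


print(run_as_python(POLYGLOT), end="")
```

Python treats the first line as a comment, the whole C++ program as an unused triple-quoted string, and the final `# */` as another comment, so only the `print` call runs.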

dexgecko

Posted 2016-11-26T00:59:02.080

Reputation: 271

2The first line there shouldn't have a semicolon. – None – 2016-11-26T03:00:45.697

True. Good point – dexgecko – 2016-11-26T03:08:26.320

15

Divide and conquer

When you're writing a polyglot in a large number of languages, you won't necessarily be able to separate all the languages' control flows from each other immediately. Thus, you'll need to "true polyglot" some of the languages for some length of time, allowing the same code to run in each of them. There are two main rules to bear in mind while you're doing this:

  • The control flow in any two languages should either be very similar, or very different. Trying to handle a large number of interleaved control flows is a recipe for getting confused and makes your program hard to modify. Instead, you should limit the amount of work you have to do by ensuring that all the programs that are in the same place are there for the same reason and can happily be run in parallel for as long as you need them to be. Meanwhile, if a language is very different from the others, you want its execution to move to a very different location as soon as possible, so that you don't have to try to make your code conform to two different syntactic models at once.

  • Look for opportunities to split one language, or a group of similar languages, away from the rest. Work from larger groups down to smaller groups. Once you have a group of similar languages all at a certain point in the program, you'll need to split them up at some point. At the start of the program, you might well, say, want to split the languages that use # as a comment marker away from languages that use some other comment marker. Later on, perhaps you have a point where all languages use f(x) syntax for function calls, separate commands with semicolons, and share other syntactic similarities. At that point, you could use something much more language-specific to split them, e.g. the fact that Ruby and Perl don't process escape sequences in '' strings, but Python and JavaScript do.

In general, the logical flow of your program should end up as a tree, repeatedly splitting into groups of languages that are more and more similar to each other. This puts most of the difficulty in writing the polyglot right at the start, before the first split. As the control flow branches out more and more, and the languages that are running at any given point get more and more similar, your task gets easier because you can use more advanced syntax without causing the languages involved to syntax-error.

A good example is the set {JavaScript, Ruby, Perl, Python 3}; all these languages accept function calls with parentheses and can separate statements with semicolons. They also all support an eval statement, which effectively allows you to do flow control in a portable way. (Perl is the best of these languages to split off from the group early, because it has a different syntax for variables from the others.)
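
The escape-sequence split mentioned above can be illustrated from the Python side. In this sketch the raw string is only a stand-in for how Ruby and Perl treat '\n' inside single quotes; it is not a claim about how those interpreters are implemented, and the `branch` labels are invented for the example.

```python
processed = '\n'   # Python (like JavaScript) processes the escape: one character
literal = r'\n'    # raw string: models the two characters Ruby/Perl would keep

print(len(processed), len(literal))  # 1 2

# A late split in the "tree" could then hinge on that length.
branch = "Python/JavaScript group" if len('\n') == 1 else "Ruby/Perl group"
print(branch)
```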

user62131

Posted 2016-11-26T00:59:02.080

Reputation:

13

Hide code inside string literals

In most languages, a string literal on its own either does nothing, or does something that can be easily reversed (such as pushing the string onto the stack). String literal syntax is also relatively nonstandardised, especially for the alternative syntaxes that many languages use to handle strings with embedded newlines; for example, Python has """ ... """, Perl has q( ... ), and Lua has [[ ... ]].

There are two main uses of these. One is to allow you to interleave sections intended for different languages via starting a string at the end of one language's first section and resuming it at the start of the second: it should be fairly easy to avoid accidentally closing the string due to the variety of string delimiters among different languages. The other is that many string delimiters happen to be meaningful as a command in other languages (often more so than comment markers), so you can do something like x = [[4] ], which is a harmless assignment in languages which use JSON notation for lists, but which starts a string in Lua (and thus allows you to split the Lua code from the rest, given that it effectively "jumps" to the next ]]).
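
As a minimal sketch of both uses from Python's point of view: `x = [[4] ]` is an ordinary nested-list assignment, and a bare triple-quoted string is an expression statement with no effect, so it can hold text meant for other languages. The placeholder text inside the string is made up for illustration.

```python
x = [[4] ]   # harmless here; per the tip above, Lua would read [[ as a string opener

"""
Text intended for some other language could sit here.
Python evaluates this string and immediately discards it.
"""

print(x)  # [[4]]
```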

user62131

Posted 2016-11-26T00:59:02.080

Reputation:

13

Variable or code inside string literals

Double-quoted string literals are mostly harmless in many languages. But in some languages they could also contain code.

In Bash, you can use `...` (it doesn't end the program):

"`echo Hello world! >/proc/$$/fd/1`"

In Tcl, you can use [...]:

"[puts {hello world!};exit]"

In PHP, you can use ${...} (this generates an error in Bash so it must appear after the Bash code):

"${die(print(Hello.chr(32).world.chr(33)))}";

In Ruby, you can use #{...}:

"#{puts 'Hello world!';exit}"

There might be others as well.

These grammars aren't compatible with each other. That means you can put the code for all of these languages in one string in a harmless location: each language will run only the construct it recognises and treat the rest as plain string content.

In many cases, you could also easily comment out a double quote character there and make a more traditional polyglot.
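
A language without interpolation in double-quoted strings sees all four of the snippets above as inert. The sketch below checks this for Python using the standard `ast` module; `SNIPPETS` and `is_inert_string` are illustrative names, and the check says nothing about how Bash, Tcl, PHP or Ruby behave.

```python
import ast

# The four one-liners from this answer, one per line.
SNIPPETS = r'''
"`echo Hello world! >/proc/$$/fd/1`"
"[puts {hello world!};exit]"
"${die(print(Hello.chr(32).world.chr(33)))}";
"#{puts 'Hello world!';exit}"
'''.strip().splitlines()


def is_inert_string(line):
    """True if Python parses the line as a lone string-literal statement."""
    body = ast.parse(line).body
    return (
        len(body) == 1
        and isinstance(body[0], ast.Expr)
        and isinstance(body[0].value, ast.Constant)
        and isinstance(body[0].value.value, str)
    )


for line in SNIPPETS:
    print(is_inert_string(line), line)
```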

jimmy23013

Posted 2016-11-26T00:59:02.080

Reputation: 34 042

13

Ending the program

You can end the program abruptly in one language so that it will ignore the code in another language.

So basically this format can be used:

code_in_language1 end_program_in_language1 code_for_language2 end_program_in_language2 ...

where end_program_in_languageN is the command for ending the program.

For example, in my answer in What will you bring for Thanksgiving?, I ended the program in Dip, and then I wrote code for another language, V, so that the Dip interpreter would ignore it.

"turkey"e#"corn"??"gravy"p&Ssalad
"turkey"e#"corn"??"gravy"                 
                         p&            # print stack and exit program (Dip) 
                           Ssalad      # Now that the program ended in Dip,
                                       # I can write V code that would otherwise
                                       # have caused errors in Dip

Not every language has a command that ends the program just like that, but when a language does have one, it is worth exploiting.

As @LuisMendo suggested, you can create an error (if it is allowed) to end the program if the language does not already have an "end program" builtin.
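
For a language that parses the whole file before running it, the code after the exit point still has to be syntactically valid. The sketch below shows this for Python; `SOURCE`, `BROKEN`, `run` and `parses` are hypothetical names, and the trailing lines merely imitate foreign code.

```python
import contextlib
import io

SOURCE = '''
print("turkey")
raise SystemExit      # the "end_program" step for the Python half
p & Ssalad            # never executed, but it must still parse as Python
'''

BROKEN = '''
print("turkey")
raise SystemExit
"unterminated string meant for some other language
'''


def run(source):
    """Run the source as Python, returning its output up to the exit."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            exec(compile(source, "<polyglot>", "exec"), {})
        except SystemExit:
            pass
    return buffer.getvalue()


def parses(source):
    """True if the whole source is syntactically valid Python."""
    try:
        compile(source, "<polyglot>", "exec")
    except SyntaxError:
        return False
    return True


print(run(SOURCE), end="")
print(parses(SOURCE), parses(BROKEN))
```

The undefined names after the exit never cause trouble because they are never evaluated, whereas the unterminated string in `BROKEN` is rejected before anything runs.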

user41805

Posted 2016-11-26T00:59:02.080

Reputation: 16 320

2Even if the language doesn't have a function or statement to end the program, an error usually will do – Luis Mendo – 2016-11-26T19:54:00.607

1@LuisMendo: Agreed, although note that many polyglotting problems specifically ban exit-via-crashing because it makes things too easy. It's a good idea to exploit it when they don't, though. – None – 2016-11-28T03:23:47.723

1You should probably mention that the second part's code still should be syntactically correct in the first language or else most practical languages will throw an error. – MilkyWay90 – 2019-03-26T00:13:02.837

12

Variable Aliasing

This is probably one of the simplest yet (IMO) most important tricks to use, especially since it can reach so many languages.

Example:

print=alert;print("Hello World!")

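Aliasing like this is possible not only in JavaScript, but also in Python, Ruby, etc., although the name on the right-hand side must already be defined in whichever language runs the line. More examples later when I think of some others. Of course, comment suggestions/post edits are welcome.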
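
A minimal Python 3 sketch of the same idea, aliasing in the opposite direction so that the name on the right is one Python already defines:

```python
# Give Python's print a second name so that code written against
# `alert` (as JavaScript code would be) also runs here.
alert = print
alert("Hello World!")
```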

Mama Fun Roll

Posted 2016-11-26T00:59:02.080

Reputation: 7 234

5Note that when doing e.g. JS/Python, it's usually shorter to alias alert to print in Python (3 only) because JS's comment syntax, //, can be easily worked into a Python program, while Python's # can't be worked into JS. – ETHproductions – 2016-11-26T03:58:46.667

11

#-based comments

This tip is a subset of Exploit comment symbols and Blockquotes in at least one language.

When creating polyglots with many languages, especially production-ready languages as opposed to esolangs, it can be useful to look at the languages which use # in block or single-line comments.

  • There are many languages with block comment syntaxes starting with #, and there's a lot of variety in the chars following the #.
  • Most of these languages also allow a single # as a line comment, which means that something which might start a block comment in one language is just an ordinary comment in another, making it easy to fit in.

Here's a quick summary list of languages which use # in a block comment (not exhaustive):

Language            Start       End      Single-line #?     Notes
------------------------------------------------------------------------------------------
Agena               #/          /#             ✓
AutoIt              #cs         #ce
Brat                #*          *#             ✓
C                   #if 0       #endif                      Not actually a comment
CoffeeScript        ###         ###            ✓            Needs to be on separate line
Common Lisp         #|          |#
Julia               #=          =#             ✓
Lily                #[          ]#             ✓
Objeck              #~          ~#             ✓
Perl 6              #`{         }#             ✓            Any bracketing chars will do
Picolisp            #{          }#             ✓
Scheme              #|          |#

For more examples, see Rosetta Code.

Here's a quick and easy example, as a demonstration:

#|
###
#`[

print("Julia")
#=

|#
(format t "Common Lisp")
#|

###
alert("CoffeeScript")
###

]#
say "Perl 6"
#`[

...

# ]# # ### # |# ; =#
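
To check which line each language is left with, the demonstration can be run through a small comment stripper. This is a rough Python sketch under stated assumptions: `visible_code` and `LANGUAGES` are invented for illustration, comments are matched as plain text (no string handling, no nesting), and the Perl 6 entry uses the simplified start/end pair from the table rather than the real lexer rules.

```python
DEMO = r'''#|
###
#`[

print("Julia")
#=

|#
(format t "Common Lisp")
#|

###
alert("CoffeeScript")
###

]#
say "Perl 6"
#`[

...

# ]# # ### # |# ; =#
'''

# (block start, block end, line-comment marker): a simplified model per language
LANGUAGES = {
    "Julia": ("#=", "=#", "#"),
    "Common Lisp": ("#|", "|#", ";"),
    "CoffeeScript": ("###", "###", "#"),
    "Perl 6": ("#`[", "]#", "#"),
}


def visible_code(source, block_start, block_end, line_comment):
    """Return the text left after removing block and line comments."""
    out = []
    i = 0
    while i < len(source):
        if source.startswith(block_start, i):
            end = source.find(block_end, i + len(block_start))
            if end == -1:
                break  # unterminated block comment: nothing more is visible
            i = end + len(block_end)
        elif source.startswith(line_comment, i):
            newline = source.find("\n", i)
            if newline == -1:
                break
            i = newline
        else:
            out.append(source[i])
            i += 1
    return "".join(out).strip()


for name, (start, end, line) in LANGUAGES.items():
    print(name, "->", visible_code(DEMO, start, end, line))
```

Under this model each language is left with exactly one statement, which is the point of the final line: it closes every block comment that might still be open.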

Sp3000

Posted 2016-11-26T00:59:02.080

Reputation: 58 729

Zephyr has #- ... -#. – DLosc – 2016-11-28T02:19:38.400

11

Arithmetic operator discrepancies

For similar languages or simple polyglots, sometimes it's useful to look for differences in how the languages perform arithmetic. This is because most (non-esoteric) languages have infix arithmetic operators and arithmetic can be a quick and easy way to introduce a difference.

For example:

  • ^ is bitwise XOR in some languages and exponentiation in others
  • / is integer division in some languages and floating point division in others
    • For the integer division languages, -1/2 is -1 in some languages (round down) and 0 in others (round to zero)
  • -1%2 is -1 in some languages and 1 in others
  • --x is a no-op in some languages (double negation) and pre-decrement in others
  • 1/0 gives infinity in some languages and errors out in others
  • 1<<64 gives 0 in some languages (overflow) and 18446744073709551616 in others
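
For reference, here is where Python 3 falls on each of these, with the "other" behaviour emulated where the standard library makes that easy. The emulations (`int(...)` for truncation, `math.fmod` for a sign-following remainder, a modulus for 64-bit wraparound) are stand-ins chosen for this sketch, not claims about any particular language.

```python
import math

print(2 ^ 3)               # 1: ^ is bitwise XOR in Python
print(2 ** 3)              # 8: exponentiation is spelled ** instead
print(1 / 2)               # 0.5: / is floating-point division
print(-1 // 2)             # -1: integer division rounds down
print(int(-1 / 2))         # 0: truncation, as round-to-zero languages do
print(-1 % 2)              # 1: result takes the sign of the divisor
print(math.fmod(-1, 2))    # -1.0: result takes the sign of the dividend

x = 1
print(--x)                 # 1: double negation, x is unchanged
print(1 << 64)             # 18446744073709551616: arbitrary-precision integers
print((1 << 64) % 2**64)   # 0: what 64-bit wraparound would leave

try:
    1 / 0
except ZeroDivisionError:
    print("1/0 raises an error")
```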

Sp3000

Posted 2016-11-26T00:59:02.080

Reputation: 58 729

3A simple example would be x=1;["JS","Python"][--x], which returns the name of the language it's run in (between JS and Python). – ETHproductions – 2016-11-26T19:53:35.457

10

Use Brainfuck

Pretty much all BF implementations cast out chars that aren't +-<>[].,, which just so happens to work in our favor!

BF is probably one of the easiest languages to work into a polyglot because of this feature, as long as you write the BF part first. Once you have your BF code written out, it's just a matter of modeling whatever other code you have around the BF structure.

Here's a really simple example:

.+[.+]

This pretty much increments and charcode-outputs "forever" (depending on runtime settings). Now if you wanted to write a random piece of code, say, in JS, you could do:

x=>"asdf".repeat(+x)[x*Math.random()*2+1|0]

Notice how the JS is molded around the BF.

Note that this works best if you are set on starting with BF; it's considerably harder to start with another language and then try to incorporate BF.
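
A quick way to see what a BF interpreter would make of some other language's code is to filter out everything except the eight commands. A small Python sketch (`BF_COMMANDS` and `brainfuck_view` are illustrative names):

```python
BF_COMMANDS = set("+-<>[].,")


def brainfuck_view(source):
    """Keep only the characters a typical BF implementation acts on."""
    return "".join(ch for ch in source if ch in BF_COMMANDS)


js = 'x=>"asdf".repeat(+x)[x*Math.random()*2+1|0]'
print(brainfuck_view(js))  # >.+[.+]
```

Applied to the JS above, the filter yields the original BF program with one extra leading `>` contributed by the arrow `=>`.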

Mama Fun Roll

Posted 2016-11-26T00:59:02.080

Reputation: 7 234

6For larger polyglots where a few bytes of savings from integrating the BF doesn't help much, I'd write the BF last and wrap the other code in as many [] as necessary. – Sp3000 – 2016-11-26T04:59:28.357

6This applies not just to brainfuck but to the huge number of brainfuck-similar languages and a lot of other Turing tarpits. – 0 ' – 2016-11-26T14:25:16.957

2The first x=> changes the cell, which in this case doesn't matter, but just wanted to say – Roman Gräf – 2016-11-27T19:54:38.117

7

Use languages in which most characters don't matter

This is a generalization of Mama Fun Roll's point about BF. An esolang that ignores most characters is very useful in polyglots. Also useful: an esolang in which a large set of characters are interchangeable. Some examples:

  • Whitespace ignores everything that isn't space, tab, or newline.
  • Brain-Flak basically ignores everything besides ()[]{}<>. (@ sometimes causes an error when the interpreter tries to parse it as the start of a debug flag.)
  • oOo CODE ignores everything except letters. Furthermore, all lowercase letters are interchangeable, as are all uppercase letters.
  • Wierd only distinguishes between whitespace and non-whitespace characters.
  • In Wordy, some punctuation characters are ignored, and all letters are interchangeable.
  • Both Parenthetic and Parenthesis Hell ignore everything except parentheses.

DLosc

Posted 2016-11-26T00:59:02.080

Reputation: 21 213

I fixed that @ error. – Post Rock Garf Hunter – 2016-11-28T05:01:45.377

Try combining Whitespace with Python – enedil – 2017-07-17T12:40:55.260

@enedil You don't need to have tabs with Python. You can use exec('''...\t\n\40''') – MilkyWay90 – 2019-07-01T19:17:12.140

5

Be Aware of Nested Block Comments

Sometimes multiple languages will use the same syntax for block comments, which is more often than not a deal breaker for creating a polyglot with the two languages. Very occasionally however, one of the languages will allow nested block comments, which can be abused to create separate code paths.

For example, consider this polyglot:

#[#[]#print("Lily")#]#echo"Nim"

Nim and Lily both use #[ and ]# to begin and end block comments, but only Nim allows nested block comments.

Lily considers the second #[ to be part of a single block comment and the first ]# to terminate it. (The # following Lily’s print statement starts a line comment that hides Nim’s code.)

Nim, on the other hand, sees the #[]# as a nested (albeit empty) block comment and print("Lily")# as the remainder of the outer block comment.
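
The two readings can be reproduced with a toy comment stripper that either counts nesting or doesn't. This Python sketch is a simplified model of the behaviour described above (it ignores string literals), and `strip_comments` is a name made up for the example.

```python
def strip_comments(source, nested):
    """Remove #[ ... ]# block comments and # line comments.

    With nested=True an inner #[ opens another level (the Nim reading);
    with nested=False the first ]# ends the comment (the Lily reading).
    """
    out = []
    depth = 0
    i = 0
    while i < len(source):
        if source.startswith("#[", i) and (depth == 0 or nested):
            depth += 1
            i += 2
        elif depth > 0 and source.startswith("]#", i):
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # still inside a block comment
        elif source[i] == "#":
            newline = source.find("\n", i)
            i = len(source) if newline == -1 else newline
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


polyglot = '#[#[]#print("Lily")#]#echo"Nim"'
print(strip_comments(polyglot, nested=False))  # print("Lily")
print(strip_comments(polyglot, nested=True))   # echo"Nim"
```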

Chance

Posted 2016-11-26T00:59:02.080

Reputation: 5 228

4

Not sure if this counts, but...

Use a shebang line to turn everything into a valid perl program

According to this answer and the Perl documentation, if you pass any file that starts with a shebang line to perl, it invokes the appropriate program to run it. For instance, this

#!/usr/bin/python

for i in range(6):
    print i**2

gets executed by the Python interpreter if you call perl filename.py.

Federico Poloni

Posted 2016-11-26T00:59:02.080

Reputation: 151

3While the program can be called with perl, it doesn't become a Perl program. – Paŭlo Ebermann – 2016-11-27T10:56:32.663

2@PaŭloEbermann I realize that it's borderline, that's why I started my answer with "not sure if it counts". :) But what defines true Perl, if not "what is written in the documentation and returned by the reference implementation perl"? Sounds like a good philosoraptor meme... – Federico Poloni – 2016-11-27T13:25:48.053

1(See also this meta answer.) – Federico Poloni – 2017-11-25T18:15:10.770

4

Call nonexistent functions, then exit while evaluating their arguments

Many programming languages are capable of parsing an arbitrary identifier followed by a pair of parentheses with an expression inside:

identifier(1 + 1)

Sometimes, the form of the identifier in question might be fixed, due to being needed to give code to a different language you're using. That might at first seem to cause trouble, if the identifier doesn't correspond to a function that the language actually has.

However, many programming languages will evaluate a function's arguments before they check to see if the function itself actually exists (e.g. Lua), and so you can use this sort of construct anyway; all you need is to exit the program somewhere inside the function's arguments.

Here's an example, a dc/Lua polyglot:

c2pq(1 + #os.exit(print(3)))

c2pq is a dc program to print 2 and exit; Lua sees this as the name of a function, but Lua can be prevented from erroring via placing an exit command in its argument. The big advantage of this construction is that unlike an assignment (c2pq =), it's not automatically incompatible with languages in which variable names start with a sigil; function name syntax is much more consistent across languages than variable name syntax is.
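
Not every language evaluates the arguments first, so it is worth testing each candidate. The sketch below does this for Python, which looks up the function name before evaluating the arguments and therefore fails with a NameError instead of exiting; `outcome` is a hypothetical helper, and the Lua behaviour is taken from the description above rather than tested here.

```python
def outcome(source):
    """Run the source and report which happened first: lookup failure or exit."""
    try:
        exec(source, {})
    except NameError:
        return "NameError before the arguments ran"
    except SystemExit:
        return "exited inside the arguments"
    return "finished normally"


# c2pq is undefined; the argument would exit the program if it were reached.
print(outcome("c2pq(__import__('sys').exit())"))
```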

user62131

Posted 2016-11-26T00:59:02.080

Reputation: