Comment-free polyglot with different, nontrivial, behaviour per language

7

This challenge is inspired by this other challenge, which you may want to read first for some context. However, a competitive answer is likely to look very different, so after reading some posts on Meta I decided it was best posted as a separate question.

The goal here is to write a polyglot; that is, a program that works in multiple languages. However, polyglot challenges are typically answered via some standard techniques (listed below). I think it's much more interesting if the standard techniques are avoided, and thus this challenge has a number of restrictions that your program must obey; each restriction is designed to close a standard loophole (although if you can somehow exploit the loophole while remaining within the restriction, go ahead). Here are the restrictions, and the reasons behind them:

  1. Deleting any non-whitespace character from your program must cause it to fail to function in all your languages. Your program may not read its own source.

    One of the most common tricks for creating polyglots is to hide code intended for one language inside syntax that's a comment block in each other language. In order to define "comments" (and other non-coding syntax, which similarly needs to be disallowed) in a language-independent way, we define a character to be part of a comment if it could be deleted without affecting the operation of the program. (This definition was based on that in nneonneo's challenge, linked above; not counting the removal of whitespace isn't quite loophole-free as some languages have particularly significant whitespace, but in general it's hard to find a lot of languages which use it for computation, and I don't want the focus of the challenge to be on minor details about which languages require whitespace where.)

    To clarify "failing to function", the intention is that the program works like this: deleting a character from the program causes it to break at least one of the other restrictions (e.g. it errors out, or prints output that does not comply with the specification).

    It's fairly clear that a program could work around the spirit of this restriction via examining its own source code (e.g. via opening the file that stores it on disk, or using a command similar to Befunge's g); as such, doing that is banned too. To be precise, you may not use language features that would act differently if a comment were added to the source code.

  2. The program may not error out, either at run time or (if applicable) at compile time.

    A common trick with polyglots is to not worry about executing parts of the program intended for other languages that aren't valid in one of the languages; you just write them anyway, then let the program crash. Banning errors closes the loophole. Warning messages are acceptable, unless the language is one that downgrades situations that would be errors in most typical languages (e.g. invalid syntax) to warnings.

  3. The program must not use any language features that return the name or version of the language in use.

    For example, no using $] in a submission that's intended to work in Perl. A common trick when answering polyglot questions is to take two languages which are very similar (such as Python 2 and Python 3) and claim that a program is a polyglot in both of them; however, if the languages are that similar, it makes the programs much easier to write and is uninteresting in a way. This restriction requires any solution to the problem that uses similar languages to focus on the differences between them, rather than the similarities.

  4. When executed, the program must: print the name of the language it is executing in; print the name (e.g. "A000290") of a sequence from OEIS; then print the sequence itself. The sequence must be different for each language.

    One final trick for writing polyglots is to write a very simple program that uses only code constructs which have the same meaning in a wide range of languages, such as print(1) (which is a complete program that outputs 1 in sufficiently many languages that it or something like it would probably win without this loophole being closed). As such, there's a requirement here for the program to do a nontrivial amount of work. The program must calculate the sequence elements, not just produce hardcoded output (i.e. do not exploit this standard loophole). If the sequence you choose is an infinite one, this means that the program should run forever producing more and more terms.

    There's a large enough variety of sequences in OEIS that you should be able to find something that's relatively easy to implement in each of the languages you use, so long as your construction can handle some nontrivial code in addition to just printing constant strings.

    This requirement also closes another standard loophole for implementing polyglots: observing that a file is missing a marker such as <?php and thus echoes itself, and claiming that that's a valid program. (Deleting a character would definitely change the result then!) Forcing the program to calculate something prevents that.

    Note: this isn't really a challenge about implementing bignums, so it's OK if the output becomes incorrect after a while or even if the program errors out due to things like word size concerns, so long as you don't try to abuse this by picking a language with very small integers. Likewise, there aren't any hard requirements on the output format. You can output in unary for all I care.

    Incidentally, the requirement to print the sequence number is to prevent a solution accidentally becoming invalid because deleting a character changes the sequence that's printed without changing anything else (and therefore is there to make the first restriction easier to comply with).
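The first restriction can be checked mechanically. The following is a minimal sketch of such a check, not part of the challenge itself: `single_deletions`, `surviving_deletions` and `toy_is_valid` are hypothetical names, and `is_valid` stands in for "run the mutated program in one language and check that it still satisfies every restriction". A real harness would need one such callback per language and would have to time out programs that print infinite sequences.

```python
from typing import Callable, Iterator, List, Tuple


def single_deletions(source: str) -> Iterator[Tuple[int, str]]:
    """Yield (index, mutated_source) with one non-whitespace character removed."""
    for i, ch in enumerate(source):
        if not ch.isspace():
            yield i, source[:i] + source[i + 1:]


def surviving_deletions(source: str, is_valid: Callable[[str], bool]) -> List[int]:
    """Return the indices whose deletion still leaves a program `is_valid` accepts."""
    return [i for i, mutated in single_deletions(source) if is_valid(mutated)]


def toy_is_valid(src: str) -> bool:
    """Toy stand-in for a real language run: 'valid' means the text evaluates to 3."""
    try:
        return eval(src) == 3
    except Exception:
        return False


print(surviving_deletions("1+2", toy_is_valid))     # no deletion survives
print(surviving_deletions("1+2#x", toy_is_valid))   # the comment character survives
```

Under this toy definition, a submission complies in a given language only if the returned list is empty; any index that survives marks a character that behaves like a comment.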

The winner will be the answer that works in the most languages. If there's a tie, the tiebreak is to favour the submission that was posted earlier. Unlike many challenges on this site, there's no benefit to writing a short program, so feel free to make it as readable as the restrictions allow (which, admittedly, is probably still not very readable). Good luck!

user62131

Posted 2016-11-20T03:51:15.220

What's the tiebreaker if multiple submissions have the same number of languages? – DLosc – 2016-11-20T04:15:03.713

I was intending to just leave it as a tie, because I feel that optimizing for a tiebreak would detract from optimizing for the main goal. ([tag:code-golf] challenges can often be tied with an equal byte count.) Is that not allowed? – None – 2016-11-20T04:49:32.020

Could the people who put this on hold please clarify what it is that they find unclear about the problem? I wrote three screenfuls of explanation because I wanted to be as clear as possible! – None – 2016-11-20T04:50:25.663

Clarifications about why this was put on hold from chat: no tiebreak, a few of the criteria weren't quite objective, issues with whitespace-specific languages. I've edited the post to try to resolve them. – None – 2016-11-20T05:11:56.523

The first rule may be better as "deleting any character from the source code will cause it to fail in at least one language". I'm tired, though, so that variation may be much worse and I'm not seeing it. – Mego – 2016-11-20T05:57:46.557

That allows you to write something that's a comment in all but one language, and not a comment in the remaining language, i.e. the normal technique for creating polyglots. – None – 2016-11-20T06:38:50.410

One loophole I can think of is to use strings instead of comments and taking their checksum (or even length) to prevent the code from running if a character is deleted. – Martin Ender – 2016-11-20T08:44:04.250

I don't think that can really be considered a loophole, given that if you're checksumming the contents of the string, it is by definition actually relevant to how the program operates. Besides, I'm not sure there's a good place to draw the dividing line; and you'll need to use this sort of method to hide at least the names of the other languages. (I think something like what you suggested will probably be used by the winning submission; it's something that I've considered for my own answer.) – None – 2016-11-20T08:56:26.220

3

For future reference, we have a challenge sandbox where you can get feedback for challenge ideas and sort out the gritty details of the challenge spec with the help of the community before it goes live. That can be especially useful for tricky specs like this one. (I think you've done a solid job though, especially since it's your first challenge. :))

– Martin Ender – 2016-11-20T10:13:21.523

Answers

12

Perl + Lua + JavaScript (Rhino) + Ruby + Python (3), 5 languages

m=[[']|^$=;eval($n=q@sub t {(chr 39)=~//&&"["=~//&&0!~//||die; 690 == length $n.$_[0][0][0] and print "Perl: A004442$/1$/"; print 1^++$.,$/ while 1;}@);t([ [q 9)',
    ']] n=[[function t(p) if #(m..n..p)-1==688 then print("Lua: A000045") a=1 b=1 while true do print(a) print(b) a=a+b b=b+a end end end]] loadstring(n)() t([['],
   '\x3044',
   'if((""+m).length == 640) print("JavaScript: A000217"); for(i=1;;i++)print(i*(i+1)/2)',
   'm.inspect.length == 700 and print("Ruby: A000027\n"); $i=0; print(++$i, "\n") while $i=$i+1',
   'exec(m[5])',
   'print("Python: A000012" if len(repr(m)) == 676 else ""); exec(m[6])',
   'while True: print(1)',
   'eval(m[eval(m[1][2])])',
   'eval(m[([22]+[7])[1] ])',
   8];eval(m[m[9]])

I used Rhino, an offline (non-browser-dependent) JavaScript implementation, because it makes local testing easier; it produces output using print instead of alert. You can change each occurrence of print to alert in the fourth line to get a program that will work in a browser.

This program uses the technique of storing most of the strings used for the various languages in data structures and evaluating them. We can determine if a character was deleted via checking the size of the resulting structure. That might sound easy, but it's surprisingly hard to ensure that for each language, every character of the program is either inside a string somewhere or else will break the syntax if deleted.
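As a rough illustration of the size check (a simplified sketch, not the code from the submission), the idea looks something like this in Python. The names `parts`, `EXPECTED_LENGTH` and `run_if_intact` are made up for the example; the real program hardcodes the expected length as a literal rather than computing it.

```python
# Text that only matters to "other languages" lives in a string next to the payload.
parts = ["unused text for another language", "1 + 2"]

# Assumption for the sketch: computed here; the real program writes a fixed number.
EXPECTED_LENGTH = len(repr(parts))


def run_if_intact(candidate):
    """Evaluate the payload only if the whole structure still has the expected size."""
    if len(repr(candidate)) != EXPECTED_LENGTH:
        return None  # a character went missing somewhere, so refuse to run
    return eval(candidate[1])


print(run_if_intact(parts))                                        # runs the payload
print(run_if_intact(["unused text for another languag", "1 + 2"]))  # refuses
```

Because the length is taken over the whole structure, a deletion inside a string that this language never executes is still detected.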

The program works in three different ways, depending on what language is in use.

Lua

The Lua interpretation of the program is one of the simplest and shows off the general techniques in use, so it's a good place to start. Because [[...]] is a string literal in Lua, the program parses like this:

m=[[...]] n=[[...]] loadstring(n)() t([[...]])

Here, m and the argument to t() are used to capture the parts of the program used by "other languages" in order to check them for modification; n contains the majority of the Lua code (it defines a function t when evaluated via loadstring). It should be fairly obvious that every byte that isn't inside a string literal will break the program if deleted; deleting one of =()[], the first use of m and n, or the final t will cause a syntax error; and deleting the argument to loadstring, or corrupting the name of the function itself, will make it impossible for the program to do anything useful as the content of n is never run. (Note that the trick of defining and then calling a function allows us to read the rest of the program "from the left", meaning that we don't need to write any Lua code physically to the right of the final ]]) in order to continue executing Lua after the entire program has been read.)

The implementation of t is fairly simple; it checks to ensure that m, n and p have the expected total length, then prints out the required information (the language's name, sequence number, and the Fibonacci sequence). One slight subtlety here is that I subtract 1 from the length and compare it to 688, rather than comparing to 689 directly; this is to avoid putting a literal 9 digit in the source code (the reason for this will become clear later).
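For readers who don't know Lua, the loop inside t can be transcribed into Python roughly as follows. This is only a sketch of the arithmetic: `fib_pairs` is a name invented here, and a generator replaces the infinite printing loop so that a finite prefix can be inspected.

```python
from itertools import islice


def fib_pairs():
    """Mirror the Lua loop: emit a and b, then advance both by two steps."""
    a, b = 1, 1
    while True:
        yield a
        yield b
        a = a + b
        b = b + a


print(list(islice(fib_pairs(), 8)))
```

Updating a and b alternately means each pass through the loop emits two terms, which avoids needing a temporary variable.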

JavaScript, Ruby, Python

All three of these languages parse the program the same way: as a definition of a nested data structure m, followed by eval(m[m[9]]). The definition of m looks like this:

m=[['...','...'],'...','...','...','...','...','...','...','...',8];

In other words, m is a nested list here, and we can index it to get at various strings within the list. The eval(m[m[9]]) check ensures that m has the expected number of elements; the program will therefore do nothing useful if one of the commas gets deleted (other than the first), even if the language can parse it. This ensures that all the strings from m[1] onwards are either intact or have one character deleted. We can observe that deleting any character outside a string literal here will break the program; without the initial m or an intact eval it can't run anything, without the = or ; or a parenthesis or square bracket it won't parse, deleting an apostrophe will cause an unterminated string (all the apostrophes in the program are used as string delimiters and none are escaped), and it won't find the right string to run without the final ms and 9 intact.
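The self-indexing check can be sketched in Python with placeholder strings (the contents below are invented; only the shape of the list matches the submission). In Python, adjacent string literals are concatenated, so deleting a comma between two strings still parses but shortens the list, and the lookup of m[9] then fails. The `dispatch` helper is hypothetical and catches the exception only so that the sketch can report the failure; the real program would simply error out, which breaks restriction 2.

```python
intact = [["a", "b"], "s1", "s2", "s3", "s4", "s5", "s6", "s7", "'selected'", 8]

# The same list with the comma between "s1" and "s2" deleted:
# the two literals merge, leaving nine elements instead of ten.
damaged = [["a", "b"], "s1" "s2", "s3", "s4", "s5", "s6", "s7", "'selected'", 8]


def dispatch(m):
    """Evaluate the entry selected by the last element, as eval(m[m[9]]) does."""
    try:
        return eval(m[m[9]])
    except (IndexError, TypeError):
        return None


print(len(intact), dispatch(intact))
print(len(damaged), dispatch(damaged))
```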

Assuming that everything is fine except for possibly the contents of the strings inside m, we then try to determine which entry to run. This uses fragments of code that are written in the common subset of the languages' syntaxes, but have different meanings in the languages in question. To distinguish between JavaScript and the other languages, we run ([22]+[7])[1]; this collapses to [22, 7][1] (i.e. 7) in Ruby and Python, but "227"[1] (i.e. "2", which JavaScript is happy to treat as 2 for the purpose of list indexing) in JavaScript. Then to distinguish between Ruby and Python, we take the third character of the string '\x3044'; Python expands \x escapes in both '' and "" strings, thus seeing this as 4, whereas Ruby does not expand escapes in strings if they're delimited using '', and thus sees the third character as 3. (We convert these strings to integers in a portable way by using eval yet again. It's not a recommended method of converting a string to an integer, but it does work in more languages than most other methods do.)
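The Python side of both discriminators can be checked directly. In the sketch below, the first three assignments are the actual expressions from the program; the last two lines are only stand-ins that imitate what JavaScript and Ruby do (string concatenation, and a string whose backslash escape is left unexpanded), since those languages can't be run from Python.

```python
# What Python itself computes for the two discriminating expressions.
first_choice = ([22] + [7])[1]      # list concatenation: [22, 7][1]
third_char = '\x3044'[2]            # \x30 is "0", so the string is "044"
second_choice = eval(third_char)    # portable string-to-integer conversion

# Stand-ins for the other languages (assumptions, not real JavaScript or Ruby):
javascript_like = (str(22) + str(7))[1]   # "227"[1]
ruby_like = r'\x3044'[2]                  # raw string: the escape is not expanded

print(first_choice, third_char, second_choice, javascript_like, ruby_like)
```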

This means that a different string gets executed in each language, so we can start using language-specific syntax. All three programs basically do the same thing; they convert the entire data structure m to a string, measure its length, and compare it to an expected value. If it's wrong, they refuse to print the language name and sequence number, thus failing to comply with the spec (as required). Then they print an appropriate sequence. Ruby prints the consecutive integers, whereas JavaScript prints the triangular numbers (this is mostly because I'm more experienced with JavaScript than I am with Ruby). Python is a bit more complex, because eval in Python only runs expressions, and you need exec for statements; additionally, Python isn't hugely fond of semicolons, and thus I needed to split the Python program over two execs to make it work. Because I'm not very experienced at Python one-liners, I chose to print the sequence of all-1s; boring, I know, but based on the tiebreak we've selected I wanted to get this submitted.
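The reason for chaining execs can be seen in a small Python sketch. The stage strings and the helper names `compiles` and `run_chain` are invented for illustration, and the loop is bounded so that the example terminates. A compound statement such as a loop cannot be compiled as an expression, and it cannot follow a semicolon on the same line, so it gets a string of its own that an earlier stage hands to exec.

```python
def compiles(code: str, mode: str) -> bool:
    """Report whether `code` is syntactically valid in the given compile mode."""
    try:
        compile(code, "<sketch>", mode)
        return True
    except SyntaxError:
        return False


def run_chain(stages):
    """Start with eval, as the polyglot does, and let each stage exec the next."""
    out = []
    env = {"stages": stages, "out": out}
    eval(stages[0], env)
    return out


stages = [
    "exec(stages[1])",                        # an expression, so eval accepts it
    "out.append('header'); exec(stages[2])",  # two simple statements on one line
    "for i in range(3): out.append(1)",       # compound statement, needs its own exec
]

print(compiles("while True: print(1)", "eval"))             # loops are not expressions
print(compiles("print(1); while True: print(1)", "exec"))   # nor can they follow ';'
print(run_chain(stages))
```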

Perl

Saving the best for last. People who have any experience with Perl will know that variable names have to start with a punctuation mark (with $ being by far the most common; other sigils are used for special cases like lists and dictionaries). Some of the languages I'm using don't like dollars in variable names (Ruby is OK with it, but one other language besides Perl isn't so useful…), which means that I can't start the program with a variable assignment. What Perl does have is a fairly wide range of interesting quoting operators, so it would be trivial to hide the start of the program from Perl by capturing it in a string literal with q= ... = (which is one of the many, many ways to write a string in Perl). However, hiding the text from Perl is something we can't do; in order to make the polyglot comment-free, we need to be able to see everything!

The obvious approach is to read the string entirely from the right, but I couldn't think of a way to do that while writing the program, so I had to take a different approach. It's possible to match a regex against $_ (Perl's "variable used by default") via writing the regex as an expression by itself; something like /:(.):/ would look for a character between two colons in $_ and store it in $1. We can change the quote marks used for regexes by preceding the first with m; thus, as our program starts with m=[[']|^$=, Perl will parse this as /[[']|^$/. Matching that regex against $_ might seem fairly useless; however, Perl has a convenience feature that allows the use of the notation // to repeat the last successfully matched regex. The initial match succeeds ($_ is the empty string, thus it matches the ^$ branch of the regex), meaning that we can determine if any characters got deleted from the regex by testing it on various strings to see if it matches! Some characters being deleted from the regex (such as the ]) will cause a parse failure. For the others, we match it against chr 39 (i.e. an apostrophe, written as a character code so as not to alarm Python), against "[", and against 0 (which Perl stringifies to "0"), expecting two matches and a non-match:

  • If the | is deleted, the original regex won't match, and thus the // will match anything.
  • Conversely, if the ^ or $ is deleted, the original regex will match, but it still matches anything (as all strings have a start and an end, just not necessarily both together).
  • If either [ is deleted, the regex won't match "[".
  • If the ' is deleted, the regex won't match '.

Apart from the regex experimentation, everything works much the same way as in Lua. The program is arranged slightly differently:

m= ... =;eval($n=q@ ... @);t([ [q 9 ... 9]])

but it still comes down to basically the same thing (just with weirder quote marks; q can make basically anything into a quote mark, including in this case the digit 9, which is why we couldn't use it in the rest of the program). Likewise, deleting any of these characters will break the parse for the same reason it does in the corresponding Lua. The code that's assigned to $n (and evaluated) defines t to ensure that the regex is intact, then check that $n and its own argument are intact, then print the required output (I chose sequence A004442 this time, because I was feeling a bit quirky and creative). Note that we can safely use a 9 in 690, the expected total length of $n and $_[0][0][0] ("the first element of the first element of the first argument"), as it appears to the left of the q 9 open-quote in the program.

Extensions?

I think this program is close to its limit in terms of languages. You'd either need to find a new creative way to quote strings that doesn't clash with the existing constructions, or else to find another language with syntax close enough to JavaScript/Ruby/Python that the define-lists-and-eval trick works. (I tried looking into esolangs, but they tend to be worse with respect to string literals than more exoteric languages do, and the rules against reading the source make it hard to cheat with them.)

user62131

Posted 2016-11-20T03:51:15.220

And after writing all that program and all that explanation, I realised there's a much easier way to write the Perl; you can read a string from the right by regexing against it using a regex that captures the entire string. Oh well, this way works too and it's more interesting. – None – 2016-11-21T08:01:07.153