$@='NOT ';print"$@CORRUPTED"__DATA__ =®®”print"$@CORRUPTED"__DATA__ =®®”Ê®›~
This program contains a few stray octets that aren't valid UTF-8. As such, it's shown as it looks in Windows-1252. (By default, if A Pear Tree sees a non-ASCII octet in a string literal or the like it treats it as an opaque object and doesn't try to understand it beyond being aware of what its character code is; this behaviour can be changed via an encoding declaration but the program doesn't have one. So the program is logically in "unspecified ASCII-compatible character set". All the non-ASCII octets are in comments anyway, so it doesn't really matter.)
Explanation
A Pear Tree checksums the program, looking for the longest substring that has a CRC-32 of 00000000
. (If there's a tie, it picks the octetbetically first.) Then the program gets rotated to put it at the start. Finally, the program gets interpreted as a language that's almost a superset of Perl, defining a few things that are undefined in Perl to work the same way as in Python (and with a few minor changes, e.g. print
prints a final newline in A Pear Tree but not in Perl). This mechanism (and the language as a whole) was designed for polyglot and radiation-hardening problems; this isn't the former, but it's definitely the latter.
In this program, we have two notable substrings that CRC-32 to 00000000
; the entire program does, and so does print"$@CORRUPTED"__DATA__ =®®
by itself (which appears twice). As such, if the program is uncorrupted, it'll set $@
to NOT
and then print it followed by CORRUPTED
. If the program is corrupted, then the CRC-32 of the program as a whole will fail to match, but one of the shorter sections will remain uncorrupted. Whichever one is rotated to the start of the program will just print CORRUPTED
, as $@
will be the null string.
Once the string's been printed, __DATA__
is used to prevent the rest of the program running. (It crosses my mind writing this that __END__
could be used instead, which would clearly save two bytes. But I may as well post this version now, because I've spent a bunch of time verifying it, and a modified version would have to be re-verified due to the CRC changes; and I haven't put a huge amount of effort into golfing the "payload" yet, so I want to see if anyone has other improvements in the comments that I can incorporate at the same time. Note that #
doesn't work in the situation where a character is corrupted into a newline.)
You might be wondering how I managed to control the CRC-32 of my code in the first place. This is a fairly simple mathematical trick based on the way that CRC-32 is defined: you take the CRC-32 of the code, write it in little-endian order (the reverse of the byte order that's normally used by CRC-32 calculation programs), and XOR with 9D 0A D9 6D
. Then you append that to the program, and you'll have a program with a CRC-32 of 0. (As the simplest example possible, the null string has a CRC-32 of 0, thus 9D 0A D9 6D
also has a CRC-32 of 0.)
Verification
A Pear Tree can handle most sorts of mutations, but I'm assuming "changed" means "replaced with an arbitrary octet". It's theoretically possible (although unlikely in a program this short) that there could be a hash collision somewhere that lead to the wrong program running, so I had to check via brute force that all possible octet substitutions would leave the program working correctly. Here's the verification script (written in Perl) that I used:
use 5.010;
use IPC::Run qw/run/;
use warnings;
use strict;
undef $/;
$| = 1;
my $program = <>;
for my $x (0 .. (length $program - 1)) {
for my $a (0 .. 255) {
print "$x $a \r";
my $p = $program;
substr $p, $x, 1, chr $a;
$p eq $program and next;
alarm 4;
run [$^X, '-M5.010', 'apeartree.pl'], '<', \$p, '>', \my $out, '2>', \my $err;
if ($out ne "CORRUPTED\n") {
print "Failed mutating $x to $a\n";
print "Output: {{{\n$out}}}\n";
print "Errors: {{{\n$err}}}\n";
exit;
}
}
}
say "All OK! ";
[tag:radiation-hardening] has a host of similar questions, but I haven't found any that are quite like this one. – FryAmTheEggman – 2017-01-22T02:13:01.827
7To the downvoters: I suspect this is possible (if very difficult), if you pick the right language. Please don't close or delete the question unless you think there's something wrong with it other than being impossible. – None – 2017-01-22T02:20:52.983
7What does changed mean? Substituted by another byte? – Dennis – 2017-01-22T03:18:01.717
4@ais523 FWIW I downvoted the challenge because it seems hastily written, not because I think it's too difficult. – Dennis – 2017-01-22T05:07:26.943
@Dennis sandboxed! – noɥʇʎԀʎzɐɹƆ – 2017-01-22T14:56:53.377
Ignoring for a moment that the challenge you sandboxed is substantially different from the challenge you posted here, sandbox or no sandbox makes no difference. I'm sure you could find ways to make the challenge clearer by writing more than three sentences, just as you could have made your comment clearer by writing more than one word. – Dennis – 2017-01-22T17:06:07.940
@Dennis what is unclear? – noɥʇʎԀʎzɐɹƆ – 2017-01-22T17:53:38.980
5It's not that anything is unclear, but it could be made clearer. You could clarify if a full program is required, add an example program and illustrate all possible modifications, mention how single-byte substitutions would affect a UTF-8 encoded file, add a script that can be used to test submissions, mention that the program should not take input, etc. – Dennis – 2017-01-22T18:03:14.783