The scenario
Lately you have been noticing some strange behavior with your favorite text editor. At first it seemed that it was ignoring random characters in your code when writing to disk. After a while you noticed a pattern: characters with odd ASCII values were being ignored. On further inspection you discovered that you can only write to files properly if every eighth bit is zero. Now you need to know whether your valuable files have been affected by this strange bug.
The task
You must write a complete program that determines whether a file contains any odd bytes (demonstrating that it is uncorrupted). But because of your text editor, you cannot write any odd bytes in your source code. You may assume any pre-existing encoding for input; however, you must still check every individual byte, not just characters.
Input
Your program will take the contents of, or the path to, a file from either stdin or the command line.
Output
Your program will output to stdout either a truthy value if the given file contains an odd byte, or a falsy value if every eighth bit is zero.
Criteria
This is code golf: the shortest program that completes the task wins. To be a valid submission, every eighth bit in the program's source code must be zero. I would recommend including a copy of your source code's binaries in your submission.
Standard loopholes apply.
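As a rough aid for checking a submission against the source restriction, the sketch below lists the offsets of odd bytes in a byte string. It assumes "every eighth bit" means the lowest bit of each byte, which matches the "odd bytes" wording above. The function name `odd_byte_offsets` and the sample source `f(x)` are hypothetical, and the checker itself is ordinary Python, so it is not a valid submission.

```python
def odd_byte_offsets(source: bytes) -> list:
    """Return the offsets of every byte whose lowest bit is set."""
    return [i for i, b in enumerate(source) if b & 1]


# Hypothetical candidate source, used only for illustration.
# 'f' = 0x66, '(' = 0x28, 'x' = 0x78 are even; ')' = 0x29 is odd.
print(odd_byte_offsets(b"f(x)"))  # [3]
```

An empty list would mean the source contains no odd bytes under this interpretation.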
Test Cases
(In ASCII encoding) Input:
"$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtvxz|~
Output:
falsy
Input:
!#%')+-/13579;=?ACEGIKMOQSUWY[]_acegikmoqsuwy{}
Output:
truthy
Input:
LOREMIPSVMDOLORSITAMETCONSECTETVRADIPISCINGELITSEDDOEIVSMODTEMPORINCIDIDVNTVTLABOREETDOLOREMAGNAALIQVA
VTENIMADMINIMVENIAMQVISNOSTRVDEXERCITATIONVLLAMCOLABORISNISIVTALIQVIPEXEACOMMODOCONSEQVAT
DVISAVTEIRVREDOLORINREPREHENDERITINVOLVPTATEVELITESSECILLVMDOLOREEVFVGIATNVLLAPARIATVR
EXCEPTEVRSINTOCCAECATCVPIDATATNONPROIDENTSVNTINCVLPAQVIOFFICIADESERVNTMOLLITANIMIDESTLABORVM
Output:
truthy
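For reference, the expected results above can be reproduced with an unrestricted checker. This is only a sketch of the intended behavior, not a golfed or valid submission (its own source contains odd bytes), and the name `has_odd_byte` is an assumption made for illustration.

```python
def has_odd_byte(data: bytes) -> bool:
    """True if any byte has its lowest bit set (the file is 'uncorrupted')."""
    return any(b & 1 for b in data)


# First two test cases from above, as ASCII byte strings.
even_only = b'"$&(*,.02468:<>@BDFHJLNPRTVXZ\\^`bdfhjlnprtvxz|~'
odd_only = b"!#%')+-/13579;=?ACEGIKMOQSUWY[]_acegikmoqsuwy{}"

print(has_odd_byte(even_only))  # False
print(has_odd_byte(odd_only))   # True
```

To check a real file, the bytes would need to be read in binary mode so that no decoding step hides individual bytes.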
Tips
Choose your language wisely; this challenge might not be possible in every language.
The Unix command
xxd -b <file name>
will print the binaries of a file to the console (along with some extra formatting stuff).
You may use encodings other than ASCII, such as UTF-8, as long as all other rules are followed.
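Where `xxd` is not available, a similar view of the bits can be sketched in Python. The helper below is a hypothetical stand-in that prints only the binary digits; it does not reproduce the offset or ASCII columns that `xxd -b` adds.

```python
def bit_dump(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(format(b, "08b") for b in data)


# '@' = 0x40 and 'B' = 0x42 end in 0; '!' = 0x21 ends in 1.
print(bit_dump(b"@B!"))  # 01000000 01000010 00100001
```

Any group ending in `1` marks an odd byte.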
Can we assume that the input doesn't contain newlines? – Dennis – 2016-08-04T23:39:50.963
@Dennis No you cannot, you should assume that you are reading from a random text file. – Post Rock Garf Hunter – 2016-08-04T23:41:15.247
1@Dennis If you don't mind me asking, why do you ask? If this creates unnecessary difficulty for some reason I may consider amending the requirements. – Post Rock Garf Hunter – 2016-08-04T23:43:27.090
2Some languages have a hard time reading multi-line input, but it's not like this challenge is meant to be easy, so it's probably OK. :P Can the input be empty? – Dennis – 2016-08-04T23:45:35.760
@Dennis it is safe to assume that the input cannot be empty. An empty text file is hardly considered valuable. – Post Rock Garf Hunter – 2016-08-04T23:46:52.483
9
!#%')+-/13579;=?ACEGIKMOQSUWY[]_acegikmoqsuwy{}
are the banned printable ASCII characters, for anyone who cares. The allowed printable ASCII characters are
" $&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtvxz|~
– Patrick Roberts – 2016-08-04T23:48:33.140
My function is encoded in the classic Dyalog encoding so that I can have 1 byte/char. But it analyzes the input according to evenness of UCS point. Is this OK? – Adám – 2016-08-04T23:55:28.577
@Adám I am sorry I do not know what you mean by UCS point. – Post Rock Garf Hunter – 2016-08-05T00:05:00.627
1@EamonOlive In Unicode, every character is assigned an integer, called a code point. They are roughly the same concept as ordinals in other encoding schemes (like ASCII), but can take a much larger range of values (specifically, in UCS-2, code points range from 0-65535). – Mego – 2016-08-05T00:08:42.080
1@Adám No. If I am understanding correctly, I do not believe that this is OK. My understanding is that since UCS-2 encodes each character as two bytes, you would only end up checking every other byte for evenness. Programs should check every 8th bit, looking for a one. – Post Rock Garf Hunter – 2016-08-05T00:14:22.847
9Quite handy that all vowels are banned... ;-) – owacoder – 2016-08-05T00:46:44.960
@EamonOlive I did not mean UCS - 2, only the Universal Character Set. In other words: My code is encoded in a specific encoding that is fitting for it, and in that encoding it only uses even-positioned characters. However, when it is fed ASCII text to check, it uses the position in the Universal Character Set to determine evenness. – Adám – 2016-08-05T01:31:25.737
@EamonOlive The real problem here is that if I am to judge evenness of input according to the same encoding as the code itself, then I cannot support normal 7-bit ASCII files, because only 111 of the 128 7-bit ASCII characters are in my 256 character code page. – Adám – 2016-08-05T01:40:32.850
1@Adám In that case it should be fine. – Post Rock Garf Hunter – 2016-08-05T01:40:35.163
determines if a file is contains any odd bytes and is thus uncorrupted – Shouldn't that be "corrupted"? – Adám – 2016-08-05T06:35:44.827
How is LOREMIPSUM... truthy? It contains both even and odd bytes (unless I just don't understand the challenge) – shooqie – 2016-08-05T08:40:24.907
@Adám I think the motivation is that if a file contains odd bytes, then it must not have been saved by the broken editor (which would have removed those bytes), so it must have been saved by some other working editor, hence “uncorrupted”. – Anders Kaseorg – 2016-08-05T10:49:18.640
@AndersKaseorg Ah, I see. Thanks. – Adám – 2016-08-05T11:14:21.887
May we take input bytes in numeric form? – Adám – 2016-08-05T11:53:28.280
@Adám you may not take bytes in numeric form. – Post Rock Garf Hunter – 2016-08-05T12:32:24.363
1@shooqie LOREMIPSUM... is truthy because it is "non corrupted". It contains odd bytes and thus has not been damaged by the bad text editor. Falsy is reserved for files that only contain even bytes, thus have been damaged. – Post Rock Garf Hunter – 2016-08-05T12:35:23.690
4Welp, so much for BF having a chance in this challenge. – TLW – 2016-08-05T21:36:28.953
2Also note that if you have any line breaks in a DOS/Windows file, the [CR] has the odd bit. I was hoping that WhiteSpace was safe, but alas [TAB]. If you want to go old school, EBCDIC gives you three vowels. – GuitarPicker – 2016-08-08T23:56:21.137
Can we assume the input will not contain NUL (\0) bytes? – Dom Hastings – 2017-08-17T16:22:22.900
@DomHastings Sure. That seems reasonable to me. – Post Rock Garf Hunter – 2017-08-17T16:23:09.267
I think we need clarification on exactly which bytes we do need to support, if the program itself is plain ASCII. – Ørjan Johansen – 2017-08-17T16:32:24.370
@ØrjanJohansen You should support all the bytes other than the null byte which I have made an exception for. – Post Rock Garf Hunter – 2017-08-17T16:33:29.333
@WheatWizard https://codegolf.stackexchange.com/a/155007/59376 - what is the validity of this answer lol. – Magic Octopus Urn – 2018-02-06T18:57:38.713
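Two points from the comments above can be illustrated with a short sketch: the parity of a code point need not match the parity of its encoded bytes, and common whitespace bytes differ in parity. The helper name `has_odd_byte` and the sample characters are assumptions chosen for illustration.

```python
def has_odd_byte(data: bytes) -> bool:
    """True if any byte has its lowest bit set."""
    return any(b & 1 for b in data)


# U+00E8 has an even code point (232), but its UTF-8 form is C3 A8,
# and 0xC3 is odd, so a byte-level check still finds an odd byte.
ch = "\u00e8"
print(ord(ch) % 2)                       # 0
print(ch.encode("utf-8").hex())          # c3a8
print(has_odd_byte(ch.encode("utf-8")))  # True

# U+0100 is even as a code point, yet its two-byte big-endian form is 01 00.
print(has_odd_byte("\u0100".encode("utf-16-be")))  # True

# CR (0x0D) and TAB (0x09) are odd; LF (0x0A) is even.
print(has_odd_byte(b"\r"), has_odd_byte(b"\t"), has_odd_byte(b"\n"))  # True True False
```

This is why the challenge asks for every individual byte to be checked rather than characters.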