Only Even Bytes

65

6

The scenario

Lately you have been noticing some strange behavior with your favorite text editor. At first it seemed that it was ignoring random characters in your code when writing to disk. After a while you noticed a pattern: characters with odd ASCII values were being ignored. Upon further inspection you discovered that you can only write to files properly if every eighth bit is zero. Now you need to know if your valuable files have been affected by this strange bug.

The task

You must write a complete program that determines if a file contains any odd bytes (demonstrating it is uncorrupted). But because of your text editor, you cannot write any odd bytes in your source code. You may assume any pre-existing encoding for the input; however, you must still check every individual byte, not just characters.

Input

Your program will take the contents of or the path to a file from either stdin or command line.

Output

Your program will output to stdout either a truthy value if the given file contains an odd byte, or a falsy value if every eighth bit is zero.

Criteria

This is code golf; the shortest program that completes the task wins. To be a valid submission, every eighth bit in the file's source code must be zero. I would recommend including a binary dump of your source code in your submission.

Standard loopholes apply.

Test Cases

(In ASCII encoding) Input:

"$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtvxz|~

Output:
falsy

Input:
!#%')+-/13579;=?ACEGIKMOQSUWY[]_acegikmoqsuwy{}

Output:
truthy

Input:
LOREMIPSVMDOLORSITAMETCONSECTETVRADIPISCINGELITSEDDOEIVSMODTEMPORINCIDIDVNTVTLABOREETDOLOREMAGNAALIQVA
VTENIMADMINIMVENIAMQVISNOSTRVDEXERCITATIONVLLAMCOLABORISNISIVTALIQVIPEXEACOMMODOCONSEQVAT
DVISAVTEIRVREDOLORINREPREHENDERITINVOLVPTATEVELITESSECILLVMDOLOREEVFVGIATNVLLAPARIATVR
EXCEPTEVRSINTOCCAECATCVPIDATATNONPROIDENTSVNTINCVLPAQVIOFFICIADESERVNTMOLLITANIMIDESTLABORVM

Output:
truthy

Tips

  • Choose your language wisely; this challenge might not be possible in every language.

  • The Unix command xxd -b <file name> will print the binaries of a file to the console (along with some extra formatting stuff)

  • You may use encodings other than ASCII, such as UTF-8, as long as all other rules are followed.
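To check a candidate submission without reading xxd output by eye, here is a minimal Python sketch. It is my own helper, not part of the challenge, and the names odd_byte_offsets, binary_dump and check_source.py are hypothetical. It prints each byte as eight bits, six bytes per line, loosely imitating xxd -b without the text column, and then lists the offsets of any bytes whose eighth (lowest) bit is 1.

```python
import sys

def odd_byte_offsets(data: bytes) -> list:
    """Return the offset of every byte whose lowest (eighth) bit is 1."""
    return [i for i, b in enumerate(data) if b & 1]

def binary_dump(data: bytes, width: int = 6) -> str:
    """Render bytes as 8-bit groups, loosely imitating `xxd -b` (no text column)."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        bits = " ".join(format(b, "08b") for b in chunk)
        lines.append(f"{start:08x}: {bits}")
    return "\n".join(lines)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_source.py <file name>
    with open(sys.argv[1], "rb") as f:
        src = f.read()
    print(binary_dump(src))
    bad = odd_byte_offsets(src)
    print("no odd bytes" if not bad else f"odd bytes at offsets {bad}")
```

An empty list from odd_byte_offsets means the source satisfies the rule.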

Post Rock Garf Hunter

Posted 2016-08-04T22:54:49.263

Reputation: 55 382

Can we assume that the input doesn't contain newlines? – Dennis – 2016-08-04T23:39:50.963

@Dennis No you cannot, you should assume that you are reading from a random text file. – Post Rock Garf Hunter – 2016-08-04T23:41:15.247

@Dennis If you don't mind me asking, why do you ask? If this creates unnecessary difficulty for some reason I may consider amending the requirements. – Post Rock Garf Hunter – 2016-08-04T23:43:27.090

Some languages have a hard time reading multi-line input, but it's not like this challenge is meant to be easy, so it's probably OK. :P Can the input be empty? – Dennis – 2016-08-04T23:45:35.760

@Dennis it is safe to assume that the input cannot be empty. An empty text file is hardly considered valuable. – Post Rock Garf Hunter – 2016-08-04T23:46:52.483

!#%')+-/13579;=?ACEGIKMOQSUWY[]_acegikmoqsuwy{} are the banned printable ASCII characters, for anyone who cares. The allowed printable ASCII characters are " $&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtvxz|~ – Patrick Roberts – 2016-08-04T23:48:33.140

My function is encoded in the classic Dyalog encoding so that I can have 1 byte/char. But it analyzes the input according to evenness of UCS point. Is this OK? – Adám – 2016-08-04T23:55:28.577

@Adám I am sorry I do not know what you mean by UCS point. – Post Rock Garf Hunter – 2016-08-05T00:05:00.627

@EamonOlive In Unicode, every character is assigned an integer, called a code point. They are roughly the same concept as ordinals in other encoding schemes (like ASCII), but can take a much larger range of values (specifically, in UCS-2, code points range from 0-65535). – Mego – 2016-08-05T00:08:42.080

@Adám No. If I am understanding correctly, I do not believe that this is OK. My understanding is that since UCS-2 encodes each character as two bytes, you only end up checking every other byte for evenness. Programs should check every 8th bit looking for a one. – Post Rock Garf Hunter – 2016-08-05T00:14:22.847

Quite handy that all vowels are banned... ;-) – owacoder – 2016-08-05T00:46:44.960

@EamonOlive I did not mean UCS - 2, only the Universal Character Set. In other words: My code is encoded in a specific encoding that is fitting for it, and in that encoding it only uses even-positioned characters. However, when it is fed ASCII text to check, it uses the position in the Universal Character Set to determine evenness. – Adám – 2016-08-05T01:31:25.737

@EamonOlive The real problem here is that if I am to judge evenness of input according to the same encoding as the code itself, then I cannot support normal 7-bit ASCII files, because only 111 of the 128 7-bit ASCII characters are in my 256 character code page. – Adám – 2016-08-05T01:40:32.850

@Adám In that case it should be fine. – Post Rock Garf Hunter – 2016-08-05T01:40:35.163

determines if a file is contains any odd bytes and is thus uncorrupted – Shouldn't that be "corrupted"? – Adám – 2016-08-05T06:35:44.827

How is LOREMIPSUM... truthy? It contains both even and odd bytes (unless I just don't understand the challenge) – shooqie – 2016-08-05T08:40:24.907

@Adám I think the motivation is that if a file contains odd bytes, then it must not have been saved by the broken editor (which would have removed those bytes), so it must have been saved by some other working editor, hence “uncorrupted”. – Anders Kaseorg – 2016-08-05T10:49:18.640

@AndersKaseorg Ah, I see. Thanks. – Adám – 2016-08-05T11:14:21.887

May we take input bytes in numeric form? – Adám – 2016-08-05T11:53:28.280

@Adám you may not take bytes in numeric form. – Post Rock Garf Hunter – 2016-08-05T12:32:24.363

@shooqie LOREMIPSUM... is truthy because it is "non corrupted". It contains odd bytes and thus has not been damaged by the bad text editor. Falsy is reserved for files that only contain even bytes, thus have been damaged. – Post Rock Garf Hunter – 2016-08-05T12:35:23.690

Welp, so much for BF having a chance in this challenge. – TLW – 2016-08-05T21:36:28.953

Also note that if you have any line breaks in a DOS/Windows file, the [CR] has the odd bit. I was hoping that WhiteSpace was safe, but alas [TAB]. If you want to go old school, EBCDIC gives you three vowels. – GuitarPicker – 2016-08-08T23:56:21.137

Can we assume the input will not contain NUL (\0) bytes? – Dom Hastings – 2017-08-17T16:22:22.900

@DomHastings Sure. That seems reasonable to me. – Post Rock Garf Hunter – 2017-08-17T16:23:09.267

I think we need clarification on exactly which bytes we do need to support, if the program itself is plain ASCII. – Ørjan Johansen – 2017-08-17T16:32:24.370

@ØrjanJohansen You should support all the bytes other than the null byte which I have made an exception for. – Post Rock Garf Hunter – 2017-08-17T16:33:29.333

@WheatWizard https://codegolf.stackexchange.com/a/155007/59376 - what is the validity of this answer lol. – Magic Octopus Urn – 2018-02-06T18:57:38.713

Answers

26

GS2, 4 bytes

dΦ("

Try it online!

Hexdump

0000000: 64 e8 28 22                                      d.("

How it works

      (implicit) Read all input and push it on the stack.
 Φ    Map the previous token over all characters in the string:
d       Even; push 1 for even characters, 0 for odd ones.
  (   Take the minimum of the resulting list of Booleans.
   "  Negate the minimum.
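For readers who do not know GS2, here is the same pipeline sketched in Python. The function name is hypothetical, and the sketch works on raw bytes. Like the answer, it relies on the challenge's guarantee that the input is non-empty, because min of an empty list raises an error.

```python
def contains_odd_byte(data: bytes) -> bool:
    # Map "is even" over every byte, like GS2's `d` applied through `Φ`.
    evens = [b % 2 == 0 for b in data]
    # The minimum of the Booleans is True only if every byte is even (GS2's `(`).
    all_even = min(evens)
    # Negating it gives True exactly when at least one byte is odd (GS2's `"`).
    return not all_even
```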

Dennis

Posted 2016-08-04T22:54:49.263

Reputation: 196 637

21

Befunge, 36 bytes

I know this is an old question, but I wanted to give it a try because I thought it would be an interesting challenge in Befunge.

>~:0`|
>20`:>$.@
|` " "<
*8*82<^p24*

Try it online!

It outputs 1 if the input contains an odd byte (i.e. it was not corrupted by the broken editor), and 0 if every byte is even.

Explanation

The problem is how to determine odd bytes without having access to the / (divide) or % (modulo) commands. The solution was to multiply the value by 128 (the sequence 28*8**), then write that result into the playfield. On a strictly standard interpreter, playfield cells are signed 8-bit values, so an odd number multiplied by 128 is truncated to -128, while an even number becomes 0.

The other trick was in reading the -128 or 0 back from the playfield without having access to the g (get) command. The workaround for this was to write the value into the middle of an existing string sequence (" "), then execute that sequence to push the enclosed value onto the stack. At that point, determining the oddness of the byte is a simple less-than-zero test.
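Here is a Python sketch of the multiply-and-truncate idea, assuming 8-bit signed playfield cells as the answer describes. The helper names are hypothetical. Multiplying by 128 moves bit 0 into bit 7, which is the sign bit. After truncation, an odd byte therefore becomes 128, which reads back as -128, and an even byte becomes 0. The less-than-zero test then picks out the odd bytes.

```python
def to_signed_cell(n: int) -> int:
    """Truncate an integer to a signed 8-bit value, as an 8-bit playfield cell would."""
    n &= 0xFF
    return n - 256 if n >= 128 else n

def byte_is_odd(b: int) -> bool:
    # 28*8** multiplies by 2 * 8 * 8 = 128, shifting bit 0 into bit 7 (the sign bit).
    cell = to_signed_cell(b * 128)
    # The less-than-zero test: a negative cell means the original byte was odd.
    return cell < 0

def befunge_style(data: bytes) -> int:
    # Output 1 if any byte is odd, otherwise 0.
    return 1 if any(byte_is_odd(b) for b in data) else 0
```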

One final aspect worth discussing is the output. In the false case, we reach the >$. sequence with just one value on the stack, so $ clears the stack and . outputs a zero (popping an empty Befunge stack yields 0). In the true case, we follow the path 20`:>$.. Since two is greater than zero, the comparison pushes a one onto the stack, and the : makes a duplicate copy so the $ won't drop it before it gets output.

James Holderness

Posted 2016-08-04T22:54:49.263

Reputation: 8 298

This may be late and new but it is already my favorite answer. – Post Rock Garf Hunter – 2016-12-01T03:53:51.723

@WheatWizard I only just realised now why this answer has been getting so much attention. Thank you for the bounty! – James Holderness – 2016-12-07T23:27:55.553

12

CJam (11 bytes)

"r2":(~f&2b

Online demo

Stripping away the tricks to avoid odd bytes (the prefix "r2":(~ decrements each character of "r2" to get "q1" and evaluates it), this reduces to

q1f&2b

which reads the input, maps a bitwise AND with 1, and then performs a base conversion, giving zero iff all of the ANDs were zero.
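Here is a minimal Python rendering of q1f&2b, using a hypothetical function name. It shows why the base conversion works: the 0/1 digits, read as a binary number, give zero only when every digit is zero.

```python
def and_then_base2(data: bytes) -> int:
    # 1f& : bitwise AND of every byte with 1, giving a list of 0/1 digits.
    digits = [b & 1 for b in data]
    # 2b : interpret the digits as a base-2 number, most significant first.
    value = 0
    for d in digits:
        value = value * 2 + d
    return value
```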

Peter Taylor

Posted 2016-08-04T22:54:49.263

Reputation: 41 901

This code is sad :( – betseg – 2016-12-01T06:49:17.947

Because it can only have half of the chars @betseg – Roman Gräf – 2016-12-07T11:24:53.730

9

MATL, 7 bytes

l$Z$2\z

The source code uses UTF-8 encoding, so the source bytes are (in decimal):

108    36    90    36    50    92   122

The input is a file name, taken as a string enclosed in single quotes. The output is the number of odd bytes in the file, which is truthy iff nonzero.

Explanation

l    % Push a 1. We use `l` instead of `1` to have an even value
$    % Input specification. This indicates that the next function takes 1 input
Z$   % Input file name implicitly, read its raw bytes and push them as an array of chars
2\   % Modulo 2
z    % Number of nonzero values. This gives the number of odd bytes. Implicitly display
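Here is the same idea in Python, sketched under the assumption that the argument is a path to the file. The function name is hypothetical. Opening the file in binary mode mirrors reading raw bytes with Z$, so a multi-byte encoding is checked byte by byte rather than character by character.

```python
def count_odd_bytes(path: str) -> int:
    # Read raw bytes ("rb"), not decoded text, so every byte is examined.
    with open(path, "rb") as f:
        data = f.read()
    # 2\ then z: reduce each byte modulo 2 and count the nonzero results.
    return sum(b % 2 for b in data)
```

As in the MATL answer, the result is the number of odd bytes, which is truthy exactly when it is nonzero.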

Luis Mendo

Posted 2016-08-04T22:54:49.263

Reputation: 87 464

9

Printable .COM file, 100 bytes

^FZjfDXVL\,LPXD$$4"PXD,lHPXDjJXDRDX@PXDjtXDH,nPXDj@XD4`@PXD,ZHPXD4,@PXD4:4"PXDH,\PXD4"PXD,hPXDRDX@P\

Hexdump:

00000000  5e 46 5a 6a 66 44 58 56  4c 5c 2c 4c 50 58 44 24  |^FZjfDXVL\,LPXD$|
00000010  24 34 22 50 58 44 2c 6c  48 50 58 44 6a 4a 58 44  |$4"PXD,lHPXDjJXD|
00000020  52 44 58 40 50 58 44 6a  74 58 44 48 2c 6e 50 58  |RDX@PXDjtXDH,nPX|
00000030  44 6a 40 58 44 34 60 40  50 58 44 2c 5a 48 50 58  |Dj@XD4`@PXD,ZHPX|
00000040  44 34 2c 40 50 58 44 34  3a 34 22 50 58 44 48 2c  |D4,@PXD4:4"PXDH,|
00000050  5c 50 58 44 34 22 50 58  44 2c 68 50 58 44 52 44  |\PXD4"PXD,hPXDRD|
00000060  58 40 50 5c                                       |X@P\|
00000064

Using a very loose definition of source as something that can be reasonably typed by a human, and inspired by EICAR Standard Antivirus Test File (more info at "Let's have fun with EICAR test file" at Bugtraq).

Using only printable non-odd ASCII bytes (side note: opcodes affecting words tend to be odd, the W bit is the lsb of some opcodes), it constructs a fragment of code at SP (which we conveniently set just past our generating code), and execution ends up falling through to the generated code.

It uses the fact that the stack initially contains a near pointer to the start of the PSP, and that the start of the PSP contains the INT 20h instruction (more info on this at https://stackoverflow.com/questions/12591673/).

Real source:

; we want to generate the following fragment of code

;  5E                pop si             ; zero SI (pop near pointer to start of PSP)
;  46                inc si             ; set SI to 1
; loop:
;  B406              mov ah,0x6         ; \
;  99                cwd                ; >
;  4A                dec dx             ; > D-2106--DLFF
;  CD21              int 0x21           ; > DIRECT CONSOLE INPUT
;  7405              jz end             ; > jump if no more input
;  40                inc ax             ; > lsb 0/1 odd/even
;  21C6              and si,ax          ; > zero SI on first odd byte
;  EBF3              jmp short loop     ; /
; end:
;  96                xchg ax,si         ; return code
;  B44C              mov ah,0x4c        ; D-214C
;  CD21              int 0x21           ; TERMINATE WITH RETURN CODE

 pop si             ; these two opcodes don't need to be encoded
 inc si

 pop dx             ; DX = 20CD (int 0x20 at start of PSP)
 push byte +0x66
 inc sp
 pop ax
 push si
 dec sp
 pop sp             ; SP = 0x0166
 sub al,0x4c        ; B4
 push ax
 pop ax
 inc sp
 and al,0x24
 xor al,0x22        ; 06
 push ax
 pop ax
 inc sp
 sub al,0x6c
 dec ax             ; 99
 push ax
 pop ax
 inc sp
 push byte +0x4a    ; 4A
 pop ax
 inc sp
 push dx            ; [20]CD
 inc sp
 pop ax
 inc ax             ; 21
 push ax
 pop ax
 inc sp
 push byte +0x74    ; 74
 pop ax
 inc sp
 dec ax
 sub al,0x6e        ; 05
 push ax
 pop ax
 inc sp
 push byte +0x40    ; 40
 pop ax
 inc sp
 xor al,0x60
 inc ax             ; 21
 push ax
 pop ax
 inc sp
 sub al,0x5a
 dec ax             ; C6
 push ax
 pop ax
 inc sp
 xor al,0x2c
 inc ax             ; EB
 push ax
 pop ax
 inc sp
 xor al,0x3a
 xor al,0x22        ; F3
 push ax
 pop ax
 inc sp
 dec ax
 sub al,0x5c        ; 96
 push ax
 pop ax
 inc sp
 xor al,0x22        ; B4
 push ax
 pop ax
 inc sp
 sub al,0x68        ; 4C
 push ax
 pop ax
 inc sp
 push dx            ; [20]CD
 inc sp
 pop ax
 inc ax
 push ax            ; 21
 pop sp             ; now get the stack out of the way
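Here is a Python model of the loop in the generated fragment, to show what it computes. It is an illustration, not an emulator: iterating over data stands in for DOS function 06h with DL=FF, and the function name is hypothetical.

- SI starts at 1.
- Each byte arrives in AL with AH=6.
- inc ax clears bit 0 for odd bytes.
- and si,ax zeroes SI at the first odd byte.

The low byte of SI becomes the return code, so 0 means an odd byte was found and 1 means every byte was even.

```python
def com_exit_code(data: bytes) -> int:
    si = 1                          # pop si (SI = 0), then inc si
    for b in data:
        ax = 0x0600 | b             # AH = 6 from mov ah,0x6; INT 21h returns the byte in AL
        ax += 1                     # inc ax: bit 0 is now 0 for odd bytes, 1 for even ones
        si &= ax                    # and si,ax: SI drops to 0 at the first odd byte
    ax = (0x4C << 8) | (si & 0xFF)  # xchg ax,si, then mov ah,0x4c
    return ax & 0xFF                # INT 21h/AH=4Ch terminates with AL as the return code
```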

ninjalj

Posted 2016-08-04T22:54:49.263

Reputation: 3 018

8

CJam, 18 17 15 bytes

"<rj":(((*~:|X&

Assumes that the locale is set to Latin-1. Try it online!

How it works

The straightforward solution goes as follows.

q       e# Read all input from STDIN and push it as a string on the stack.
 :i     e# Cast each character to its code point.
   :|   e# Take the bitwise OR of all code points.
     X  e# Push 1.
      & e# Take the bitwise AND of that bitwise OR and 1.

Unfortunately, the characters q and i cannot appear in the source code. To work around this issue, we are going to create part of the above source code dynamically, then evaluate the string.

"<rj"         e# Push that string on the stack.
     :(       e# Decrement all characters, pushing ";qi".
       (      e# Shift out the first character, pushing "qi" and ';'.
        (     e# Decrement ';' to push ':'.
         *    e# Join "qi" with separator ':', pushing "q:i". 
          ~   e# Evaluate the string "q:i", which behaves as explained before.
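For comparison, here is the straightforward version (q:i:|X&) sketched in Python, with a hypothetical function name. It folds bitwise OR over the code points and then keeps bit 0, which is set only if some input byte had bit 0 set. The answer assumes a Latin-1 locale so that characters map one-to-one to bytes. The sketch sidesteps that assumption by working on bytes directly.

```python
import operator
from functools import reduce

def or_then_and(data: bytes) -> int:
    # :| : bitwise OR of all code points (iterating over bytes yields ints).
    combined = reduce(operator.or_, data, 0)
    # X& : AND with 1 keeps only the lowest bit of the combined value.
    return combined & 1
```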

Dennis

Posted 2016-08-04T22:54:49.263

Reputation: 196 637

7

Pyth, 20 13 bytes

vj0>LhZ.BRj.z

Or in binary:

00000000: 01110110 01101010 00110000 00111110 01001100 01101000  vj0>Lh
00000006: 01011010 00101110 01000010 01010010 01101010 00101110  Z.BRj.
0000000c: 01111010                                               z

Try it online

How it works

           .z   all lines of input
          j     join on newline
       .BR      convert each character to binary
   >LhZ         take the last (0 + 1) characters of each binary string
 j0             join on 0
v               evaluate as an integer

The resulting integer is truthy (nonzero) iff any of the bytes were odd.
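Here is a Python sketch of the same steps. The function name is hypothetical, and the sketch takes the already-joined input text rather than reading lines from stdin. The separator 0 only ever inserts zero digits, so the resulting number is zero exactly when every last binary digit is 0.

```python
def last_bits_as_int(text: str) -> int:
    # .BR then >LhZ: binary form of each code point, keeping only its last digit.
    last_digits = [format(ord(c), "b")[-1] for c in text]
    # j0: join the digits with "0"; v: evaluate the digit string as an integer.
    return int("0".join(last_digits))
```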

Anders Kaseorg

Posted 2016-08-04T22:54:49.263

Reputation: 29 242

4

Retina, 106 bytes

Removes every allowed character, then matches any remaining characters. Truthy values will be the number of characters found. Falsey values will be 0.

`"| |\$|&|\(|\*|,|\.|0|2|4|6|8|:|<|>|@|B|D|F|H|J|L|N|P|R|T|V|X|Z|\\|\^|`|b|d|f|h|j|l|n|p|r|t|v|x|z|\||~

.

Try it online

Since . doesn't match newlines by default, I don't have to remove them.
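Here is a Python approximation of the two Retina stages, using hypothetical names. Like the answer, it only deletes the even printable ASCII characters, so other even bytes would still be counted.

- The first stage deletes the allowed characters.
- The second stage counts whatever is left.

In Python's re module, . skips newlines by default, just as in Retina.

```python
import re

# Printable ASCII characters with even code points: space, '"', '$', ..., '~'.
ALLOWED = "".join(chr(c) for c in range(0x20, 0x7F, 2))
ALLOWED_RE = re.compile("[" + re.escape(ALLOWED) + "]")

def retina_style(text: str) -> int:
    # Stage 1: replace every allowed character with nothing.
    remaining = ALLOWED_RE.sub("", text)
    # Stage 2: count matches of ".", which does not match "\n" by default.
    return len(re.findall(".", remaining))
```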

mbomb007

Posted 2016-08-04T22:54:49.263

Reputation: 21 944

4

Jelly, 13 bytes

24‘ịØBvF|\ṪBṪ

Expects the input as a quoted command-line argument. Try it online!

Hexdump

0000000: 32 34 fc d8 12 42 76 46 7c 5c ce 42 ce           24...BvF|\.B.

Dennis

Posted 2016-08-04T22:54:49.263

Reputation: 196 637

If it were not for the odd byte restriction, this would equally work at 6 bytes: O%2¬Ạ¬. – Erik the Outgolfer – 2016-09-19T14:50:50.420

1

Perl 5 + -p0, 136 bytes

Similar to other answers, this removes all even bytes and leaves any odd bytes (which is truthy).

tr<
 "$&(*,.02468:<>@BDFHJLNPRTVXZ\\^`bdfhjlnprtvxz|~€‚„†ˆŠŒŽ’”–˜šœž ¢¤¦¨ª¬®°²´¶¸º¼¾ÀÂÄÆÈÊÌÎÐÒÔÖØÚÜÞàâäæèêìîðòôöøúüþ><>d
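Here is the same deletion sketched in Python over raw bytes, using hypothetical names. bytes.translate with a None table and a set of bytes to delete plays the role of tr/...//d. Unlike a printable-only list, this deletion set covers every even byte from 0x00 to 0xFE.

```python
# All 128 even byte values: 0x00, 0x02, ..., 0xFE.
EVEN_BYTES = bytes(range(0, 256, 2))

def strip_even_bytes(data: bytes) -> bytes:
    # Delete every even byte. Any byte that survives is odd, so a
    # non-empty result means the input contained an odd byte.
    return data.translate(None, EVEN_BYTES)
```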

Try it online!

Dom Hastings

Posted 2016-08-04T22:54:49.263

Reputation: 16 415

-0 does nothing to newlines. It only determines how to split up the input, it doesn't remove any characters. – Ørjan Johansen – 2017-08-17T16:34:58.927

Ouch that's too bad. – Ørjan Johansen – 2017-08-17T16:36:06.773

@ØrjanJohansen Yeah, you're right about -0, I wanted to do the whole block as a lump, but that shouldn't matter, but I can't get around this... Too bad! I'll clean up these comments. Thanks for the heads up though! – Dom Hastings – 2017-08-17T16:37:35.400

So it works now? Guess I should delete some of the comments. From the edit diff, I see you're now including every even byte in the program. I think you might want to say that explicitly, since not all those characters show up (for me at least). – Ørjan Johansen – 2018-02-06T17:20:49.500

@ØrjanJohansen yes! I think I've got it now. I don't think all other answers cover all even bytes either, I think a few only work on printable ASCII. I'm pretty confident this does what I wanted now. I hope so anyway! – Dom Hastings – 2018-02-06T17:33:53.993

0

Japt, 10 bytes

ø0ôH² ®dZÄ

Try it online!

Japt's codepage is ISO-8859-1. The code outputs false when given its own source as input, so it is a valid submission.

Unpacked & How it works

Uø0ôHp2  mZ{ZdZ+1

Uø      Does input string contain any element in the following array...?
0ôHp2     Range of 0 to 32**2, inclusive
mZ{       Map...
ZdZ+1       Convert the number Z to a char having charcode 2*Z+1
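Here is a Python sketch of the unpacked Japt program, using a hypothetical function name and following the explanation above. It builds the characters with code points 2*Z+1 for Z from 0 to 32**2, then checks whether the input contains any of them. One consequence is that odd code points above 2049 are not in the list.

```python
def contains_listed_odd_char(text: str) -> bool:
    # 0ôH²: the integers 0 through 32**2 (= 1024), inclusive.
    # ®dZÄ: map each Z to the character with code point 2*Z + 1.
    odd_chars = [chr(2 * z + 1) for z in range(32 ** 2 + 1)]
    # Uø: True if the input contains any element of that list.
    return any(ch in text for ch in odd_chars)
```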

Not having String.c (get charcode, or map over charcodes) was a pain, but fortunately there is Number.d (convert number to char).

Turns out that Japt wins over CJam, Pyth and Jelly :)


Without the restriction, there are a couple of ways to do it in 6 bytes (on par with CJam and Jelly again):

®c uÃn

Unpacked: UmZ{Zc u} n

UmZ{   Map on each char...
Zc u     Convert to charcode modulo 2
}
n      Convert the resulting string to number

"000..000" is converted to the number 0 (falsy) regardless of how long it is. On the other hand, anything that contains 1 is converted to a nonzero double, or Infinity if it's too big (both truthy).

¬d_c u

Unpacked: q dZ{Zc u

q    Convert to array of chars
dZ{  Is something true when mapped with...
Zc u   Convert each char to charcode modulo 2

A more straightforward approach that directly yields true or false.

Or, a 5-byte solution is even possible with the help of the -d flag:

¨c u

Unpacked: q mZ{Zc u

q     Convert to array of chars
mZ{   Map...
Zc u    Convert to charcode modulo 2

      Result is array of zeros and ones
-d    Apply .some() on the resulting array

Bubbler

Posted 2016-08-04T22:54:49.263

Reputation: 16 616