Make a regex that matches certain binary numbers

0

Your task is to create a regular expression that matches most binary numbers with an even number of 0s and an odd number of 1s (e.g. "101100011").

The full criteria for the regex are:

  • matches 90% or more of all binary numbers between 0 and 11111111 with even number of 0s and an odd number of 1s,
  • doesn't match 90% or more of all binary numbers between 0 and 11111111 with odd number of 0s or an even number of 1s,
  • works in most common programming languages.

Ideally, it should be short and creative.

The input string always has a length that is greater than 1, less than infinity, and only contains 1s and 0s.

I will upvote any answer that seriously attempts to match the above criteria. Once the activity around this question dies down, I'll accept the highest-voted answer that is at least only 1.333... times the length of the shortest answer.

The Guy with The Hat

Posted 2013-12-23T14:09:22.623

Reputation: 778

Question was closed 2013-12-23T23:36:03.633

http://stackoverflow.com/q/20485486/1223693 – Doorknob – 2013-12-23T14:13:37.993

@DoorknobofSnow shhhhh – The Guy with The Hat – 2013-12-23T14:14:03.773

3Maybe I don't understand the problem, but there are none which can match. All binaries you assume as input are exactly 8 bits long (your examples indicate that leading zeros are provided) and thus odd+even cannot be true at all. – Howard – 2013-12-23T15:55:06.280

1Moreover: short and creative doesn't sound very objective. If it is popularity-contest please tag it accordingly. (note: I don't think that this is a good example of a popularity-contest). – Howard – 2013-12-23T15:56:17.790

As @Howard points out, the 8-bit range specific criteria are impossible, but that would still appear to leave the initial statement of "most binary numbers with an even number of 0s and an odd number of 1s", later qualified with length "greater than 1, less than infinity." I take most to mean >50%. – Darren Stone – 2013-12-23T16:55:47.047

@Howard Thanks for noticing that. I feel stupid now. – The Guy with The Hat – 2013-12-23T20:45:47.693

1I think there is a missing specification of "without leading zeroes" here, which would allow for 8-bit binary numbers <1111101 to match the target pattern. – Iszi – 2013-12-23T21:20:08.367

Remaining thoughts... Opening sentence suggests 9-bit numbers are in scope, while the "full criteria" section limits itself to 8-bits, but then later I read that input strings can be binary strings of infinite length. These are not strictly conflicting, but it's a bit of a head-scratcher. I would also suggest removing the word "most" in the opening sentence (since it suggests a portion of more than 50%) of even 0-count and odd 1-count (of any length, apparently at least 9 bits in the example string) but that is not emphasized again. – Darren Stone – 2013-12-24T04:42:42.543

Answers

6

The following regex

(?=^1*(01*0)*1*$)^.(..)*$

works in many languages. It performs a perfect match for any binary of arbitrary length.

Basically, it consists of two parts which are joined via and by using a lookahead pattern:

  • ^1*(01*0)*1*$ matches if an even number of zeros is provided.
  • ^.(..)*$ tests for an odd number of digits in total.

The test run can be seen here.

Howard

Posted 2013-12-23T14:09:22.623

Reputation: 23 109

0

80 characters long

^(?:(?(1)|(1))|0101|1010|(?:1{2})+|(?:0{2})+|0(?:1{2})+0|1(?:0{2})+1|)+(?(1)|a)$

Doesn't match some numbers that have an odd number over four 1s in a row, e.g 01111101

The Guy with The Hat

Posted 2013-12-23T14:09:22.623

Reputation: 778

so does this match exactly 90% of numbers between 0 and 0b11111111? (I tried to test, but apparently this regex doesn't work in Ruby.) – Doorknob – 2013-12-23T14:19:46.217

@DoorknobofSnow Whoops, edited question. – The Guy with The Hat – 2013-12-23T14:21:06.293

Ah, okay, well in that case, why the {2}s? Repeating the previous digit would be shorter. – Doorknob – 2013-12-23T14:22:04.980

And what's with the a at the end? – Doorknob – 2013-12-23T14:23:53.333

@DoorknobofSnow I don't quite know why I put those {2}s there... The a makes the regex fail if the the first capturing group doesn't capture anything, which should only happen if there is an even number of 1s. – The Guy with The Hat – 2013-12-23T14:28:36.427

I've been able to get down to 52 characters with only the most basic regex features: (1|0(00|11)*(10|01))((10|01)(00|11)*(10|01)|00|11)* – Brilliand – 2014-04-10T22:23:00.280