Regex - Match half of the strings

0

Okay, the challenge is that you have to write a regex that matches half of all strings.

Some rules to clarify:

For ANY string length(>0), if I permute all strings(of that length), exactly half of them should be matched. Put more lightly, if I generate a random string of that length it should have a 50% chance of being matched. (this means that you cannot just match all even-length strings)

You may assume that strings contain only A-Z, a-z, and 0-9.

0 length strings are undefined, you can choose to either match them or not.

Shortest regex wins.

Cruncher

Posted 2014-01-16T20:49:45.593

Reputation: 2 135

/^[A-Z0-4]*$/ works, doesn't it? – Tim Seguine – 2014-01-16T20:53:23.107

@TimSeguine No, if strings could have only uppercase, or only lower case it would. But since most strings have a combination of both, that misses most strings – Cruncher – 2014-01-16T20:55:46.760

/^[\w0-4]/ works, doesn't it? – John Dvorak – 2014-01-16T20:56:46.377

oh yeah sorry, it works for one character but gets worse and worse after that. If i remove the star and the $ though, it should work. It takes then half of all one character strings and all of the other strings that start with one of those characters. – Tim Seguine – 2014-01-16T20:56:50.197

@JanDvorak I don't see how. Can you explain? – Cruncher – 2014-01-16T21:00:14.717

@Cruncher the same as above. If it starts with lowercase or a low digit, match. Otherwise, don't. – John Dvorak – 2014-01-16T21:01:46.563

@JanDvorak but doesn't \w match all word characters? – Tim Seguine – 2014-01-16T21:02:32.983

@TimSeguine if it's case insensitive, then A-Z it is. – John Dvorak – 2014-01-16T21:02:58.263

@JanDvorak The problem doesn't say anything about case insensitivity – Tim Seguine – 2014-01-16T21:04:07.007

@TimSeguine I mean, if \w matches both lowercase and uppercase – John Dvorak – 2014-01-16T21:05:31.030

1@JanDvorak I know, that is exactly my point. You are matching too many one character strings. There are 62, and you match 57 of them. It does it that way in PCRE at least IIRC. Is there a dialect in which it doesn't? If there is, then I concede defeat. – Tim Seguine – 2014-01-16T21:08:14.787

@JanDvorak Please don't say that \W matches uppercase letters. – Kendall Frey – 2014-01-16T21:15:35.243

Answers

10

6

/^[5-Z]/

In ASCII's order, there's 0-9 then A-Z then a-z So Half of the string begin with 5-Z, the other half doesn't.

xem

Posted 2014-01-16T20:49:45.593

Reputation: 5 523

Dang, my answer is a duplicate of this... – Justin – 2014-01-16T21:21:39.467

But thanks for posting it, it made me realize that the good answer didn't necessarily include half of the letters and half of the numbers. any 31-char subset of A-Za-z0-9 is fine – xem – 2014-01-16T21:24:08.767

2@xem you are already making me regret changing my display name to my real name. :P – Tim Seguine – 2014-01-16T21:27:25.660

Oh god it's a real name? I'm sorry Tim. No more jokes. God bless you. I hope one day you'll forgive me. ='( – xem – 2014-01-16T21:30:43.547

@xem when I see you in hell ;) – Tim Seguine – 2014-01-16T21:32:51.913

And say hello to your goat for me. http://histoires-d-ailleurs.over-blog.com/mr-seguin-s-goat-another-wolf-story

– xem – 2014-01-16T21:35:19.380

@xem haha, you are the second person to ever mention that to me. – Tim Seguine – 2014-01-16T21:59:10.813

5

6

/^[0-U]/

Ranges work for ascii values, so this matches half of the 62 combinations of first letters. Can match the empty string for one extra char:

/^[0-U]?/

For more confusion, this is also a solution:

/^[--U]/

Also this:

/^[-U]/

where the ascii 0 is right before the -.

Justin

Posted 2014-01-16T20:49:45.593

Reputation: 19 757

/^[-U]/ is this considered 5 chars? Or still 6, just that one isn't printable? – Cruncher – 2014-02-27T14:42:24.923

@Cruncher One was unprintable. Otherwise, that would only match the characters - and U. – Justin – 2014-02-27T19:57:50.120

4

9 characters (not counting leaning toothpicks)

Modified mine from the comments:

 /^[A-Z0-4]/

It will match half of all 1 character strings. Every string of length greater than one starts with a one character string, half of which will match. So half match in total.

Tim Seguine

Posted 2014-01-16T20:49:45.593

Reputation: 824

3+1 for leaning toothpicks if I could (but that would be +2) – John Dvorak – 2014-01-16T21:07:46.947

2

24

You didn't ask for a max length.

So an infinity of strings can exist.

Half of infinity = infinity.

But that doesn't lead anywhere.

Anyway, I'd say that "half of this infinite number of strings" start with [A-Z0-4] and the other half with [a-z5-9]

That's why I consider this answer correct (inspired by Tim Seguine)

/^[A-Z0-4][A-Za-z0-9]*$/

xem

Posted 2014-01-16T20:49:45.593

Reputation: 5 523

My name is Tim by the way. ;) – Tim Seguine – 2014-01-16T21:06:41.000

I did say that you need half of them for every length. That being said, this is a correct solution. It's just the end of it is redundant. – Cruncher – 2014-01-16T21:06:43.900

But your regex isn't the shortest one by any means. Even if string anchors are required, not matching non-alphanumeric words isn't. – John Dvorak – 2014-01-16T21:06:50.717

4Sorry for the typo Tam Seguine. – xem – 2014-01-16T21:07:57.770

1By your logic, all that is needed is /^[A-Z0-4]/, but that is Tim's solution... – Justin – 2014-01-16T21:08:11.430

No because /^[A-Z0-4]/ accepts strings that end with things other than A-Za-z0-9. I thought it wasn't allowed. – xem – 2014-01-16T21:09:40.293

@xem I didn't say that it had to only match those strings, I said that you can assume that ALL strings have this form (or in other words, strings not of this form can have undefined behaviour) – Cruncher – 2014-01-16T21:10:33.993

So Tum Seguine's answer is the good one :) okay. – xem – 2014-01-16T21:11:53.113

It's okay, this answer is also Tem approved – Tim Seguine – 2014-01-16T21:12:23.227

Your arguing about infinity, by the way, would be right, except the OP specified half of the strings of each length. – Tim Seguine – 2014-01-16T21:15:01.657

I was just joking about infinity/2 Tym ;) – xem – 2014-01-16T21:19:34.667

@xem oh we are already moving into the semivowels? Next will be Twm I guess. – Tim Seguine – 2014-01-16T21:28:58.510

@TimSeguine already moving into the semivowel(no s). We can all just call you Mr. T – Cruncher – 2014-01-16T21:56:11.670

@Cruncher don't want to go off on a tangent, but are you implying there is only one semivowel? – Tim Seguine – 2014-01-17T11:03:08.253

@TimSeguine in English – Cruncher – 2014-01-17T13:13:08.797

@Cruncher Don't want to argue really, but there are actually plenty of uses of w in English as a semivowel – Tim Seguine – 2014-01-17T13:59:03.693