We've already got a meta-regex-golf problem, inspired by the xkcd comic.
But this regex golf looks fun, too! I want to distinguish between the states of the US and the regions of Italy. Why? I'm a citizen of both countries, and I always have trouble with this*.
The regions of Italy are
Abruzzo, Valle d'Aosta, Puglia, Basilicata, Calabria, Campania, Emilia-Romagna, Friuli-Venezia Giulia, Lazio, Liguria, Lombardia, Marche, Molise, Piemonte, Sardegna, Sicilia, Trentino-Alto Adige/Südtirol, Toscana, Umbria, Veneto
and the states of the USA are
Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, Wyoming
Your job is to write a program which distinguishes these lists with a regular expression. This is a new game, so here's the
Rules
- Distinguishing between lists must be done with a single matching regular expression.
- Your score is the length of that regular expression, smaller is better.
To be clear: all work must be done by the regular expression -- no filtering, no replacements, no nothing... even if those are also done with regular expressions. That is, the input should be passed directly into a regular expression, and only the binary answer (match / no match) can be used by later parts of the code. The input should never be inspected or changed by anything but the matching expression. Exception: eating a newline with something akin to Ruby's chomp is fine.
Your program should take a single entry (optionally followed by \n or EOF if it makes things easier) from either list from stdin, and print to stdout the name of that list. In this case, our lists are named Italy and USA.
To test your code, simply run both lists through it. Behavior may be undefined for strings which do not occur in the list.
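A test harness along these lines might look like the sketch below. The names `PATTERN`, `classify`, and `failures` are made up for illustration, and the pattern is a deliberately ungolfed placeholder (a plain alternation of the Italian regions), not a competitive answer. It is built once from a constant list, never from the input, and only the match / no match result is used afterwards.

```python
import re

ITALY = [
    "Abruzzo", "Valle d'Aosta", "Puglia", "Basilicata", "Calabria",
    "Campania", "Emilia-Romagna", "Friuli-Venezia Giulia", "Lazio",
    "Liguria", "Lombardia", "Marche", "Molise", "Piemonte", "Sardegna",
    "Sicilia", "Trentino-Alto Adige/Südtirol", "Toscana", "Umbria", "Veneto",
]

USA = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]

# Placeholder pattern (assumption, not a golfed answer): matches exactly
# the Italian regions. Swap in your own candidate here.
PATTERN = "^(" + "|".join(re.escape(name) for name in ITALY) + ")$"
REGEX = re.compile(PATTERN)


def classify(entry: str) -> str:
    """Return the list name, using only the regex's match / no match answer."""
    entry = entry.rstrip("\n")  # the permitted chomp
    return "Italy" if REGEX.search(entry) else "USA"


def failures() -> list:
    """Run both lists through classify and collect misclassified entries."""
    expected = [(name, "Italy") for name in ITALY] + [(name, "USA") for name in USA]
    return [name for name, label in expected if classify(name) != label]


print(failures())  # an empty list means the pattern separates the two lists
```

To turn this into a submission, you would read one line from stdin, pass it to `classify`, and print the result; the score would be `len(PATTERN)`.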
Scoring Issues
This might have to be done on a language-by-language basis. In Perl, m/foobarbaz/ is a matching regular expression. However, in Python,
import re
re.compile('foobarbaz')
does the same thing. We wouldn't count the quotes for Python, so I say we don't count the m/ and final / in Perl. In both languages, the above should receive a score of 9.
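As a rough illustration of that counting rule, here is a small sketch. The helpers `perl_score` and `python_score` are hypothetical names, and they cover only the plain m/.../ and quoted-string cases described above.

```python
import re


def perl_score(literal: str) -> int:
    """Score a Perl-style m/.../ literal: the m/ and final / are free."""
    if not (literal.startswith("m/") and literal.endswith("/")):
        raise ValueError("expected a literal of the form m/.../")
    return len(literal[2:-1])


def python_score(pattern: str) -> int:
    """Score a Python pattern string: the surrounding quotes are free."""
    re.compile(pattern)  # raises re.error if the pattern is not valid
    return len(pattern)


print(perl_score("m/foobarbaz/"))  # 9
print(python_score("foobarbaz"))   # 9
```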
To clarify a point raised by Abhijit, the actual length of the matching expression is the score, even if you generate it dynamically. For example, if you found a magical expression m,
n="foo(bar|baz)"
m=n+n
then you should not report a score of 12: m has length 24. And just to be extra clear, the generated regular expression can't depend on the input. That would be reading the input before passing it into the regular expression.
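If in doubt, measure the expression that actually gets compiled. A quick check of the example above, using Python's built-in `len`:

```python
import re

n = "foo(bar|baz)"
m = n + n  # generated dynamically, but still independent of the input

re.compile(m)  # the expression that is actually matched against

print(len(n))  # 12, the length of the building block
print(len(m))  # 24, the score to report
```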
Example Session
input> Calabria
Italy
input> New Hampshire
USA
input> Washington
USA
input> Puglia
Italy
* Actually, that's a lie. I have never had any trouble with this at all.
Can you please explain what you mean by "no filtering, no replacements, no nothing... even if those are also done with regular expressions."? Just to clarify, does it mean filtering or replacing the list of states/regions, or is the focus wider? – Abhijit – 2014-01-07T04:29:59.003
@Abhijit edited. Is that clearer? – boothby – 2014-01-07T04:34:32.780
Hopefully, let me post an answer and see if you feel it violates the rule in any way :-) – Abhijit – 2014-01-07T04:35:17.400
How are flags counted? Do we get case-insensitive for free? – John Dvorak – 2014-01-07T05:58:06.290
@JanDvorak Good thinking! No: flags cost extra, just the m// is free. – boothby – 2014-01-07T06:00:21.333
@Boothby, aren't you forgetting District of Columbia? – WallyWest – 2014-01-07T10:07:52.380
@Eliseod'Annunzio: DC is not a state – Kyle Kanos – 2014-01-07T14:21:51.610
"Behavior may be undefined for strings which do not occur in the list." This rule is broken: it allows one to return USA in case of such a string, hence you would just have to check Italian regions, and return USA otherwise. – o0'. – 2014-01-14T18:59:28.487
@Lohoris "This rule is broken" is an opinion. Codegolf tends to encourage cutting corners like this. – boothby – 2014-01-14T19:23:52.120
@boothby well, no, it's simple logic: it is basically asking only for a regexp to match Italian regions, but needlessly worded in a much more complicated way. The whole point about American states is totally irrelevant to the actual question asked, thanks to this bug. This also makes the question much less interesting. – o0'. – 2014-01-14T20:52:04.277
@Lohoris A cursory perusal of the answers below indicates that your "simple logic" is still just an opinion, and it may be better to match states of the US instead. – boothby – 2014-01-14T22:12:06.940
@Eliseod'Annunzio: If we include DC, we should also include Guam, CNMI, American Samoa, the US Virgin Islands, and nine uninhabited atolls. – Mechanical snail – 2014-01-15T19:37:29.660
Fair enough... just checking... – WallyWest – 2014-01-15T23:59:06.040
@Eliseod'Annunzio The internet is made for dogpiling. – boothby – 2014-01-16T00:11:36.063
@boothby Please tell me there is a definition for "dogpiling" in this context...? Very 404 here... – WallyWest – 2014-01-16T02:51:45.620
@Eliseod'Annunzio https://www.wordnik.com/words/dogpile – boothby – 2014-01-16T03:32:27.137