Distinguish between Masculine and Feminine Nouns in French within 100 characters

21

1

Write a very small program, within 100 characters, that distinguishes between masculine and feminine French nouns. The output should be un if the noun is masculine and une if it is feminine. There are often statistical rules you can follow (e.g. a word ending in "e" is more likely feminine than masculine).

Input:

A French word; it may consist of lowercase letters (including accented ones) and dashes.

Example input: ami

Output:

un if the word is masculine and une if the word is feminine.

Example output: un

You do not have to get every word right; your goal is to be as accurate as possible.

Scoring: Your answer must be within 100 characters. Output statements such as print, console.log or alert do not count toward your total. You may also write a function or method that performs this task, in which case the first few bytes of the function declaration (e.g. f=x=>) do not count toward your total either. Your score is the number of incorrect answers. Ties are broken by code size.

Nouns to test with:

un ami
un café
un chapeau
un concert
un crayon
un garage
un garçon
un lit
un livre
un mari
un musée
un oncle
un ordinateur
un pantalon
un piano
un pique-nique
un portable
un père
un sandwich
un saxophone
un stade
un stylo
un théâtre
un téléphone
un voisin
une botte
une boum
une chaise
une chaussette
une chemise
une clarinette
une copine
une femme
une fille
une glace
une heure
une lampe
une maison
une montagne
une personne
une piscine
une pizza
une radio
une raquette
une salade
une souris
une sœur
une table
une télé
une voiture
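
A minimal Ruby sketch of the scoring rule, for checking a candidate against the list above. The helper name score and the constant names are made up for illustration, and the baseline classifier is only the "ends in e" hint mentioned earlier, not a serious entry.

```ruby
# Test nouns from the list above, grouped by the expected article.
MASCULINE = %w[ami café chapeau concert crayon garage garçon lit livre mari
               musée oncle ordinateur pantalon piano pique-nique portable père
               sandwich saxophone stade stylo théâtre téléphone voisin].freeze
FEMININE = %w[botte boum chaise chaussette chemise clarinette copine femme
              fille glace heure lampe maison montagne personne piscine pizza
              radio raquette salade souris sœur table télé voiture].freeze

# Hypothetical helper: counts how many test nouns a classifier gets wrong.
# The classifier is anything that responds to call(word) with "un" or "une".
def score(classifier)
  wrong_masculine = MASCULINE.count { |w| classifier.call(w) != "un" }
  wrong_feminine  = FEMININE.count  { |w| classifier.call(w) != "une" }
  wrong_masculine + wrong_feminine
end

# Baseline from the hint: a final unaccented "e" suggests a feminine noun.
ends_in_e = ->(w) { w.end_with?("e") ? "une" : "un" }

puts score(ends_in_e)
```

On this list the baseline gets 18 of the 50 nouns wrong (11 masculine nouns end in "e", and 7 feminine nouns do not), so that is the number a real answer has to beat.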

soktinpk

Posted 2014-12-05T22:41:23.150

Reputation: 4 080

6 I would have added un squelette to the list just to make things hard. – 200_success – 2014-12-06T01:29:49.033

Answers

23

CJam, 0 incorrect, 32 29 bytes

This code uses a few odd characters (some of them unprintable), but they are all well within the extended ASCII range. So again, I'm counting each character as a single byte.

"un"'el2b"zPB:  ":i+:%2/*

Because of the unprintable characters, Stack Exchange almost certainly swallows some of them, so you might want to copy the code from the character counter instead (it shows bytes with UTF-8 encoding, which is suboptimal for this challenge; also, the link doesn't seem to work in Firefox, but does in Chrome).

Test it here.

After some more discussion in chat, we figured that regex golfing wouldn't get us much further. So, following an earlier (joking) suggestion of mine, we started looking into manipulating the character codes of the words with certain functions, such that all words from one group would yield a number with some property that is easy to check. And we got luckier than we expected! Here is what the code does to the words:

  • Implicitly convert the characters in the word to their code points.
  • Interpret those as digits in base 2 (yes, the digits will be much larger than 0 or 1, but CJam can handle that).
  • Repeatedly take the result modulo... the following numbers: [133, 122, 80, 66, 58, 26, 20, 14, 9, 4]. This sequence of numbers is itself encoded as the code points of a string (this is where the weird and unprintable characters come in).
  • As if by magic, all 25 masculine nouns yield 0 or 1, and all 25 feminine nouns yield 2 or 3 with this procedure. So if we divide this by 2 (integer division) we get zeroes for masculine nouns and ones for feminine nouns.

To round it off, we push "un" on the stack, then we push a single e. Then we read the input word from STDIN and perform the above computation, and finally multiply the e by the result.
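
The steps above can be sketched in plain Ruby. This is only an illustration of the arithmetic, not the golfed CJam program: the function name article is made up, the divisor list is copied from the bullet above, and the input is assumed to use Unicode code points with precomposed accented letters (the sketch normalizes to NFC so that é is a single code point).

```ruby
# Divisor chain quoted in the explanation above (assumed complete, in order).
DIVISORS = [133, 122, 80, 66, 58, 26, 20, 14, 9, 4].freeze

# Hypothetical name; mirrors the described steps, not the CJam byte count.
def article(word)
  code_points = word.unicode_normalize(:nfc).codepoints
  # Treat the code points as oversized base-2 "digits": n = n * 2 + digit.
  n = code_points.reduce(0) { |acc, cp| acc * 2 + cp }
  # Fold modulo over the chain: ((n % 133) % 122) % 80 ... % 4.
  r = DIVISORS.reduce(n) { |acc, d| acc % d }
  # r ends up in 0..3; integer division by 2 says how many "e"s to append.
  "un" + "e" * (r / 2)
end

%w[ami lit café boum pizza télé].each { |w| puts "#{article(w)} #{w}" }
```

Worked by hand for ami: the code points 97, 109, 105 give 97·4 + 109·2 + 105 = 711, and folding the chain takes 711 to 46 (mod 133), to 20 (mod 26), and to 0 (mod 20), so no e is appended. The sample call prints un for ami, lit and café, and une for boum, pizza and télé.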

I have never folded modulo onto any list before, and I feel like I never will again...

Many thanks to xnor and Sp3000 for throwing ideas around and helping with the search for the divisor chain.

Martin Ender

Posted 2014-12-05T22:41:23.150

Reputation: 184 808

Not only shorter, but 2 minutes faster. The horror! – Dennis – 2014-12-06T04:49:18.510

@sudo ;) ... one of the rare times I'm able to beat you... I'd be very much interested in an explanation of yours though :) – Martin Ender – 2014-12-06T04:50:04.263

11 Wait, I'm confused. If magic exists, why are you wasting it on a silly programming challenge site and not solving world peace or something? (No, but seriously, woah. +1) – Doorknob – 2014-12-06T04:51:25.950

22

Ruby, 0 incorrect, 63 56 53 52 51 50 bytes

All characters are in extended ASCII, specifically ISO 8859-1, so I'm counting each character as a single byte.

f=->s{s[/la|tt|i.e|[égdzœu]..$|^b|^f|so|^ta/]?'une':'un'}

It looks like your test set was a bit too short. I've generated the regex with Peter Norvig's meta regex golfer.

You can call the above function like f["ami"]. You can use this test harness to check all test cases:

puts "ami café chapeau concert crayon garage garçon lit livre mari musée 
      oncle ordinateur pantalon piano pique-nique portable père sandwich 
      saxophone stade stylo théâtre téléphone voisin botte boum chaise 
      chaussette chemise clarinette copine femme fille glace heure lampe 
      maison montagne personne piscine pizza radio raquette salade souris 
      sœur table télé voiture".split.map{|s|f[s]+" "+s}

Test it on Coding Ground.

Edit: Using Peter Norvig's second script, I found a different regex that was actually one byte longer, but which I could shorten by two bytes by hand.

Edit: Sp3000 set the regex golfer he wrote for my recent regex challenge on it, and found a 34-byte regex (after earlier ones of 36 and 35 bytes) for me to use. Thanks for that!

Martin Ender

Posted 2014-12-05T22:41:23.150

Reputation: 184 808

2 Reliving the nightmares of meta regex golf here because table is a substring of portable, and switching which set to match isn't very useful because the second set seems easier to match... – Sp3000 – 2014-12-06T02:07:02.813

13

CJam, 0 errors (36 32 29 28 bytes)

{"un"oEb72^"+ÕåWïº"583b2b='e*o}:F;

This is a named function, so I'm only counting the inner code. Also, o is a print statement, so it doesn't contribute to the byte count.

Try the test cases in the CJam interpreter.

How it works

"un"o       " Print 'un'.                                                  ";
Eb          " Consider the input a base 14 number.                        ";
72^         " XOR the result with 72.                                     ";
"+ÕåWïº"    " Push that string.                                           ";
583b2b      " Convert from base 583 to base 2.                            ";
=           " Retrieve the corresponding element (0 or 1) from the array. ";
'e*o        " Print 'e' that many times.                                  ";

Just a hash function and a table lookup.

Dennis

Posted 2014-12-05T22:41:23.150

Reputation: 196 637