One problem on a website like this is that you often don't know if you are talking to a male or female. However, you have come up with a simple NLP technique you can use to determine the gender of the writer of a piece of text.

Theory

About 38.1% of the letters used in English are vowels [a,e,i,o,u] (see References below; y is NOT a vowel in this case). Based on this, we define any word that is at least 40% vowels as a feminine word, and any word that is less than 40% vowels as a masculine word.

Beyond this binary classification, we can also measure how masculine or feminine a word is. Let C be the number of consonants in the word and V the number of vowels:

If a word is feminine, its femininity is 1.5*V/(C+1).
If a word is masculine, its masculinity is C/(1.5*V+1).

For example, the word catch is masculine. Its masculinity is 4/(1.5*1+1) = 1.6. The word phone is feminine. Its femininity is 1.5*2/(3+1) = .75.
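A minimal Python sketch of the per-word score, assuming the word has already been reduced to its letters. word_score is a hypothetical name. The 40% test is done in integer arithmetic (5*V >= 2*(V+C)), so the boundary case is compared exactly rather than through floating point.

```python
VOWELS = set("aeiou")  # y is treated as a consonant, per the definition above


def word_score(word: str) -> tuple[str, float]:
    """Return ('M', masculinity) or ('F', femininity) for a letters-only word."""
    v = sum(ch in VOWELS for ch in word.lower())
    c = len(word) - v
    if 5 * v >= 2 * len(word):  # at least 40% vowels: feminine
        return "F", 1.5 * v / (c + 1)
    return "M", c / (1.5 * v + 1)


print(word_score("catch"))  # ('M', 1.6)
print(word_score("phone"))  # ('F', 0.75)
```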

Algorithm

To figure out the gender of the writer of a piece of text, we take the sum of the masculinity of all the masculine words (Σ_M), and the sum of the femininity of all the feminine words (Σ_F). If Σ_M > Σ_F, we have determined that the writer is a male. Otherwise, we have determined that the writer is a female.
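The decision rule is just two sums and a strict comparison. This sketch takes per-word scores as (label, value) pairs, a representation assumed here for illustration (decide is likewise a hypothetical name), and feeds it the word scores listed for the first example below. A tie falls through to F, as "Otherwise" specifies.

```python
def decide(scores: list[tuple[str, float]]) -> tuple[str, float, float]:
    """Return (gender, sigma_m, sigma_f) for a list of (label, value) word scores."""
    sigma_m = sum(value for label, value in scores if label == "M")
    sigma_f = sum(value for label, value in scores if label == "F")
    # Strictly greater means male; a tie falls through to female.
    gender = "M" if sigma_m > sigma_f else "F"
    return gender, sigma_m, sigma_f


# Word scores for "There's a snake in my boot." from the Examples section
example = [("M", 1.0), ("F", 1.5), ("F", 0.75), ("F", 0.75), ("M", 2.0), ("F", 1.0)]
print(decide(example))  # ('F', 3.0, 4.0)
```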

Confidence Level

Finally, we need a confidence level. If you have determined that the writer is female, your confidence level is 2*Σ_F/(Σ_F+Σ_M)-1. If you have determined that the writer is male, the confidence level is 2*Σ_M/(Σ_F+Σ_M)-1.
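The confidence formula can be sketched in Python as follows (confidence is a hypothetical helper name). Because 2*max(Σ_F,Σ_M)/(Σ_F+Σ_M)-1 equals |Σ_M-Σ_F|/(Σ_M+Σ_F), the result always lies between 0 (a tie) and 1 (one sum is zero). The denominator is never zero: every usable word has at least one letter, so its score is above 0, and every input has at least one usable word.

```python
def confidence(sigma_m: float, sigma_f: float) -> float:
    """Confidence level for whichever gender has the larger sum (ties go to F)."""
    winner = sigma_m if sigma_m > sigma_f else sigma_f
    return 2 * winner / (sigma_m + sigma_f) - 1


print(round(confidence(3.0, 4.0), 3))  # 0.143
```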

Input

Input is a piece of English text including punctuation. Words are separated by single spaces (you don't have to worry about newlines or extra spaces). Some words contain non-letter characters, such as the apostrophe in "You're"; ignore those characters. If a word consists entirely of non-letters (like "5" or "!!!"), ignore the whole word. Every input contains at least one usable word.
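A small sketch of the input cleaning, assuming Python's str.isalpha() as the letter test (it also accepts non-ASCII letters such as é, which makes no difference for plain English text). usable_words is a name made up for this sketch. Calling split() without arguments also tolerates newlines and repeated spaces, which the rules say we don't need to handle anyway.

```python
def usable_words(text: str) -> list[str]:
    """Keep only the letters of each word; drop words that have none left."""
    cleaned = ("".join(ch for ch in word if ch.isalpha()) for word in text.split())
    return [word for word in cleaned if word]


print(usable_words("Frankly, I don't give a ^$*."))  # ['Frankly', 'I', 'dont', 'give', 'a']
print(usable_words("I'm 50 dollars from my goal!"))  # ['Im', 'dollars', 'from', 'my', 'goal']
```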

Output

You need to output an M or F depending on which gender you think the writer is, followed by your confidence level.

Examples

There's a snake in my boot.
- Gender + masculinity/femininity of each word: [M1.0,F1.5,F.75,F.75,M2.0,F1.0]
- Σ_M = 3.0, Σ_F = 4.0
- CL: 2*4.0/(4.0+3.0)-1 = .143
- Output: F .143
Frankly, I don't give a ^$*.
- Gender + masculinity/femininity of each word: [M2.4,F1.5,M1.2,F1.0,F1.5]
- Σ_M = 3.6, Σ_F = 4.0
- CL: 2*4.0/(4.0+3.6)-1 = .053
- Output: F .053
I'm 50 dollars from my goal!
- Gender + masculinity/femininity of each word: [F.75,M1.25,M1.2,M2.0,F1.0]
- Σ_M = 4.45, Σ_F = 1.75
- CL: 2*4.45/(4.45+1.75)-1 = .435
- Output: M .435
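Putting the pieces together, here is a compact end-to-end sketch that reproduces the three examples. gender_of is a hypothetical name, and stripping the leading zero so the confidence prints as ".143" is an assumption made only to match the example output; the rules just ask for M or F followed by the confidence level.

```python
def gender_of(text: str) -> str:
    """Return the guessed gender and confidence level, e.g. 'F .143'."""
    sigma = {"M": 0.0, "F": 0.0}
    for raw in text.split():
        word = "".join(ch for ch in raw.lower() if ch.isalpha())
        if not word:
            continue  # no letters at all, e.g. "50" or "!!!"
        v = sum(ch in "aeiou" for ch in word)
        c = len(word) - v
        if 5 * v >= 2 * len(word):  # at least 40% vowels: feminine
            sigma["F"] += 1.5 * v / (c + 1)
        else:
            sigma["M"] += c / (1.5 * v + 1)
    gender = "M" if sigma["M"] > sigma["F"] else "F"
    cl = 2 * sigma[gender] / (sigma["M"] + sigma["F"]) - 1
    digits = f"{cl:.3f}".lstrip("0")  # "0.143" -> ".143", matching the examples
    return f"{gender} {digits}"


for text in ["There's a snake in my boot.",
             "Frankly, I don't give a ^$*.",
             "I'm 50 dollars from my goal!"]:
    print(gender_of(text))
# F .143
# F .053
# M .435
```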

References

Percentage of vowels in English dictionary words (38.1%)
Percentage of vowels in English texts (38.15%)

geokavel

Posted 2017-07-17T22:44:39.877

Reputation: 6 352

Comments are not for extended discussion; this conversation has been moved to chat.

– Dennis – 2017-07-22T06:09:23.910

Answers

Python 3, 320 317 307 286 253 189 bytes

h=S=0
for v in input().split():V=sum(map(v.count,'aeiouAEIOU'));C=sum(x.isalpha()for x in v);H=V<.4*C;C-=V;K=[1.5*V/(C+1),C/(1.5*V+1)][H];h+=K*H;S+=K-K*H
print('FM'[h>S],2*max(S,h)/(S+h)-1)

Ungolfed:

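def evaluateWord(s):
    # Count vowels (a, e, i, o, u) and consonants (y counts as a consonant);
    # other characters, such as apostrophes, match neither list.
    V = len([*filter(lambda c: c in 'aeiou', s.lower())])
    C = len([*filter(lambda c: c in 'bcdfghjklmnpqrstvxzwy', s.lower())])
    # Fewer than 40% vowels makes the word masculine.
    isMasculine = V < 0.4*(V+C)
    # Return (masculinity or femininity, isMasculine).
    return C/(1.5*V+1) if isMasculine else 1.5*V/(C+1), isMasculine


def evaluatePhrase(s):
    scores = []
    for word in s.split():
        scores.append(evaluateWord(word))
    # Sum masculine and feminine scores separately. A word with no letters
    # scores 0 as "feminine", so it does not affect the result.
    masc = 0
    fem = 0
    for score in scores:
        if score[1]:
            masc += score[0]
        else:
            fem += score[0]
    # The larger sum wins (ties go to F); then compute the confidence level.
    return ('M', 2*masc/(fem+masc)-1) if masc > fem else ('F', 2*fem/(fem+masc)-1)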


print(evaluatePhrase("There's a snake in my boot."))

wrymug

Posted 2017-07-17T22:44:39.877

Reputation: 772

You can save 4 bytes by using semicolons and putting all of the first function on one line. Try it online!

– Comrade SparklePony – 2017-07-17T23:42:44.210

@ComradeSparklePony thanks! – wrymug – 2017-07-17T23:45:45.957

map(e,s.split()) instead of [e(x)for x in s.split()] – Value Ink – 2017-07-18T00:08:07.783

Also, it's better to return'FM'[h>S],2*max(S,h)/(S+h)-1 at the end – Value Ink – 2017-07-18T00:10:40.013

I looked up a more efficient way to count vowels/consonants via sum(map(s.count,chars)), dropping your count to 253 bytes

– Value Ink – 2017-07-18T00:32:39.237

Golfed to 189 bytes – ovs – 2017-07-18T13:27:05.830

Ruby, 154+1 = 155 bytes

Uses the -n flag.

m=f=0
gsub(/\S+/){s=$&.gsub(/[^a-z]/i){}.upcase;k=s.size;v=s.count'AEIOU';v<k*0.4?m+=(k-v)/(1.5*v+1):f+=1.5*v/(k-v+1)}
puts m>f ??M:?F,2*[m,f].max/(m+f)-1

Value Ink

Posted 2017-07-17T22:44:39.877

Reputation: 10 608

Python 3, 205 201 197 192 bytes

- Thanks @Value Ink for 4 bytes: calling lower() beforehand
- Thanks @Coty Johnathan Saxman for 9 bytes: the inverted condition .4*(v+c)>v, -~c instead of (c+1), and a bit-shift-based consonant check instead of a string literal

Python 3, 192 bytes

M=F=0
for i in input().lower().split():
 v=sum(j in'aeiou'for j in i);c=sum(33021815<<98>>ord(k)&1for k in i)
 if.4*(v+c)>v:M+=c/(1.5*v+1)
 else:F-=1.5*v/~c
print('FM'[M>F],2*max(M,F)/(F+M)-1)

officialaimm

Posted 2017-07-17T22:44:39.877

Reputation: 2 739

for i in input().lower().split(): so that you only need to look in 'aeiou' for the vowel count and can cut the lower call in the consonant count. – Value Ink – 2017-07-18T03:41:00.640

In your 'else', the divisor (c+1) can be shortened to -~c, with no parentheses, saving a byte. This negative can then, in turn, be carried to your +=, making it a -= and saving one more byte. F-=1.5*v/~c – Coty Johnathan Saxman – 2017-07-18T06:07:57.677

Switching the order of your inequality (in your if statement) saves you one more byte because you can delete the space. if.4*(v+c)>v – Coty Johnathan Saxman – 2017-07-18T06:10:55.500

This is a tricky one, but you can save 5 bytes by switching your consonant lookup for a hardcoded binary lookup table. k in'bcdfghjklmnpqrstvxzwy'for k... becomes 33021815<<98>>ord(k)&1for k... [https://tio.run/##JY3NboMwEITvPMX2UtuQAA5FSknM0VIPfoKqB4tA6sTGlvlJk0NfnTpUGmm1M99o3H38tn2xLIJxlked9aBA9UFuGjFJtb21PtzBaRX@KoKZDZPBl0Ag2So7oWfnsnbIoVnDosh3dE/L4/F9X9fWn/CVvNInd/3nIlBd@hbjOWlIPVciYU2GaVrGc0JD2OqhrfiWrU7220TOq37EiAv0KWr@tdnFRv5gseEkwzwRZEvJsnwgA2UOJ6u19AN03howd3Dq8ZDgpNdh/2ylfvkD Try it online!] – Coty Johnathan Saxman – 2017-07-18T06:57:32.890
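The lookup-table and -~c tricks from these comments can be checked with a short Python sketch. It rebuilds the constant 33021815 from the consonant list and compares the shift expression against a plain membership test for every ASCII character. CONSONANTS, mask, and is_consonant are names introduced here for illustration; the check assumes lowercase input, which is why the answer calls lower() first.

```python
CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # y counts as a consonant, as in the question

# Bit (ord(k) - 98) of the mask is set exactly when k is a consonant ('b' is ord 98).
mask = sum(1 << (ord(k) - 98) for k in CONSONANTS)
print(mask)  # 33021815


def is_consonant(k: str) -> int:
    # Shifts associate left to right and bind tighter than &, so this is
    # ((33021815 << 98) >> ord(k)) & 1: bit ord(k) - 98 of the mask for 'b'..'z',
    # and 0 for characters below 'b' (including 'a', digits, "'") or above 'z'.
    return 33021815 << 98 >> ord(k) & 1


print(all(is_consonant(k) == (k in CONSONANTS) for k in map(chr, range(128))))  # True

# ~c == -(c + 1), so -~c == c + 1 and F -= 1.5*v/~c is F += 1.5*v/(c+1).
print(all(-~c == c + 1 and ~c == -(c + 1) for c in range(100)))  # True
```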

C (gcc), 237 229 222 216 bytes

Boy, I thought I could do this in a LOT FEWER BYTES...

v,c;float m,f;g(char*s){for(m=f=0;*s;v*1.0/(c+v)<.4?m+=c/(1.5*v+1):1?f+=1.5*v/(c+1):0,s+=*s!=0)for(v=c=0;*s&&*s^32;s++)isalpha(*s)?strchr("AaEeIiOoUu",*s)?++v:++c:0;printf("%c %.3f",m>f?77:70,(m>f?2*m:2*f)/(f+m)-1);}

cleblanc

Posted 2017-07-17T22:44:39.877

Reputation: 3 360

196 bytes – ceilingcat – 2019-01-22T18:36:34.023

Common Lisp, 404 bytes

(defun f(x &aux(a 0)c(f 0)m v u)(labels((w(x &aux(p(position #\  x)))(cons(#1=subseq x 0 p)(and p(w(#1#x(1+ p)))))))(dolist(e(w(coerce x'list)))(setf v(#2=count-if(lambda(x)(member x(coerce"aeiouAEIOU"'list)))e)u(#2#'alpha-char-p e)c(- u v)m(and(> c 0)(<(/ v c)4/6)))(and(> u 0)(if m(incf a(/ c(1+(* v 3/2))))(incf f(/ v 2/3(1+ c))))))(format t"~:[F~;M~] ~4f~%"(> a f)(-(/(* 2(if(> a f)a f))(+ a f))1))))

Good old verbose Lisp!

Ungolfed version:

(defun f(x &aux (a 0) c (f 0) m v u)        ; parameter & auxiliary variables
  (labels ((w (x &aux (p (position #\  x))) ; recursive function to split input into words
              (cons (subseq x 0 p) (and p (w (subseq x (1+ p)))))))
    (dolist (e (w (coerce x 'list)))        ; for each word 
      (setf v (count-if (lambda (x) (member x(coerce"aeiouAEIOU"'list))) e) ; count vowels
            u (count-if 'alpha-char-p e)    ; count all alphabetic letters
            c (- u v)                       ; calculate consonants
            m (and (> c 0) (< (/ v c) 4/6))); is male or not?
      (and (> u 0)                          ; if non-empty word
           (if m
               (incf a (/ c (1+ (* v 3/2)))); increase masculinity
               (incf f (/ v 2/3 (1+ c)))))) ; increase femininity
    (format t "~:[F~;M~] ~4f"               ; print
              (> a f)                       ; “gender”
              (-(/ (* 2 (if (> a f)a f)) (+ a f)) 1))))  ; and confidence
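This answer tests masculinity as v/c < 4/6 (with c > 0) rather than V < 0.4*(V+C). The two tests agree, since V/(V+C) < 2/5 ⇔ 5V < 2V+2C ⇔ 3V < 2C ⇔ V/C < 2/3, and a word with C = 0 is all vowels and therefore feminine. Likewise, (/ v 2/3 (1+ c)) divides v by 2/3 and then by c+1, which is 1.5*V/(C+1). Here is a brute-force check of both claims over small counts, using Python's exact fractions (the helper names are made up for this sketch):

```python
from fractions import Fraction


def masculine_spec(v: int, c: int) -> bool:
    # The question's rule: fewer than 40% of the letters are vowels.
    return Fraction(v, v + c) < Fraction(2, 5)


def masculine_lisp(v: int, c: int) -> bool:
    # The Lisp answer's rule: (and (> c 0) (< (/ v c) 4/6)).
    return c > 0 and Fraction(v, c) < Fraction(4, 6)


pairs = [(v, c) for v in range(30) for c in range(30) if v + c > 0]
print(all(masculine_spec(v, c) == masculine_lisp(v, c) for v, c in pairs))  # True
# (/ v 2/3 (1+ c)) versus the question's femininity 1.5*V/(C+1):
print(all(Fraction(v) / Fraction(2, 3) / (1 + c) == Fraction(3, 2) * v / (c + 1)
          for v, c in pairs))  # True
```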

Renzo

Posted 2017-07-17T22:44:39.877

Reputation: 2 260

Is the writer a man or woman?
