The PPCG Handicap System

35

4

As we all know, meta is overflowing with complaints about scoring code-golf between languages (yes, each word is a seperate link, and these may be just the tip of the iceberg).

With so much jealousy towards those who actually bothered to look up the Pyth documentation, I thought it would be nice to have a little bit more of a constructive challenge, befitting of a website that specializes in code challenges.


The challenge is rather straightforward. As input, we have the language name and byte count. You can take those as function inputs, stdin or your languages default input method.

As output, we have a corrected byte count, i.e., your score with the handicap applied. Respectively, the output should be the function output, stdout or your languages default output method. Output will be rounded to integers, because we love tiebreakers.

Using the most ugly, hacked together query (link - feel free to clean it up), I have managed to create a dataset (zip with .xslx, .ods and .csv) that contains a snapshot of all answers to questions. You can use this file (and assume it to be available to your program, e.g., it's in the same folder) or convert this file to another conventional format (.xls, .mat, .sav etc - but it may only contain the original data!). The name should remain QueryResults.ext with ext the extension of choice.


Now for the specifics. For each language, there is a Boilerplate B and Verbosity V parameters. Together, they can be used to create a linear model of the language. Let n be the actual number of bytes, and c be the corrected score. Using a simple model n=Vc+B, we get for the corrected score:

    n-B
c = ---
     V

Simple enough, right? Now, for determining V and B. As you might expect, we're going to do some linear regression, or more precise, a least squares weighted linear regression. I'm not going to explain the details on that - if you're not sure how to do that, Wikipedia is your friend, or if you're lucky, your language's documentation.

The data will be as follows. Each data point will be the byte count n and the question's average bytecount c. To account for votes, the points will be weighted, by their number of votes plus one (to account for 0 votes), let's call that v. Answers with negative votes should be discarded. In simple terms, an answer with 1 vote should count the same as two answers with 0 votes.

This data is then fitted into the aforementioned model n=Vc+B using weighted linear regression.


For example, given the data for a given language

n1=20, c1=8.2, v1=1
n2=25, c2=10.3, v2=2
n3=15, c3=5.7, v3=5

Now, we compose the relevant matrices and vectors A, y and W, with our parameters in the vector

  [1 c1]    [n1]    [1 0 0]  x=[B]
A=[1 c2]  y=[n2]  W=[0 2 0],   [V]
  [1 c3]    [n3]    [0 0 5]

we solve the matrix equation (with ' denoting the transpose)

A'WAx=A'Wy

for x (and consequently, we get our B and V parameter).


Your score will be the output of your program, when given your own language name and bytecount. So yes, this time even Java and C++ users can win!

WARNING: The query generates a dataset with a lot of invalid rows due to people using 'cool' header formatting and people tagging their questions as . The download I provided has most of the outliers removed. Do NOT use the CSV provided with the query.

Happy coding!

Sanchises

Posted 2016-02-04T20:02:24.983

Reputation: 8 530

3

s/look up the Pyth documentation/carefully study the two existing pieces of Jelly documentation

– lirtosiast – 2016-02-04T20:09:32.317

Your query doesn't seem to distinguish between Perl 5 and Perl 6. Which is similar to not distinguishing C++ from Haskell. – Brad Gilbert b2gills – 2016-02-04T21:47:04.143

@BradGilbertb2gills I know - it does a whole lot of quirky things, mostly due to people going crazy with formatting. Feel free to improve on it, but right now, it's a trade-off between a lack of version numbering and languages called C++ <s>6 bytes</s>. Besides, I never did any T-SQL before today and I'm already impressed with myself that I managed to extract the bytecount. – Sanchises – 2016-02-04T21:52:02.377

Can we remove outliers, ie any languages with only one entry (usually incorrect language names) or the ones that have >10,000 bytes? – Robert Fraser – 2016-02-05T09:08:23.757

@RobertFraser I thought that would be too much for a single challenge. I'll fix the data file, see edit. – Sanchises – 2016-02-05T09:22:12.763

This is a neat little challenge. If it makes languages I know start scoring more competitively, I might do some golfing ;D – Draco18s no longer trusts SE – 2016-02-05T16:27:43.843

Can you provide a comprehensive set of test cases for this challenge? Otherwise, something like input(); print -1000000 cannot be distinguished from a valid answer. – quintopia – 2016-02-06T06:59:33.717

@quintopia The input/output is deterministic, so if your values differ from the values produced by the competitor(s), either of you must have a mistake. I believe that I am just as likely to make a mistake as you, so reference values produced by me are no more valuable than those produced by your competitor(s). – Sanchises – 2016-02-06T21:09:27.147

@Draco18s You can check out the current Mathematica answer and see if any of the languages njpipeorgan has checked give you an advantage. – Sanchises – 2016-02-06T21:12:07.623

"In simple terms, an answer with 0 votes should count the same as two answers with 1 vote." Flip this, right? – Leif Willerts – 2016-02-15T14:10:33.853

@LeifWillerts Yes. :) – Sanchises – 2016-02-15T14:32:48.587

I have an idea that built-ins could be counted as (# built-ins)/256 bytes. Then languages without one byte functions could be "fairly" scored as having one byte functions. I've played with trying to detect most-used words for each language in mathematica, but am having trouble with the query and extracting answers. If anyone is willing to help, let's open a community wiki! – Michael Klein – 2016-02-16T20:22:38.427

Mathematica, 0:0 ...Please tell me this shouldn't work. – CalculatorFeline – 2016-02-23T02:30:55.573

Answers

21

Mathematica, 244.719 (245 bytes)

f[l_,n_]:=x/.Solve[d=Rest@Import@"QueryResults.csv";LinearModelFit[#.#2/Tr@#&@@{#~Max~-1&/@#4+1,#3}&@@Thread@#&/@{#,#~Cases~{_,l,__}}&/@d~GroupBy~Last/@#[[;;,1,5]],x,x,Weights->Tr/@#[[;;,;;,4]]]&[d~Cases~{_,l,_,v_/;v>=0,_}~GatherBy~Last]@x==n,x]

Test case

f["mathematica", n]   (* { .820033 (n + 53.4263) } *)
f["mathematica", 245] (* { 244.719 } *)

What about other languages?

f["c++", n]           (* { .821181 (n - 79.5437) } *)
f["java", n]          (* { .717579 (n - 56.0858) } *)
f["cjam", n]          (* { 2.21357 (n + 2.73772) } *)
f["pyth", n]          (* { 4.52194 (n - 8.82806) } *)

Alternative model: log(c)=log((n-B)/V)

One notable feature of code golf (and probably other coding problems) is that the distribution of the lengths of programs tends to be exponential distribution (in contrast to uniform distribution). Hence model log(n)=log(Vc+B) is much more likely to balance the influences between points with large c and small c.

As we can see in the graphs below, the distribution of points are suitable for fitting in logarithmic scale.


Results of the new model

Language       V       B

Python       1.365   -19.4    
Javascript   1.002     1.6
Ruby         0.724     1.7
Perl         1.177   -32.7
C            1.105     1.5
Haskell      1.454   -24.5
Mathematica  1.319   -39.7
PHP          1.799   -62.0
Java         1.642     4.4
C#           1.407     4.5

CJam         0.608   -12.5
Pyth         0.519   -11.4
Golfscript   0.766   -18.0
J            0.863   -21.4
APL          0.744   -17.7
K            0.933   -23.3
Retina       1.322   -37.9
MATL         0.762   -13.3
Jelly        0.965   -23.8

We have found two exceptional language - Ruby with V=0.724 and Retina with V=1.322, and a criterion of being a popular golfing language - having a large negative boilerplate.

njpipeorgan

Posted 2016-02-04T20:02:24.983

Reputation: 2 992

@sanchises So far so good, except that you use semicolons as delimiters in csv. – njpipeorgan – 2016-02-05T11:44:38.747

That's Microsoft Excel for you. Apparently saving as csv is too difficult for it. – Sanchises – 2016-02-05T11:49:05.453

So apparently CJam has a negative boilerplate length. Interesting. – PurkkaKoodari – 2016-02-05T13:13:46.833

@Pietu1998 Linear model is not so accurate, I think. – njpipeorgan – 2016-02-05T13:15:45.527

@Pietu1998 Not entirely surprising, since golfing languages generally take implicit input and may return implicit output. Note that "Boilerplate length" is defined w.r.t the average, not w.r.t an ideal boilerplateless language. I'm actually positively surprised by how well this simple model seems to be doing when glancing at these results. – Sanchises – 2016-02-05T14:07:38.857

@sanchises Good explanation. Now I understand why Mathematica has such a large negative boilerplate to some degree. – njpipeorgan – 2016-02-05T14:40:22.963

@njpipeorgan Perhaps you can plot some data and see how languages compare visually; that might give some insight. – Sanchises – 2016-02-05T15:03:37.750

I'm now curious to Retina's score. I've seen a couple answers that look like 1 (plus some whitespace that the ` markup can't wrap but which is important) e.g this answer (3 bytes). It must have some amount of negative boilerplate.

– Draco18s no longer trusts SE – 2016-02-05T16:33:46.000

@Draco18s Retina gets the first place among golfing languages in terms of boilerplate. – njpipeorgan – 2016-02-06T05:07:17.843

@njpipeorgan That appears that it does. – Draco18s no longer trusts SE – 2016-02-06T06:02:29.393

I like how my original challenge got closed as 'too broad', and you just did exactly what I had in mind for the original challenge (i.e., try and construct an appropriate model using graphs etc.) Cheers! – Sanchises – 2016-02-06T21:11:12.347

Congratulations on your bounty + accepted bonus; well-deserved too given the extra effort you put in. One remark: I believe a straight line through a log-log plot is a polynomial of form x^p with p the slope of your straight line, and not an exponential distribution. – Sanchises – 2016-02-22T13:30:54.160

@sanchises Thanks. Exponential distribution is a hint for me to use logarithmic scale, and the fitting result shows x^p just as what you said. – njpipeorgan – 2016-02-22T13:35:19.337

3

Python3, 765.19 (765) bytes

Probably some room for golfing here. Requires numpy for matrix stuff. Reads from stdin, formatted as follows: [lang] [bytes/n]. Stops when you send q.

import numpy as n,csv
L={};Q={};X={};D=n.dot;f=open('QueryResults.csv',encoding="utf8");R=csv.reader(f);f.readline();Z=list.append;M=n.matrix
for r in R:
 if r[1] not in L:L[r[1]]=[]
 if r[4] not in Q:Q[r[4]]=[]
 Z(L[r[1]],r);Z(Q[r[4]],r)
for l in L:
 b=[];a=[];v=[];t=[]
 for r in L[l]:
  if int(r[3])>-1:
   Z(b,int(r[2]));o=[]
   for q in Q[r[4]]:Z(o,int(q[2]))
   Z(a,sum(o)/len(o));Z(v,int(r[3])+1)
 for k in a:Z(t,[1,k])
 if len(t)<1:continue
 A=M(t);T=A.transpose();W=n.diag(v);y=M(b).reshape((len(b),1));e=D(D(T,W),A)
 if n.linalg.det(e)==0:continue
 i=n.linalg.inv(e);X[l]=D(i,D(D(T,W),y))
p=input()
while(p!="q"):
 S=p.split()
 if S[1]=='n':print("(n-("+str(X[S[0]].item(0))+"))/"+str(X[S[0]].item(1)))
 else:print(str((int(S[1])-X[S[0]].item(0))/X[S[0]].item(1)))
 p=input()

Results

I might have done something wrong at some point; I get different results than the Mathematica answer:

python3 808 -> 765.19
python3 n   -> (n-(32.41))/1.01

c++ n        -> (n-(71.86))/1.17
cjam n       -> (n-(-14.09))/0.51
java n       -> (n-(18.08))/1.64
pyth n       -> (n-(1.42))/0.28
jelly n      -> (n-(-4.88))/0.34
golfscript n -> (n-(-0.31))/0.44

Yodle

Posted 2016-02-04T20:02:24.983

Reputation: 2 378