35
4
As we all know, meta is overflowing with complaints about scoring code-golf between languages (yes, each word is a seperate link, and these may be just the tip of the iceberg).
With so much jealousy towards those who actually bothered to look up the Pyth documentation, I thought it would be nice to have a little bit more of a constructive challenge, befitting of a website that specializes in code challenges.
The challenge is rather straightforward. As input, we have the language name and byte count. You can take those as function inputs, stdin
or your languages default input method.
As output, we have a corrected byte count, i.e., your score with the handicap applied. Respectively, the output should be the function output, stdout
or your languages default output method. Output will be rounded to integers, because we love tiebreakers.
Using the most ugly, hacked together query (link - feel free to clean it up), I have managed to create a dataset (zip with .xslx, .ods and .csv) that contains a snapshot of all answers to code-golf questions. You can use this file (and assume it to be available to your program, e.g., it's in the same folder) or convert this file to another conventional format (.xls
, .mat
, .sav
etc - but it may only contain the original data!). The name should remain QueryResults.ext
with ext
the extension of choice.
Now for the specifics. For each language, there is a Boilerplate B
and Verbosity V
parameters. Together, they can be used to create a linear model of the language. Let n
be the actual number of bytes, and c
be the corrected score. Using a simple model n=Vc+B
, we get for the corrected score:
n-B
c = ---
V
Simple enough, right? Now, for determining V
and B
. As you might expect, we're going to do some linear regression, or more precise, a least squares weighted linear regression. I'm not going to explain the details on that - if you're not sure how to do that, Wikipedia is your friend, or if you're lucky, your language's documentation.
The data will be as follows. Each data point will be the byte count n
and the question's average bytecount c
. To account for votes, the points will be weighted, by their number of votes plus one (to account for 0 votes), let's call that v
. Answers with negative votes should be discarded. In simple terms, an answer with 1 vote should count the same as two answers with 0 votes.
This data is then fitted into the aforementioned model n=Vc+B
using weighted linear regression.
For example, given the data for a given language
n1=20, c1=8.2, v1=1
n2=25, c2=10.3, v2=2
n3=15, c3=5.7, v3=5
Now, we compose the relevant matrices and vectors A
, y
and W
, with our parameters in the vector
[1 c1] [n1] [1 0 0] x=[B]
A=[1 c2] y=[n2] W=[0 2 0], [V]
[1 c3] [n3] [0 0 5]
we solve the matrix equation (with '
denoting the transpose)
A'WAx=A'Wy
for x
(and consequently, we get our B
and V
parameter).
Your score will be the output of your program, when given your own language name and bytecount. So yes, this time even Java and C++ users can win!
WARNING: The query generates a dataset with a lot of invalid rows due to people using 'cool' header formatting and people tagging their code-challenge questions as code-golf. The download I provided has most of the outliers removed. Do NOT use the CSV provided with the query.
Happy coding!
3
s/look up the Pyth documentation/carefully study the two existing pieces of Jelly documentation
– lirtosiast – 2016-02-04T20:09:32.317Your query doesn't seem to distinguish between Perl 5 and Perl 6. Which is similar to not distinguishing C++ from Haskell. – Brad Gilbert b2gills – 2016-02-04T21:47:04.143
@BradGilbertb2gills I know - it does a whole lot of quirky things, mostly due to people going crazy with formatting. Feel free to improve on it, but right now, it's a trade-off between a lack of version numbering and languages called
C++ <s>6 bytes</s>
. Besides, I never did any T-SQL before today and I'm already impressed with myself that I managed to extract the bytecount. – Sanchises – 2016-02-04T21:52:02.377Can we remove outliers, ie any languages with only one entry (usually incorrect language names) or the ones that have >10,000 bytes? – Robert Fraser – 2016-02-05T09:08:23.757
@RobertFraser I thought that would be too much for a single challenge. I'll fix the data file, see edit. – Sanchises – 2016-02-05T09:22:12.763
This is a neat little challenge. If it makes languages I know start scoring more competitively, I might do some golfing ;D – Draco18s no longer trusts SE – 2016-02-05T16:27:43.843
Can you provide a comprehensive set of test cases for this challenge? Otherwise, something like
input(); print -1000000
cannot be distinguished from a valid answer. – quintopia – 2016-02-06T06:59:33.717@quintopia The input/output is deterministic, so if your values differ from the values produced by the competitor(s), either of you must have a mistake. I believe that I am just as likely to make a mistake as you, so reference values produced by me are no more valuable than those produced by your competitor(s). – Sanchises – 2016-02-06T21:09:27.147
@Draco18s You can check out the current Mathematica answer and see if any of the languages njpipeorgan has checked give you an advantage. – Sanchises – 2016-02-06T21:12:07.623
"In simple terms, an answer with 0 votes should count the same as two answers with 1 vote." Flip this, right? – Leif Willerts – 2016-02-15T14:10:33.853
@LeifWillerts Yes. :) – Sanchises – 2016-02-15T14:32:48.587
I have an idea that built-ins could be counted as
(# built-ins)/256
bytes. Then languages without one byte functions could be "fairly" scored as having one byte functions. I've played with trying to detect most-used words for each language in mathematica, but am having trouble with the query and extracting answers. If anyone is willing to help, let's open a community wiki! – Michael Klein – 2016-02-16T20:22:38.427Mathematica, 0:
0
...Please tell me this shouldn't work. – CalculatorFeline – 2016-02-23T02:30:55.573