Create a program that finds the latest 50 challenges with the code-golf tag that have at least 20 answers. Then extract the scores for each language in each of the challenges. If more than one answer uses the same language, count all of their scores. Thereafter, take the 20 most common languages and output a list with the language names, the number of answers, the average byte counts and the median byte counts. The list should be sorted by number of answers, in descending order.
You must account for variations in capitalization (for instance: Matlab = MATLAB).
In languages with many different version numbers (e.g. Python), count them as unique languages, so: Python != Python 2 != Python 2.7 != Python 3.x
Example output (output format is optional):
cJam, 66, 12.4, 8.5
Pyth, 58, 15.2, 19
Ruby, 44, 19.2, 22.5
Python, 34, 29.3, 32
Python 2.7, 22, 31.2, 40
...
...
Java, 11, 115.5, 94.5
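The aggregation step described above could be sketched roughly as follows. This is a minimal illustration, not a golfed or reference solution: the function name summarize, its (language, byte_count) input pairs and the choice to display the first spelling seen for each language are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(scores, top=20):
    """Group (language, byte_count) pairs and return the most common languages.

    Hypothetical helper: returns (name, answer_count, average, median) tuples
    sorted by answer count, descending.
    """
    by_lang = defaultdict(list)
    display = {}
    for name, size in scores:
        name = name.strip()
        # Case-insensitive key, so Matlab == MATLAB. Version numbers are part
        # of the key, so Python, Python 2 and Python 2.7 stay distinct.
        key = name.casefold()
        by_lang[key].append(size)
        # Assumption: show the first capitalization encountered.
        display.setdefault(key, name)
    rows = [(display[k], len(v), mean(v), median(v)) for k, v in by_lang.items()]
    # sort() is stable, so ties keep their order of first appearance.
    rows.sort(key=lambda row: -row[1])
    return rows[:top]


if __name__ == "__main__":
    sample = [("Matlab", 10), ("MATLAB", 20), ("Python", 30), ("Python 2", 5)]
    for name, count, avg, med in summarize(sample):
        print(f"{name}, {count}, {avg}, {med}")
```

The pairs passed in would come from whatever header parsing the program uses; the output format is free, so the print loop is only one possibility.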
Header formats that must be supported:
- Starts with "# Language name," or "#Language name"
- Ends with "xx bytes", "xx Bytes" or just "xx"
- There can be a lot of garbage between the first comma and the last number.
- If the language name is a link ([Name](link)), it can be skipped
If the answer has another header format, you may choose to skip it (or include it if your code can handle it).
As an example, all of the below headers must be supported:
# Language Name, N bytes
# Ruby, <s>104</s> <s>101</s> 96 bytes
# Perl, 43 + 2 (-p flag) = 45 Bytes
# MATLAB, 5
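One way to handle the required header formats is a single regular expression that takes everything up to the first comma as the language name and the last number in the line as the score. The sketch below is only an illustration under those assumptions; the names HEADER and parse_header are made up for the example, and headers whose language name is a link are skipped, which the challenge allows.

```python
import re

HEADER = re.compile(
    r"^#\s*"                    # leading '#', with or without a space
    r"([^,\[\s][^,\[]*?)"       # language name; a leading '[' (link) never matches
    r"\s*,"                     # the first comma ends the name
    r".*?"                      # any garbage (struck-out scores, flag arithmetic)
    r"(\d+)\s*(?:bytes)?\s*$",  # last number, optionally followed by 'bytes'
    re.IGNORECASE,              # accept both 'bytes' and 'Bytes'
)


def parse_header(line):
    """Return (language, byte_count), or None if the header is not supported."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    for header in (
        "# Ruby, <s>104</s> <s>101</s> 96 bytes",
        "# Perl, 43 + 2 (-p flag) = 45 Bytes",
        "# MATLAB, 5",
    ):
        print(parse_header(header))
```

Because the pattern is anchored at the end of the line, a struck-out score such as 104 in the Ruby example can never be picked up: only a number followed by nothing but an optional "bytes" qualifies.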
Rules:
- It's OK to use the API or just the website URL
- The following can be excluded from the byte count (nothing else), so there is no need to use a URL shortener (maximum 44 bytes): "https://" (or "http://"), "codegolf", ".stackexchange.com", "/questions"
- The program can take input. The input will be included in the byte count.
Other than that, standard rules apply.
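The 44-byte maximum in the URL rule is simply the combined length of the four free strings: 8 + 8 + 18 + 10 = 44 when "https://" is used, or 43 with "http://". A small sketch of that arithmetic follows; the function scored_bytes is hypothetical, and it assumes each free string is discounted at most once.

```python
# The strings that may be left out of the byte count, per the rules above.
SCHEMES = ("https://", "http://")
FREE_PARTS = ("codegolf", ".stackexchange.com", "/questions")


def scored_bytes(source):
    """Byte count of a submission after the URL discount.

    Assumption: each free string is discounted once, and only one scheme.
    """
    raw = len(source.encode("utf-8"))
    discount = 0
    for scheme in SCHEMES:
        if scheme in source:
            discount += len(scheme)
            break  # only one of https:// or http:// counts
    for part in FREE_PARTS:
        if part in source:
            discount += len(part)
    return raw - discount


if __name__ == "__main__":
    print(scored_bytes('u="https://codegolf.stackexchange.com/questions"'))
```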
I could tell you it's Pyth without having to do this challenge at all. – Alex A. – 2015-10-27T20:43:17.370
Is the " bytes" suffix common, let alone universal, enough to require it? – Sparr – 2015-10-27T20:54:15.177
@StewieGriffin I think Sparr is saying that, while it is common, it's not always used. – Celeo – 2015-10-27T21:00:07.603
As far as I can see, "xx bytes" is very common on recent challenges (at least since the leaderboard snippet was created). – Stewie Griffin – 2015-10-27T21:02:54.630
I've seen many cases where the user omits the comma, and even a few times, "103 <s>108</s> <s>110</s> bytes" is used instead of left-to-right. Do we need to support these? – ETHproductions – 2015-10-27T21:08:04.770
@ETHproductions, "103 <s>108</s>" doesn't have to be counted. I couldn't find a "rule" for cases where the comma was omitted because cases like "Python 2 3 + 12 = 14" could be hard to handle. I didn't want to specify 100 different formats that must be supported, since that's just too cumbersome and there are a lot of corner cases (for instance score in parentheses, "Pyth (5)"). So no, you don't have to support headers without the comma (but you can if you want to). – Stewie Griffin – 2015-10-27T21:15:17.260
I usually use "chars" or "characters" instead of "bytes" – Doorknob – 2015-10-27T21:38:15.533
Define latest: Is it creation date? Last activity? – pppery – 2015-10-31T01:59:13.943
Creation date... – Stewie Griffin – 2015-10-31T09:48:59.580
Is it necessary to preserve the capitalization of the language name? If so, which one should I preserve? – pppery – 2015-10-31T14:17:06.910
@AlexA. You're wrong. Again. It's APL. :P – mbomb007 – 2015-11-03T22:05:05.190