Given a string containing any message from our site chatroom, taken from the list described and linked below, output either a truthy or a falsy value predicting whether that message was starred, using 50 bytes of code or fewer.
You may use any truthy or falsy values, but they must be consistent (i.e. there should be only two possible outputs, one truthy and one falsy). The input will be given as raw HTML with newlines removed, and it may contain non-ASCII Unicode characters. If you require input in something other than UTF-8, please say so in your answer.
The winning submission to this challenge will be the one that predicts the highest percentage of chat messages correctly, out of the list linked below. If two given submissions have the same success rate, the shorter submission will win.
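As a rough illustration of the winning criterion, here is a minimal Python sketch that measures a submission's size in UTF-8 bytes and orders entries by number of correct predictions, breaking ties by length. The `Entry` type, `byte_length`, and `rank` are hypothetical names introduced for this sketch, not part of the challenge.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical record for one submission."""
    name: str
    correct: int      # messages predicted correctly over the whole list
    size_bytes: int   # length of the submission's source in bytes


def byte_length(source: str) -> int:
    # Assumes the submission is scored in UTF-8, so non-ASCII
    # characters count for more than one byte each.
    return len(source.encode("utf-8"))


def rank(entries: List[Entry], limit: int = 50) -> List[Entry]:
    # Drop entries over the byte limit, then sort by most correct
    # predictions first and by shortest source on ties.
    valid = [e for e in entries if e.size_bytes <= limit]
    return sorted(valid, key=lambda e: (-e.correct, e.size_bytes))
```

Sorting on the raw count of correct predictions is equivalent to sorting on the percentage, since every submission is scored against the same list.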
Please provide instructions for running your code on the entire set of messages and calculating the percentage correct. Ideally, this should be a bit of boilerplate code (not counted towards your 50 bytes) that loops through the positive test cases and outputs how many of them your code got correct and then does the same for the negative test cases. (The overall score can then be calculated manually via `(correctPositive + correctNegative) / totalMessages`.)
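The boilerplate described above could look something like the following Python sketch. It assumes each test file holds one message per line separated by `\n`, that a message may itself contain `\r`, and that the predictor is available as a Python callable. The names `load_messages`, `split_messages`, and `score` are hypothetical.

```python
from typing import Callable, List, Tuple


def split_messages(raw: str) -> List[str]:
    # Split on "\n" only. str.splitlines() would also break on "\r",
    # which would cut a message containing carriage returns in two.
    return [line for line in raw.split("\n") if line]


def load_messages(path: str) -> List[str]:
    # newline="" disables universal-newline translation, so any "\r"
    # inside a message is preserved instead of being turned into "\n".
    with open(path, encoding="utf-8", newline="") as handle:
        return split_messages(handle.read())


def score(
    predict: Callable[[str], object],
    starred: List[str],
    unstarred: List[str],
) -> Tuple[int, int, float]:
    # Count truthy answers on the positive cases and falsy answers
    # on the negative cases, then combine them into one fraction.
    correct_positive = sum(1 for m in starred if predict(m))
    correct_negative = sum(1 for m in unstarred if not predict(m))
    total_messages = len(starred) + len(unstarred)
    return (
        correct_positive,
        correct_negative,
        (correct_positive + correct_negative) / total_messages,
    )
```

To use it, call `load_messages` once for each of the two test files (under whatever names you saved them), then pass both lists to `score` together with the predictor.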
So that your code is reasonably testable, it must complete in 5 minutes or less for the entire list of chat messages on reasonable modern-day hardware.
The full list of chat messages can be found here, and it consists of the 1000 latest starred messages as truthy test cases and the 1000 latest unstarred messages as falsy test cases. Note that there are two files in the gist; scroll about halfway down for the unstarred messages.
Knowing the behaviors of chat, I think the following Pyth would suffice: `O2` – Arcturus – 2016-01-17T22:18:10.807
Considering the history of past starred messages, Regex, 11 bytes: `Don'?t star` – Downgoat – 2016-01-17T22:26:18.753
What kind of encoding is used in the test files (utf-8 right?)? Whatever I try, I always get more than 1000 lines. Is there something else I'm doing wrong? – flawr – 2016-01-17T22:32:44.573
Ok I think I get it: When there are messages with code (those with `<pre>` and `<br>`) all those linebreaks within are carriage returns `\r` and not newlines. But when we download these files, it seems there will be `\n`s inserted. Suggestion @Doorknob: Remove the `\r`s from your testfiles. – flawr – 2016-01-17T22:46:28.527
This would be much easier if you were also given the user as part of the input. – Mama Fun Roll – 2016-01-18T01:07:43.077
And whether I was online at that time? – None – 2016-01-18T03:14:22.617
At some point I would've answered Regex, 2 bytes `\^` – PurkkaKoodari – 2016-01-18T08:46:17.487
The title should have been something like: "Don't star this challenge" – James – 2016-01-18T16:37:20.830
+1 for good challenge, -1 for "Do X in Y bytes". Solid sidevote here. – Mego – 2016-01-18T17:09:47.080
I think you should run this again on the next 1,000 messages, and see which one really predicted starredness – abligh – 2016-01-18T23:03:15.597
@mego i usually would agree with this but it seems like the best way to do [tag:test-battery] – undergroundmonorail – 2016-01-19T00:31:45.857
Random question: how did you compile these lists? – ETHproductions – 2016-01-19T19:54:42.230
How is starring defined, and how can it be accessed? BTW, -1 for not being a real coding challenge but more of a "know your API" challenge, since I as an embedded developer don't even have a chance to create a body that would inspect the input in under 50 bytes (C and C++ here...) – Zaibis – 2016-01-20T11:24:05.363
@Zaibis See the last paragraph. Why can't you use C? There's only a few bytes of boilerplate (`int f(char*s){}`). – Doorknob – 2016-01-20T11:45:10.370
@ETHproductions I wrote a short Ruby script to scrape the transcript URLs. – Doorknob – 2016-01-20T11:45:45.110