Graduation Meme Tally

3

Now that we're graduating, it's time to tally up the number of times someone suggested that PPCG was graduating, even before the announcement! (see here)

Your program will receive a list of chat messages. Each message has three pieces of data: the user (e.g. Geobits), the timestamp (e.g. Aug 19 '15 2:41 PM), and the message (e.g. What do you mean "fake"? It clearly says we're graduating in September.). The list will be in no particular order.

The timestamp will be presented in the format that the StackExchange chat search uses:

  • If the date is in the current year, the date will be of the format: %b %-d %-I:%M %p

    • %b = three-character month (one of Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec)
    • %-d = day of month, no padding
    • %-I = hour (12-hour clock), no padding
    • %M = minute, padded to two characters with zero
    • %p = AM or PM

    For example, you might have Feb 13 8:09 PM or Mar 8 1:32 AM.

  • If the date is in another year, the date will be of the format: %b %-d '%y %-I:%M %p

    • Same as above, with %y = two digit year, padded to two characters with zero

    For example, you might have Aug 29 '13 12:02 PM or May 11 '15 11:11 AM.

All dates are in UTC.
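A minimal Python 3 sketch of parsing both timestamp forms, assuming an English locale for the month names and AM/PM. The function name `parse_chat_timestamp` and its `current_year` parameter are hypothetical, chosen for illustration. The year is spliced into the short form before parsing so that a date like Feb 29 is not checked against a default year.

```python
from datetime import datetime

def parse_chat_timestamp(text, current_year):
    """Parse a chat-search timestamp into a naive datetime (interpreted as UTC).

    current_year is a four-digit int, e.g. 2016 (hypothetical parameter).
    """
    if "'" not in text:
        # Current-year form, e.g. "Feb 13 8:09 PM": splice in a two-digit year
        # so both forms can share one format string.
        month, day, rest = text.split(" ", 2)
        text = "%s %s '%02d %s" % (month, day, current_year % 100, rest)
    # strptime accepts unpadded day and hour values for %d and %I.
    return datetime.strptime(text, "%b %d '%y %I:%M %p")
```

For instance, `parse_chat_timestamp("Feb 13 8:09 PM", 2016)` yields 8:09 PM on February 13, 2016, while a timestamp that already carries `'13` ignores `current_year`.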

You may take input in whatever format you wish (array parameter, string delimited with newline, etc.) as long as the date is still in the format listed above. You may take the current year as an input if your language does not provide any mechanism for fetching the year.

A message is considered to talk about PPCG's graduation if it contains at least one of: graduate, graduated, graduating, or graduation, and contains either we or PPCG. All matches should be case-insensitive, and should not be part of another word: extra letters are not allowed either directly before or after the match (so wegraduate is invalid). Note that we graduate., 3graduate we, and Graduation PPCG! are all valid matches.
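One way to read the matching rule, sketched in Python 3. The helper names `word_pattern` and `mentions_graduation` are made up for this illustration. The sketch uses letter-only lookarounds rather than `\b`, on the assumption that "extra letters" means only A-Z, which is what makes 3graduate we a valid match.

```python
import re

def word_pattern(*words):
    """Match any of the words, case-insensitively, with no letter on either side."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"(?<![a-z])(?:%s)(?![a-z])" % alternatives, re.IGNORECASE)

GRADUATION = word_pattern("graduate", "graduated", "graduating", "graduation")
SUBJECT = word_pattern("we", "PPCG")

def mentions_graduation(message):
    """True if the message has a graduation word and also 'we' or 'PPCG'."""
    return bool(GRADUATION.search(message)) and bool(SUBJECT.search(message))
```

A plain `\b` boundary would reject 3graduate, because regex word boundaries treat digits as word characters; the lookarounds only look at letters.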

Graduation was announced on February 23, 2016 at 16:54:15 UTC. Therefore, any message sent after this date (starting with Feb 23 4:55 PM) should be excluded.
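Because chat timestamps only have minute resolution, the cutoff can be sketched as a strict comparison against 4:55 PM. This is one reading of the rule above; `CUTOFF` and `sent_before_announcement` are hypothetical names.

```python
from datetime import datetime

# First minute-resolution timestamp after the 16:54:15 UTC announcement.
CUTOFF = datetime(2016, 2, 23, 16, 55)

def sent_before_announcement(sent_at):
    """True if a parsed (UTC) message time is early enough to be tallied."""
    return sent_at < CUTOFF
```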

Furthermore, there are some messages that suggest that PPCG is not graduating. These, of course, should be excluded. Therefore, any message containing an odd number of the following should be excluded: not, never, won't, aren't, wont, arent. Once again, these must not be part of another word (see above) and matches should be case-insensitive.
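The parity rule could look roughly like this in Python 3. `NEGATION` and `is_denial` are illustrative names, and the sketch assumes straight apostrophes (') as in the word list above, not typographic ones.

```python
import re

# The six negation words, case-insensitive, with no letter directly before or after.
NEGATION = re.compile(
    r"(?<![a-z])(?:not|never|won't|aren't|wont|arent)(?![a-z])",
    re.IGNORECASE,
)

def is_denial(message):
    """True if the message contains an odd number of negation words."""
    return len(NEGATION.findall(message)) % 2 == 1
```

Two negations cancel out, so "We aren't not graduating soon." is not a denial, and the not inside cannot is never counted.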

Your program should output a list of all the users with at least one message suggesting our graduation ahead of time, as well as the number of such messages for each user. This list should be sorted by count in descending order. In the case of multiple users with the same number of messages, any of the users may appear first.
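Once the qualifying messages are known, the tally itself is short. A sketch using the standard library's `collections.Counter`; the function name `tally` is hypothetical.

```python
from collections import Counter

def tally(users):
    """users: one user name per qualifying message.

    Returns (user, count) pairs sorted by count in descending order.
    """
    return Counter(users).most_common()
```

Users who never had a qualifying message simply never enter the counter, which satisfies the "at least one message" requirement.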

Example

Input:
Lines starting with // are not part of the input, and are only present for explanatory purposes.

// does not count, missing we or PPCG
Geobits
Sep 26 '14 4:44 PM
If that's the justification for keeping it beta, then I assume it will graduate as soon as Howard hits 20k?
// counts
Downgoat
Feb 13 8:09 PM
Also, now that we've reached 10Q/D are we graduating?
// does not count, has one not
Geobits
Sep 26 '14 5:10 PM
Yea, sorry. We're not graduating. I'm going to assume from now on that Oct 1 is the southern hemisphere's April Fool's Day.
// counts
Rainbolt
Mar 31 '15 2:36 PM
It also works when we graduate
// does not count, missing we or PPCG
Alex A.
Jan 12 9:01 PM
Watch lies.se graduate before us
// counts
Geobits
Sep 15 '15 5:59 PM
When we graduate in November.
// counts
Geobits
Feb 19 3:41 AM
Maybe we'll graduate tonight and it'll jump to 3k.
// counts
PhiNotPi
Oct 2 '14 1:26 PM
I heard some weird rumor that we are graduating soon. Is there any truth to that?
// does not count, posted after announcement
Geobits
Feb 23 6:40 PM
I guess we have to graduate this one now:
// does not count, missing we or PPCG
Runer112
Mar 31 '15 4:01 PM
rainbolt, flag at start of each graduate?
// counts
Duck
Nov 11 '14 11:05 AM
We aren't not graduating soon.
// counts
Geobits
Mar 3 '15 2:32 PM
That's why we don't have a logo yet (until we graduate in May). We don't trust the internet.
// counts (not is part of cannot, and is not considered)
user-1
Apr 1 '14 10:05 PM
We cannot graduate.

Output:
Format may vary. You may use any delimiter, or return an array, etc.

Geobits - 3
Downgoat - 1
Rainbolt - 1
PhiNotPi - 1
Duck - 1
user-1 - 1

es1024

Posted 2016-03-08T23:36:11.247

Reputation: 8 953

So the matching of won't, never, ... is case-sensitive? – Denker – 2016-03-09T07:53:43.380

@DenkerAffe it's case-insensitive, added to main question – es1024 – 2016-03-09T08:03:33.993

Answers

2

Python 2, 569 491 bytes

def g(a):t=filter(lambda m:m[1]<datetime(2016,2,23,4,55)and search(r"\bgraduat(e|ing|ion)\b",m[2],2)and search(r"\bwe|ppcg\b",m[2],2)and len(findall(r"\b(never|(no|won'|aren'|won|aren)t)\b",m[2],2))%2-1,[[m[0],datetime.strptime(m[1],"%b %d '%y %H:%M %p"),m[2]]for m in [[m[0],sub(r"(\w{3} \d\d+)(.*)",r"\1 '%s\2"%str(datetime.now().year)[2:],m[1])if len(m[1])<17else m[1],m[2]]for m in a]]);print sorted([[u,len(filter(lambda m:m[0]==u,t))]for u in set([m[0]for m in t])],key=lambda u:-u[1])

Ungolfed version and demonstration here!

Gonna try to golf this down later. Edit: Golfed it down a bit, especially the regex part (I gotta get this into one regex), but I am too noob to optimize the regexes :(

Denker

Posted 2016-03-08T23:36:11.247

Reputation: 6 639

\b(never|(no|((wo|are)n'?))t)\b saves 5 bytes, I think – Joe – 2017-10-13T17:47:56.930
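A quick Python 3 check, not part of the original thread, that the suggested pattern accepts the same words as the negation regex in the answer. The sample list and the helper name `agree` are made up for this illustration.

```python
import re

ORIGINAL = re.compile(r"\b(never|(no|won'|aren'|won|aren)t)\b", re.I)
SUGGESTED = re.compile(r"\b(never|(no|((wo|are)n'?))t)\b", re.I)

# The six negation words plus a few near misses.
SAMPLES = ["not", "never", "won't", "aren't", "wont", "arent",
           "cannot", "note", "wo't", "WON'T"]

def agree(samples):
    """True if both patterns accept or reject every sample as a whole word."""
    return all(
        bool(ORIGINAL.fullmatch(s)) == bool(SUGGESTED.fullmatch(s))
        for s in samples
    )
```

Calling `agree(SAMPLES)` compares the two patterns only on these samples; it is a spot check, not a proof of equivalence.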