Was it Really Canada Day?

July 1st is Canada Day (yay Canada)! Or is it? The Wikipedia page for this day has a lot of Canada-related content, but is there another day that is more Canadian?

Your task is to write a program or function that takes a date (month and day) as input and returns or outputs the number of mentions of "Canada" on the Wikipedia page for that date. Some rules:

  • Dates may be input in any reasonable format of your choice
  • Your submission must pull data from the URL en.wikipedia.org/wiki/Month_Day.
  • Only "Canada" is to be searched for and counted, including occurrences inside longer strings, and only in title case. "Canadian" does not count; however, "Canada's" does. As long as the exact, case-sensitive text "Canada" exists within a string, it is a match
  • The contents of the page are everything within the corresponding .html file (i.e. what shows up if you download the page as an .html file and open it in Notepad)
  • Result may be output to STDOUT, returned, or displayed in any other reasonable manner
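In other words, the rules reduce to a plain, case-sensitive substring count over the raw page source. A minimal Python sketch of that rule, run on an invented sample string rather than a real page:

```python
def count_canada(page_html: str) -> int:
    # Case-sensitive substring count: "Canada's" and "Canadair" match,
    # while "Canadian" ("Canadi...") and lowercase "canada" do not.
    # str.count counts non-overlapping matches; "Canada" cannot overlap
    # with itself, so this is the total number of occurrences.
    return page_html.count("Canada")


sample = "Canada's Canadian Canadair, not canada."
print(count_canada(sample))  # 2
```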

Test Cases:

July 1 => 34
May 14 => 1
Oct 31 => 2
July 4 => 2
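Most answers below sidestep date parsing by taking input already in the page-name form July_1. If you start from dates written like the test cases instead, the conversion to the required URL might be sketched as follows; the hypothetical helpers `wiki_slug` and `page_url` assume a full or three-letter English month name followed by a day number.

```python
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def wiki_slug(date_text: str) -> str:
    """Convert 'July 1' or 'Oct 31' into the 'July_1' page name."""
    month_part, day_part = date_text.split()
    prefix = month_part[:3].lower()
    for name in MONTHS:
        # Match on the first three letters so 'Oct' and 'October' both work.
        if name[:3].lower() == prefix:
            return f"{name}_{int(day_part)}"
    raise ValueError(f"unrecognised month: {month_part!r}")


def page_url(date_text: str) -> str:
    return "https://en.wikipedia.org/wiki/" + wiki_slug(date_text)


for case in ("July 1", "May 14", "Oct 31", "July 4"):
    print(case, "->", page_url(case))
```

Matching on three-letter prefixes is enough because the twelve English month abbreviations are all distinct.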

This is code golf, so the shortest submission wins.

(As an unrewarded bonus, I'm interested to see what the day with the highest count is)

wnnmaw

Posted 2016-07-05T17:05:41.883

Reputation: 1 618

Can the Wikipedia API be used? – LegionMammal978 – 2016-07-05T17:14:19.540

I don't know much about it, so I'm hesitant to say yes in case there is a trivial function to it. Use your best judgement and if it makes it too easy please abstain – wnnmaw – 2016-07-05T17:16:26.030

So references to Canadaville, Canadair, Canadarm, Canadaga, Canadarago, Canaday, Canadaspis, et al. count?

– msh210 – 2016-07-05T17:21:56.217

@msh210, Yep, that they do – wnnmaw – 2016-07-05T17:23:12.670

Everyone is using enwp.org in here – None – 2016-07-06T06:23:20.453

July 1 is the day with the highest count! Wrote a quick program for it, though it isn't golfed. – Andrew – 2016-07-06T12:10:17.627

@MatthewRoh, yes, but that redirects to the required url, so I'm allowing it – wnnmaw – 2016-07-06T13:42:51.850

Answers

Pyth, 31 bytes

/jk'+"http://enwp.org/"z"Canada

Does not work in the online interpreter, because the server disables Internet access. I wanted to use http://wki.pe/July_1, but sadly it's a client-side redirect, so it fetches the wrong page. The input format is July_1.

In Python 3 terms, the code is basically:

from urllib.request import urlopen
print("".join(l.decode() for l in urlopen("http://enwp.org/"+input())).count("Canada"))
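On the wki.pe remark: an HTTP client such as urllib follows server-side (3xx) redirects automatically, but a client-side redirect is an ordinary page that asks the browser to go elsewhere, so a plain fetch returns that small page and the count comes out wrong. The answer does not say how wki.pe redirects; assuming it uses a `<meta http-equiv="refresh">` tag (it could equally be JavaScript), detecting it with the standard `html.parser` module might look like this. The stub page is made up, and `RefreshFinder` and `client_side_target` are hypothetical names.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Record the target of a <meta http-equiv="refresh"> tag, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        attributes = {name: value or "" for name, value in attrs}
        if tag == "meta" and attributes.get("http-equiv", "").lower() == "refresh":
            # Assumed content format: "0; url=https://..."
            _, _, rest = attributes.get("content", "").partition(";")
            rest = rest.strip()
            if rest.lower().startswith("url="):
                self.target = rest[4:].strip()


def client_side_target(page_html: str):
    """Return the meta-refresh target of a page, or None if it has none."""
    finder = RefreshFinder()
    finder.feed(page_html)
    finder.close()
    return finder.target


stub = ('<html><head><meta http-equiv="Refresh" '
        'content="0; url=https://en.wikipedia.org/wiki/July_1"></head></html>')
print(client_side_target(stub))
```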

busukxuan

Reputation: 2 728

Bash, 43 42 40 bytes

curl -L enwp.org/$@|grep -o Canada|wc -l

Uses curl, grep, and wc to count occurrences of "Canada" in specified webpage. Like the other answers, input is given in the format July_1. This is my first time posting on the Code Golf SE and I'm not quite familiar with all of the rules. Any feedback would be most welcome.

Didn't realize that output to STDERR is traditionally ignored. Thanks for the 3 bytes, Dennis!
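The `-o` flag is what makes this count occurrences rather than lines: `grep -o` prints each match on its own line, so `wc -l` counts matches, whereas `grep -c` would count matching lines and undercount any line that mentions "Canada" twice. A small Python model of the two behaviours, using an invented three-line page:

```python
import re


def grep_c(text: str, pattern: str) -> int:
    # Like `grep -c`: the number of lines with at least one match.
    return sum(1 for line in text.splitlines() if re.search(pattern, line))


def grep_o_wc_l(text: str, pattern: str) -> int:
    # Like `grep -o ... | wc -l`: one output line per individual match.
    return sum(len(re.findall(pattern, line)) for line in text.splitlines())


page = "Canada Day in Canada\nNothing here\nCanada's flag\n"
print(grep_c(page, "Canada"), grep_o_wc_l(page, "Canada"))  # 2 3
```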

Sriram

Reputation: 341

But wouldn't curl -sL still be shorter than wget -qO-? – Nick Matteo – 2016-07-07T02:07:14.227

Output to STDERR is ignored by default, so you can use curl without -s (or wget without -q). – Dennis – 2016-07-07T02:41:38.533

@Dennis Thanks! I didn't know that STDERR is ignored. Much appreciated. – Sriram – 2016-07-07T04:33:05.917

@kundor That's a good point. For some reason, combining the two flags never occurred to me. Still, since output to STDERR is ignored by default, it'd be shorter to omit the -s entirely. – Sriram – 2016-07-07T04:35:31.257

Perl 5, 39 bytes

38 bytes, plus 1 for -pe instead of -e

$_=()=`curl -L enwp.org/$_`=~/Canada/g

Takes input like July_1.

Thanks to busukxuan for saving me seven bytes.
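The `$_=()=...=~/Canada/g` construct is Perl's so-called countof idiom: assigning to the empty list `()` puts the `/g` match in list context, so it returns every match, and a list assignment evaluated in scalar context yields the number of elements on its right-hand side. A rough Python analogue using `re.findall` (the sample string is invented):

```python
import re


def countof(text: str, pattern: str) -> int:
    # re.findall plays the role of m//g in list context: it returns every
    # non-overlapping match, and len() gives the count the scalar receives.
    return len(re.findall(pattern, text))


print(countof("Canada, Canada's and Canadarm", "Canada"))  # 3
```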

msh210

Reputation: 3 094

I'm not familiar with curl, but is it possible to save the six bytes of "http://"?

– busukxuan – 2016-07-05T18:36:56.430

@busukxuan, yep, many thanks. – msh210 – 2016-07-05T19:16:08.003

Python 3.5, 117 111 98 90 bytes

(-8 bytes (98 -> 90) thanks to alexwlchan)

from urllib.request import*
lambda i:urlopen('http://enwp.org/'+i).read().count(b"Canada")

Simply uses Python's built-in urllib library to fetch the HTML and then counts the occurrences of "Canada" in it. Will try to golf more over time when I can. To call it, assign the lambda to a name and call that name like a normal function wrapped in print(). For instance, if the function were named H, you would call it as print(H("July_1")).
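Counting on the raw bytes with `b"Canada"` (alexwlchan's suggestion in the comments) gives the same number as decoding first: in UTF-8, every byte of a multi-byte character is 0x80 or above, so the ASCII bytes of "Canada" can never be matched inside some other character. A quick check on an invented string containing non-ASCII text:

```python
text = "Fête du Canada / Canada Day / 加拿大 Canada's"
raw = text.encode("utf-8")

# Same answer with or without decoding, because "Canada" is pure ASCII.
print(raw.count(b"Canada"), text.count("Canada"))  # 3 3
```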

R. Kap

Reputation: 4 730

I think you can save eight characters by replacing .decode().count("Canada") with .count(b"Canada"). – alexwlchan – 2016-07-05T18:23:04.997

@alexwlchan Yes, you are right. Thanks! :) – R. Kap – 2016-07-05T18:33:49.723

Surely this would be shorter in Python 2, since the urllib.urlopen function isn't in a subpackage (from urllib import* versus from urllib.request import*), and the b"Canada" could be replaced with "Canada" since Python 2's strings are bytes by default. I count 81 bytes in Python 2, and it works according to my testing.

– Mego – 2016-07-05T21:19:53.357

Mathematica, 60 bytes

Import["http://enwp.org/"<>#,"Source"]~StringCount~"Canada"&

Anonymous function. Similarly to the Perl 5 solution, takes input like July_1.

LegionMammal978

Reputation: 15 731

Just to close the loop, this use of the API is totally fine – wnnmaw – 2016-07-05T18:54:05.960

C#, 85 bytes

return Regex.Matches(new WebClient().DownloadString("http://enwp.org/"+d),"Canada").Count;

Takes input d like July_1.

And July_1 is truly Canada Day, having the most references, with February_1 and April_23 sharing second place at 18 "Canada"s each.

Find "Canada" day (in parallel), 207 bytes:

return Enumerable.Range(0,366).Select(i=>new DateTime(8,1,1).AddDays(i).ToString("MMMM_d")).AsParallel().OrderBy(d=>Regex.Matches(new WebClient().DownloadString("http://enwp.org/"+d),"Canada").Count).Last();

(Year 8 is the leap year with the shortest representation.) Potentially inefficient, in that the OrderBy probably generates more than 366 web calls, but I was going for brevity, and it appears to complete in not much more time.
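A rough Python counterpart of the search above, written for readability rather than brevity. Like the C# version it walks the 366 days of year 8 (Python's `datetime` accepts years from 1, and 8 is a leap year) and fetches pages concurrently. The names are hypothetical, `fetch` is an assumed hook so the search logic can run without network access, and the User-Agent string is made up; here the counts are computed once per day up front, so each page is requested exactly once.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from urllib.request import Request, urlopen

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def all_slugs() -> list[str]:
    """Page names for every day of the year, February_29 included."""
    start = date(8, 1, 1)  # year 8 is a leap year, as in the C# answer
    days = (start + timedelta(days=offset) for offset in range(366))
    return [f"{MONTHS[d.month - 1]}_{d.day}" for d in days]


def fetch_page(slug: str) -> str:
    """Download one day's page (requires network access)."""
    request = Request("https://en.wikipedia.org/wiki/" + slug,
                      headers={"User-Agent": "canada-day-count-sketch"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def most_canadian_day(fetch=fetch_page, workers: int = 16) -> tuple[str, int]:
    """Return (page name, count) for the day mentioning "Canada" most often."""
    slugs = all_slugs()
    # Fetch concurrently, roughly like AsParallel().
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(lambda slug: fetch(slug).count("Canada"), slugs))
    # max() keeps the first maximum, so ties go to the earliest date.
    return max(zip(slugs, counts), key=lambda pair: pair[1])
```

Calling `most_canadian_day()` with no arguments makes 366 requests to Wikipedia; passing a stub `fetch` is a cheap way to check the logic first.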

weston

Reputation: 371

PowerShell, 52 bytes

((iwr enwp.org/$($args[0]))-csplit"Canada").length-1
  • Input as July_1.
  • iwr is short for Invoke-WebRequest.
  • $($args[0]) is the first command-line argument. Start the script as OhCanada.ps1 July_1.
  • -csplit is case sensitive split.
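Splitting on the word and subtracting one works because n occurrences cut the text into n + 1 pieces. The same idiom in Python, where `str.split` with an explicit separator is case-sensitive like `-csplit` (the sample string is invented):

```python
def count_by_split(text: str, word: str = "Canada") -> int:
    # n occurrences split the text into n + 1 pieces; str.split keeps empty
    # pieces, so matches at the very start or end still count.
    return len(text.split(word)) - 1


print(count_by_split("Canada Day, Canada"))  # 2
```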

Kobi

Reputation: 728

R, 99 96 bytes

x=function(d){p=readLines(paste0("http://enwp.org/",d));sum(nchar(p)-nchar(gsub("Canada","",p)))/6}

d=scan(,"");p=readLines(paste0("http://enwp.org/",d));sum(nchar(p)-nchar(gsub("Canada","",p)))/6

This takes input d in the form "July_1" and returns the count of Canadas. It counts occurrences by counting the number of characters on the page, then removing the word Canada from the page and counting the characters again. The number of times Canada shows up is the difference between these counts divided by the number of letters in Canada, 6.

edit: I appreciate the tip below about replacing my function with scan.
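The same length-difference trick in Python, on an invented string: deleting every "Canada" removes exactly six characters per occurrence, so the drop in length divided by 6 is the count.

```python
def count_by_removal(text: str, word: str = "Canada") -> int:
    # Each removed occurrence shortens the text by len(word) characters.
    removed = len(text) - len(text.replace(word, ""))
    return removed // len(word)


print(count_by_removal("O Canada! Canada's anthem"))  # 2
```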

Austin

Reputation: 41

I think you can drop the x=function(d){ and replace with d=scan(,'') making it program instead of function and saving some bytes. – pajonk – 2016-07-06T17:58:40.697

Thanks! That saved three bytes. I haven't used scan before. – Austin – 2016-07-06T20:01:16.433

ES6, 89 bytes

d=>fetch('http://enwp.org/'+d).then(r=>r.text().then(t=>alert(t.split`Canada`.length-1)))

Sadly, unwrapping all the promises penalises the size :/

YardGlassOfCode

Reputation: 41

Nice answer, welcome to the site! – James – 2016-07-06T00:08:08.383

A couple of comments. You can apply the same "input is in the format July_1" trick as the other answers to save a few bytes. You also have an error using split().length(), which will give you a result one greater than the goal. – IvanSanchez – 2016-07-06T09:00:03.093

Agree with @IvanSanchez on the input format and needing a -1 after the .length, but you can save some bytes by omitting the https: part of the URL, and use split'Canada' (but with backticks!) instead of split('Canada') to save a couple more! – Dom Hastings – 2016-07-06T11:39:16.320

Wow had no idea about backticks! I have made the changes mentioned. – YardGlassOfCode – 2016-07-07T01:40:31.303

Firefox allows you to drop the // after http. – user2428118 – 2016-07-07T08:50:50.323

Ruby + curl, 44 bytes

p`curl -L enwp.org/#$_`.scan(/Canada/).size

ruby -n + 43 bytes. Takes input like July_1.

Lynn

Reputation: 55 648

Clojure, 71 bytes

#(count(re-seq #"Canada"(slurp(str"https://en.wikipedia.org/wiki/"%))))

Yeah, it would be nice to use http://enwp.org, but I guess slurp does not handle redirects(?). Anonymous function that takes the day in the format "July_1".

cliffroot

Reputation: 1 080

PHP, 75 bytes

echo substr_count(file_get_contents('http://enwp.org/'.$argv[1]),'Canada');

Takes the date as a command-line argument, like July_1.

MonkeyZeus

Reputation: 461