Code Golf: Parsing Google results


When you search for something on Google, the results page shows each result's URL as a green link.

In as few bytes as possible, in any language, print those links to stdout as a list. Here is an example for the first results of the query stackexchange:

A screen capture

Input:

Your choice: either the full URL (www.google.com/search?q=stackexchange&ie=utf-8&oe=utf-8) or just the query stackexchange

Output:

french.stackexchange.com/, stackoverflow.com/, fr.wikipedia.org/wiki/Stack_Exchange_Network, en.wikipedia.org/wiki/Stack_Exchange,...

Rules:

  • You may use URL shorteners or other search tools/APIs as long as the results would be the same as searching https://www.google.com.

  • It's OK if your program has side effects, such as opening a web browser, so that Google's cryptic HTML/JS pages can be read as rendered.

  • You may use browser plugins, userscripts...

  • If you can't use stdout, print the list to the screen instead, e.g. with a popup or a JavaScript alert!

  • You don't need the trailing / or the leading http(s)://

  • You must not show any other links

  • Shortest code wins!

  • Good luck!

EDIT: This golf ends on 2015-08-07.
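For reference, here is an ungolfed sketch of the task in Python. The assumption that the green URLs sit inside <cite> elements is based on the 2015-era results markup (the Mathematica answer below relies on the same element); the sample HTML is made up:

```python
import re

def extract_links(html):
    # Grab the contents of each <cite> element (assumed to hold the
    # green result URL), then strip nested tags such as <b>.
    cites = re.findall(r'<cite[^>]*>(.*?)</cite>', html)
    return [re.sub(r'<[^>]+>', '', c) for c in cites]

# Made-up fragment resembling the results page
sample = ('<cite>stackexchange.com/</cite> ... '
          '<cite>en.wikipedia.org/wiki/<b>Stack_Exchange</b></cite>')
print(', '.join(extract_links(sample)))
# prints: stackexchange.com/, en.wikipedia.org/wiki/Stack_Exchange
```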

WayToDoor

Posted 2015-07-31T10:16:01.260

Reputation: 459

Since you are using google.fr, do we have to use that as well? – Beta Decay – 2015-07-31T12:31:51.107

You can use any Google you want. I'm French, so I used .fr, but you could use .com or .anything :) Doesn't matter – WayToDoor – 2015-07-31T12:37:37.843

And shortened URLs such as gogle.de are fine as well?

– Beta Decay – 2015-07-31T12:40:22.760

You may use URL shorteners or other search tools/APIs as long as the results would be the same as searching https://www.google.com, so yes

– WayToDoor – 2015-07-31T12:41:53.793

I've made some minor formatting changes - feel free to reverse any that are not how you want. – trichoplax – 2015-07-31T12:49:26.030


In case you're tempted: remember you can't parse HTML with regex

– Luis Mendo – 2015-07-31T15:16:01.040

I can't find an online Python interpreter that will let me use urllib or anything web related. – mbomb007 – 2015-07-31T16:55:12.263

@mbomb007 I'm pretty sure IdeOne supports urllib – Beta Decay – 2015-08-02T16:02:40.700

The method I found for using urllib required import json as well. You seem to have done it without that. – mbomb007 – 2015-08-03T18:44:36.203

can we use google.search API libraries? – cat – 2016-04-26T00:02:05.010

Yup :) (But, by the way, did you see the date of the challenge? It's cool, but a few months old :p) – WayToDoor – 2016-04-26T07:52:20.527

Answers


Bash + grep + lynx, 38 bytes

Since we can open a web browser, I will use lynx:

lynx -dump $1|grep -Po '(?<=d:)[^&]+'

(Thanks to @manatwork for grep usage instead of sed)

We pass the whole URL in as a parameter:

$ ./gr.sh "www.google.com/search?q=stackexchange&ie=utf-8&oe=utf-8"
http://stackexchange.com/
https://en.wikipedia.org/wiki/Stack_Exchange
https://twitter.com/stackexchange
https://play.google.com/store/apps/details?id=com.stackexchange.marvin
https://github.com/StackExchange/StackExchange.Redis
https://github.com/StackExchange/StackExchange.Redis/blob/master/Docs/Basics.md
https://www.crunchbase.com/organization/stack-exchange
$ 

Which gives the same list as:

(screen capture of the Google results page)
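The grep pattern is a PCRE lookbehind: it grabs everything after a literal d: up to the next &. A quick illustration of the mechanics using Python's re module (the sample line is made up; the real lynx dump may format its redirect URLs differently):

```python
import re

# Made-up sample line; (?<=d:) anchors just after "d:" without
# consuming it, and [^&]+ runs up to the next query-string separator.
line = 'https://www.google.com/url?ved:http://stackexchange.com/&sa=U'
print(re.findall(r'(?<=d:)[^&]+', line))
# prints: ['http://stackexchange.com/']
```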

Digital Trauma

Posted 2015-07-31T10:16:01.260

Reputation: 64 644

Well that's handy :D – Beta Decay – 2015-07-31T16:35:19.697

sed good. sed long. Try GNU grep: grep -Po '(?<=d:)[^&]+' – manatwork – 2015-07-31T16:35:27.643

@manatwork Yes, of course - thanks! – Digital Trauma – 2015-07-31T16:40:32.913

Was the answer title copypasted? ;) None of bash, lynx or sed (and now grep) is part of coreutils. – manatwork – 2015-07-31T16:42:06.690

@manatwork I was not implying that bash or lynx were part of coreutils - that's why they are listed separately. For some reason I had the mistaken belief that grep and sed were coreutils - I've fixed that. – Digital Trauma – 2015-07-31T19:28:16.647

I believe you can also do: lynx -dump $1|grep -Po 'd:\K[^&]+' (untested) – Jarmex – 2015-08-01T10:13:22.663

Best answer to date! – WayToDoor – 2015-08-07T12:59:40.150


Ruby, 77 bytes (down from 91)

require'open-uri';open(gets).read.scan(/ed:(.*?)\+/){|x|puts URI.decode x[0]}

Would've been shorter without all the requires. ARGH!!! EDIT: It turns out I don't need the second require! Thanks to @manatwork for pointing that out.

Older version (with the useless require):

require'open-uri';require 'uri';open(gets).read.scan(/ed:(.*?)\+/){|x|puts URI.decode x[0]}
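The scan-and-decode step, shown as a rough Python equivalent (the sample input is illustrative, not real Google output; Ruby's URI.decode corresponds to percent-decoding):

```python
import re
from urllib.parse import unquote

# The Ruby regex /ed:(.*?)\+/ captures the percent-encoded URL
# between a literal "ed:" and the next "+".
page = 'xyzed:http%3A%2F%2Fstackexchange.com%2F+more'
for encoded in re.findall(r'ed:(.*?)\+', page):
    print(unquote(encoded))
# prints: http://stackexchange.com/
```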

kirbyfan64sos

Posted 2015-07-31T10:16:01.260

Reputation: 8 730

Rules allow the use of command line options as long as you count them too: http://pastebin.com/PnpjnXji (If you feel this is unfair style, feel free to use only the change in the code block.)

– manatwork – 2015-07-31T19:15:27.187

Are you sure you need to explicitly require 'uri'? In the 2.1.2 I use, the URI module becomes available after requiring open-uri. – manatwork – 2015-07-31T19:24:12.687

@manatwork Thank you! Updated. – kirbyfan64sos – 2015-07-31T20:16:49.250

Just out of curiosity: any reason not to change the code block as in my pastebin alternative? (Of course, I'm curious about technical reasons, not personal reasons, if that's what holds you back.) – manatwork – 2015-08-01T10:30:25.287

@manatwork I need to, but I was too lazy to figure out the byte count at the moment. :) – kirbyfan64sos – 2015-08-01T14:56:39.443


Wolfram Language (Mathematica), 135 bytes

StringJoin/@(Cases[URLExecute["www.google.com/search",{"q"->#},"XMLObject"],XMLElement["cite",_,l_]:>l,-1]/.XMLElement["b",_,{s_}]:>s)&

more readable:

StringJoin/@(Cases[
    URLExecute["www.google.com/search",{"q"->#},"XMLObject"], 
    XMLElement["cite",_,l_]:>l,-1] /. 
    XMLElement["b",_,{s_}]:>s)

chuy

Posted 2015-07-31T10:16:01.260

Reputation: 389

Are the spaces really necessary? Without them, I get 136 bytes.

– kirbyfan64sos – 2015-07-31T20:18:12.083

Not necessary at all... I really should tighten this up. – chuy – 2015-07-31T20:20:42.620

Can you do something like this answer to shorten this?

– Digital Trauma – 2015-07-31T20:26:18.020


Python 3, 141 bytes

Nowhere near Digital Trauma's answer, but it was fun to work out the regex :D

import re
print('\n'.join(map(lambda x:x[3:],re.findall('te>http[s]?://\w+\.[a-z]+[](/a-z\.)?]+',__import__("requests").get(input()).text))))

For input http://www.google.com/search?q=stackexchange&ie=utf-8&oe=utf-8 the program outputs:

https://en.wikipedia.org/wiki/
https://twitter.com/
https://play.google.com/store/apps/details?id...
https://www.crunchbase.com/organization/
https://www.facebook.com/
https://github.com/

Implements grc's tip
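How the regex and the x[3:] slice fit together, on a made-up fragment of the page (each match starts with te>, the tail of the <cite> tag, which the slice strips off):

```python
import re

# Made-up fragment; real Google markup is messier.
text = 'cite>https://github.com/ junk cite>https://twitter.com/'
links = [m[3:] for m in
         re.findall(r'te>http[s]?://\w+\.[a-z]+[](/a-z\.)?]+', text)]
print('\n'.join(links))
# prints:
# https://github.com/
# https://twitter.com/
```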

Beta Decay

Posted 2015-07-31T10:16:01.260

Reputation: 21 478

Do you really need to use __import__? – ckjbgames – 2017-05-31T00:27:06.493

Also, use an [x for x in spam] construct instead of map. That will save you a good number of bytes. – ckjbgames – 2017-05-31T01:14:44.133


Factor, 31 bytes

There happens to be a library for this.

[ google-search [ url>> ] map ]

cat

Posted 2015-07-31T10:16:01.260

Reputation: 4 989