Backstory:
You enjoy your new programming job at a mega-multi-corporation. However, you aren't allowed to browse the web since your computer only has a CLI. They also run sweeps of all employees' hard drives, so you can't simply download a large CLI web browser. You decide to make a simple textual browser that is as small as possible so you can memorize it and type it into a temporary file every day.
Challenge:
Your task is to create a golfed web browser within a command-line interface. It should:
- Take a single URL in via args or stdin
- Split the directory and host components of the URL
- Send a simple HTTP request to the host to request the said directory
- Print the contents of any <p>paragraph</p> tags
- And either exit or ask for another page
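Below is a minimal, non-golfed Python sketch of the first two steps. It assumes plain http:// URLs (or bare host names) on the default port and does not handle explicit ports or user:password@ prefixes. The function name split_url is hypothetical and not part of the challenge:

```python
def split_url(url):
    """Split e.g. 'http://example.com/a/b' into ('example.com', '/a/b')."""
    # Drop an optional scheme such as "http://"
    if "://" in url:
        url = url.split("://", 1)[1]
    # A "#fragment" is never sent to the server, so discard it
    url = url.split("#", 1)[0]
    # Everything before the first "/" is the host; the rest is the path
    host, slash, path = url.partition("/")
    # A bare host such as "example.com" maps to the root directory "/"
    return host, (slash + path) if slash else "/"


print(split_url("http://example.com/a/b"))  # ('example.com', '/a/b')
```

A query string such as ?q=1 stays in the path, which is what the request line expects.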
More Info:
A simple HTTP request looks like this:
GET {{path}} HTTP/1.1
Host: {{host}}
Connection: close
\n\n
Ending newlines emphasized.
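The same request, built as bytes in Python. This sketch uses CRLF (\r\n) line endings, which is what HTTP/1.1 specifies. Servers are permitted to accept a bare \n as well, but they are not required to. build_request is a hypothetical helper name:

```python
def build_request(host, path):
    # Request line, the Host header HTTP/1.1 requires, and Connection: close
    # so the server closes the socket once the response has been sent
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    # Each line ends with CRLF, and an extra empty line ends the header section
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_request("example.com", "/"))
# b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'
```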
A typical response looks like:
HTTP/1.1 200 OK\n
<some headers separated by newlines>
\n\n
<html>
....rest of page
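This sketch shows one way to turn such a response into the required newline-separated output, assuming the whole response has already been read into a string. It splits off the headers at the first blank line, then pulls out the <p> contents with a regular expression. The regex is a pragmatic stand-in for a real HTML parser: it will not cope with nested or malformed markup, and inline tags inside a paragraph (such as <a>) are printed as-is. paragraphs is a hypothetical name:

```python
import re


def paragraphs(response):
    # Headers end at the first blank line (CRLF CRLF on the wire)
    _, _, body = response.partition("\r\n\r\n")
    # Fall back to bare LF line endings if no CRLF blank line was found
    if not body:
        _, _, body = response.partition("\n\n")
    # Non-greedy match per paragraph; \b keeps <pre> from matching as <p>,
    # re.S lets a paragraph span several lines, re.I also accepts <P>
    return re.findall(r"<p\b[^>]*>(.*?)</p>", body, re.S | re.I)


resp = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>One</p><pre>x</pre><p>Two</p>"
print("\n".join(paragraphs(resp)))
```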
Rules:
- It only needs to work on port 80 (no SSL needed)
- You may not use netcat
- Whatever programming language is used, only low-level TCP APIs are allowed (netcat does not count as one)
- You may not use GUI, remember, it's a CLI
- You may not use HTML parsers, except builtin ones (BeautifulSoup is not a builtin)
- Bonus!! If your program loops back and asks for another URL instead of exiting, -40 chars (as long as you don't use recursion)
- No third-party programs. Remember, you can't install anything.
- code-golf, so the shortest byte count wins
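Putting the pieces together, here is a non-golfed Python sketch that uses only the standard socket module and loops rather than recursing, in the spirit of the -40 bonus. It makes several assumptions: plain HTTP on port 80, a server that honours Connection: close, and a body not sent with Transfer-Encoding: chunked (otherwise chunk-size lines may leak into the printed text). The names fetch and browse are hypothetical:

```python
import re
import socket


def fetch(host, path, port=80):
    """Send a minimal HTTP/1.1 GET over a raw TCP socket and return the raw response."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port)) as s:
        s.sendall(request.encode("ascii"))
        chunks = []
        # With "Connection: close" the server ends the response by closing
        # the socket, which recv() reports as an empty bytes object
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def browse():
    # A plain loop instead of recursion, so a long session cannot overflow the stack
    while True:
        try:
            url = input("URL: ").strip()
        except EOFError:  # Ctrl-D (or the end of piped input) quits
            break
        if not url:
            continue
        # Drop an optional scheme, then split host from path at the first "/"
        host, _, path = url.split("://", 1)[-1].partition("/")
        page = fetch(host, "/" + path)
        _, _, body = page.partition("\r\n\r\n")
        for text in re.findall(r"<p\b[^>]*>(.*?)</p>", body, re.S | re.I):
            print(text)  # one paragraph per line


if __name__ == "__main__":
    browse()
```

Typing example.com at the prompt should print the paragraphs of that page, one per line, and then prompt again.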
Python, import webbrowser;webbrowser.open(url) – Blue – 2015-10-26T16:36:06.673
@muddyfish read the rules – TheDoctor – 2015-10-26T16:36:33.080
Another fuzzy point is the request itself. Some websites will accept incomplete or non-standard requests. I suggest you include an example request (to e.g. example.com) and the expected output. – mınxomaτ – 2015-10-26T16:41:09.380
Can you provide a sample web page of some sort for testing this? It is difficult to find places that use <p> :P – a spaghetto – 2015-10-26T16:43:19.640
@quartata try Wikipedia – TheDoctor – 2015-10-26T16:43:43.177
@quartata example.com would be perfect. It is guaranteed to never change its content and is relatively small. – mınxomaτ – 2015-10-26T16:44:05.660
@minxomat sure, when I get back on my computer – TheDoctor – 2015-10-26T16:44:06.023
@minxomat it shouldn't be too hard. I'll clarify the request soon – TheDoctor – 2015-10-26T16:49:01.027
@TheDoctor Nevermind, that was a dumb question that you've already answered ... – mınxomaτ – 2015-10-26T16:50:18.667
Are we allowed to parse HTML using regex? ;-) – Digital Trauma – 2015-10-26T16:57:38.533
The restriction to *low-level socket interfaces* seems to prohibit the TCP-level APIs of most languages which have TCP-level APIs. – Peter Taylor – 2015-10-26T16:58:23.123
Related: http://codegolf.stackexchange.com/questions/44278/debunking-stroustrups-debunking-of-the-myth-c-is-for-large-complicated-pro – Digital Trauma – 2015-10-26T16:58:25.253
@DigitalTrauma good luck – TheDoctor – 2015-10-26T16:58:30.777
@PeterTaylor I intended that to mean only a simple TCP API was allowed... Clarification soon – TheDoctor – 2015-10-26T17:00:59.110
Wouldn't all headings h1 … h6 be important, too? If you actually aren't allowed to read you may need to hurry and rush through the content. – insertusernamehere – 2015-10-26T17:02:27.297
Do the contents of each <p>...</p> need to be printed on separate lines or can the output be dumped all in one long line? – Digital Trauma – 2015-10-26T18:29:27.923
@DigitalTrauma it should be newline separated – TheDoctor – 2015-10-26T19:52:26.883
Is HTTP 1.1 mandatory? A simple HTTP request is even simpler: "GET $path HTTP/0.9\r\n\r\n" – slebetman – 2015-10-27T03:05:48.373
Is IO::Socket::INET considered low-level enough? – Dom Hastings – 2015-10-27T06:32:41.500
The second and fourth lines do the same thing: replace every gap in the succeeding line with the appropriate character. They jump to the next line at the end. – user15308 – 2015-10-27T05:52:02.383
@DomHastings Seems as low as the Bash and PHP entries. Open a socket, write and read. – Schwern – 2015-10-27T08:02:18.470
Totally off-topic: any mega-multi-corporation that makes it this hard for their developers to access the internet is not worth working for IMHO. As a developer, I need Google and Stackoverflow on a daily, sometimes even hourly basis to search for solutions. Not having access to these essential tools is like not giving a commercial pilot access to his radio. – Nzall – 2015-10-27T16:40:18.760
Just download it again. Every day. – None – 2015-10-27T21:56:03.227
Nitpicking: the newlines in an HTTP request are actually \r\n. – Josiah Keller – 2015-10-28T13:50:21.843
If the CLI is bash, wget might be preinstalled. – Cees Timmerman – 2015-10-28T16:40:16.000
@CeesTimmerman wget isn't a socket API – TheDoctor – 2015-10-28T16:43:13.307
@TheDoctor So the bold line wouldn't apply to it, hence the no install rule, which also doesn't apply if it's pre-installed. – Cees Timmerman – 2015-10-28T17:03:12.307
@CeesTimmerman But wget handles all the HTTP request internally, which isn't allowed. – TheDoctor – 2015-10-28T17:21:17.587
What's this "(as long as you don't use recursion)" about? – Bergi – 2015-10-29T01:03:19.967
@Bergi Using recursion would eventually cause a Stack Overflow given enough browsing. – TheDoctor – 2015-10-29T01:36:21.873
@TheDoctor: That would depend on the language and its ability of tail call optimisation. A recursive approach is totally standard in Haskell or JS – Bergi – 2015-10-29T09:32:06.167
small in what sense?? code lines or executable size?? – Ehsan Sajjad – 2015-10-29T11:18:35.813
So small that you can remember it. Which is hard. Some people can remember pages of code, but I would have trouble remembering just wget, less and grep to perform this task, even though they let you build a full-fledged browser in under 10 lines. – GolezTrol – 2015-10-30T06:32:50.983