Unshorten Google Links

10

1

Challenge

Given a valid goo.gl shortened link, output the original URL.

Example

goo.gl/qIwi3N would give something like https://codegolf.stackexchange.com/. For input, you can choose to have the https?:// at the beginning, you can choose to have www. before the goo.gl, and you can also choose to get rid of the goo.gl if you only want the end of the URL. A slash at the end is optional for input and output. So, your input will end up matching the regex (https?://)?(www\.)?(goo\.gl/)?[A-Za-z0-9]+/?. For the output, you can choose whether or not to output https?://, whether or not to output www., and whether or not to output a trailing slash. However, you must be consistent with your I/O formatting.
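As a rough illustration of the input grammar above, the following Python sketch checks a link against the quoted regex and rebuilds a full URL from it. The helper name `normalize` and the default `https` scheme are assumptions made for this example; an actual submission only needs to accept the one input format it chooses.

```python
import re

# The input grammar quoted in the challenge; fullmatch anchors it at both ends.
# The short ID is captured in group 4.
INPUT_RE = re.compile(r"(https?://)?(www\.)?(goo\.gl/)?([A-Za-z0-9]+)/?")

def normalize(link, scheme="https"):
    """Hypothetical helper: rebuild a full goo.gl URL from any accepted form."""
    m = INPUT_RE.fullmatch(link)
    if m is None:
        raise ValueError("not a goo.gl-style link: %r" % link)
    return "%s://goo.gl/%s" % (scheme, m.group(4))

print(normalize("goo.gl/HaE8Au"))
print(normalize("http://www.goo.gl/IfZMwe/"))
```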

I will say that you do not have to be consistent with https vs. http for output (but you must be consistent for input), as long as you are consistent with whether or not you include the whole https?:// part.

Test Cases

The inputs below have no https://, no www., and no trailing slash; the outputs have https://, no www., and a trailing slash.

input -> output
goo.gl/HaE8Au -> https://codegolf.stackexchange.com/
goo.gl/IfZMwe -> https://stackoverflow.com/
goo.gl/JUdmxL -> https://chat.stackexchange.com/rooms/240/the-nineteenth-byte

Assumptions

  • You may assume that the shortened link will not point to another shortened link and that the destination site will return a status code of 2xx or 4xx (no redirections).

You can go here and enter a URL to apply the inverse operation of this: https://goo.gl/

HyperNeutrino

Posted 2017-05-13T04:13:48.713

Reputation: 26 575

@HelkaHomba fixed – Pavel – 2017-05-13T04:47:02.203

3Whether or not to output a leading www. makes a difference. It is just that in most cases it refers to the same server. Try for example http://pks.mpg.de and http://www.pks.mpg.de. The first cannot be resolved, while the latter can. – Golar Ramblar – 2017-05-13T10:46:53.507

@StephenS Done, thanks for the suggestion. – HyperNeutrino – 2017-05-13T12:21:06.073

Answers

11

CJam, 7 bytes

lg'"/5=

Test run

$ alias cjam
alias cjam='java -jar ~/.local/share/cjam-0.6.5.jar'
$ cjam unshorten.cjam <<< goo.gl/HaE8Au; echo
https://codegolf.stackexchange.com/
$ cjam unshorten.cjam <<< goo.gl/IfZMwe; echo
https://stackoverflow.com/
$ cjam unshorten.cjam <<< goo.gl/JUdmxL; echo
https://chat.stackexchange.com/rooms/240/the-nineteenth-byte

How it works

lg reads a line from STDIN and makes a GET request to that URL. The shortened URL issues a 301 redirect, which CJam doesn't follow. For the first test case, this pushes

<HTML>
<HEAD>
<TITLE>Moved Permanently</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Permanently</H1>
The document has moved <A HREF="https://codegolf.stackexchange.com/">here</A>.
</BODY>
</HTML>

on the stack. Finally, '"/ splits at double quotes, and 5= gets the sixth chunk. Output is implicit.
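The split-and-index step can be reproduced outside CJam. The Python sketch below hard-codes the response body quoted above instead of making a request, so it only illustrates the string handling, not the network part.

```python
# Body of the 301 response quoted above for the first test case.
body = """<HTML>
<HEAD>
<TITLE>Moved Permanently</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Permanently</H1>
The document has moved <A HREF="https://codegolf.stackexchange.com/">here</A>.
</BODY>
</HTML>
"""

# Equivalent of '"/ : split the body at every double quote.
chunks = body.split('"')

# Equivalent of 5= : take the chunk at zero-based index 5 (the sixth one).
# The first four quotes delimit the BGCOLOR and TEXT values, so the HREF
# value is the sixth chunk.
target = chunks[5]
print(target)
```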

Dennis

Posted 2017-05-13T04:13:48.713

Reputation: 196 637

1I've tried 05AB1E and Pyth, they both follow the 301 :( – Erik the Outgolfer – 2017-05-13T07:04:26.900

3

Python 2 + requests, 44 bytes

from requests import*
print get(input()).url

requests.get(URL) issues a GET request to the specified URL. The response object's url field contains the final URL, after any redirects. A protocol (e.g. http://) is required for the input, and the input is expected to be in quotes.

Mego

Posted 2017-05-13T04:13:48.713

Reputation: 32 998

1requests isn't built-in, so that needs to be added to the language header. – numbermaniac – 2017-05-13T08:05:02.577

1Use a lambda expression for -3 bytes – ovs – 2017-05-13T11:56:42.447

1@numbermaniac Whoops, you're right, I get so used to requests that I forget it's a third party lib. – Mego – 2017-05-13T16:32:02.907

2

Bash, 28 24 bytes

curl -I $1|grep -oehtt.*

The output ends with a Windows-style newline, which I assume is acceptable.

Test run

$ bash unshorten.sh 2>&- goo.gl/HaE8Au
https://codegolf.stackexchange.com/
$ bash unshorten.sh 2>&- goo.gl/IfZMwe
https://stackoverflow.com/
$ bash unshorten.sh 2>&- goo.gl/JUdmxL
https://chat.stackexchange.com/rooms/240/the-nineteenth-byte

How it works

curl -I sends a HEAD request, so it fetches only the HTTP headers of the specified URL. For the first test case, it prints

HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Date: Sat, 13 May 2017 05:51:48 GMT
Location: https://codegolf.stackexchange.com/
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked
Accept-Ranges: none
Vary: Accept-Encoding

or similar. The output is piped to grep -oehtt.*, which displays only the parts that match the specified regex, i.e., the string htt followed by any number of characters up to the end of the line.
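To see why only the Location value survives and why the output ends in a carriage return, here is a Python sketch that applies the same pattern to a subset of the headers quoted above. It assumes the headers arrive with CRLF line endings, as HTTP sends them; Python's re module stands in for grep here, and the two differ in other details.

```python
import re

# A subset of the response headers quoted above, joined with CRLF line
# endings (assumed to be passed through unchanged by curl).
headers = "\r\n".join([
    "HTTP/1.1 301 Moved Permanently",
    "Content-Type: text/html; charset=UTF-8",
    "Location: https://codegolf.stackexchange.com/",
    "X-Content-Type-Options: nosniff",
    "Server: GSE",
]) + "\r\n"

# The pattern is case-sensitive, so the status line "HTTP/1.1" does not
# match, and "text/html" contains "htm", not "htt". The dot matches "\r"
# but not "\n", so the match keeps the trailing carriage return.
matches = re.findall(r"htt.*", headers)
print(matches)
```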

Dennis

Posted 2017-05-13T04:13:48.713

Reputation: 196 637

How does this work? – Arjun – 2017-05-13T05:43:44.967

I've added an explanation. – Dennis – 2017-05-13T05:53:39.313

Hehe, well explained! In that way i will learn bash soon :-D – None – 2017-05-13T06:45:08.947

This performs wildcard expansion on htt.* so assumes no files matching it exist in the current directory. For most regexes, I'd agree on this site that the possibility of a file being matched is small enough that it's okay, but in this case, I don't think so, myself. The Linux kernel source code includes files named htt.c and htt.h, for instance. Changing it to grep -oehtt.* does not increase the byte count, but does make it significantly less likely to cause problems. – hvd – 2017-05-13T08:30:16.783

@hvd I usually assume that the program is run in an otherwise empty directory, but -oehtt.* is a nice way to make it more reliable. – Dennis – 2017-05-13T16:12:46.200

2

PHP, 36 Bytes

Input with https://

<?=substr(get_headers($argn)[7],10);

get_headers

25 bytes if the leading Location: does not have to be removed

<?=get_headers($argn)[7];

If Google changes the HTTP headers, here is a safer version:

preg_match("#Location: \K.*#",join("\n",get_headers($argn)),$t);echo$t[0];
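The difference between the positional version and the safer one can be sketched in Python. The helper `location` and the abbreviated `sample` list are assumptions made for illustration; the real list returned by get_headers may contain more lines in a different order, which is exactly why looking the header up by name is more robust than a fixed index.

```python
def location(headers):
    """Hypothetical helper: return the Location value from a list of header lines."""
    prefix = "Location: "  # 10 characters, hence the offset 10 in substr(..., 10)
    for line in headers:
        if line.startswith(prefix):
            return line[len(prefix):]
    return None  # no Location header present

# Abbreviated, assumed sample of header lines for the first test case.
sample = [
    "HTTP/1.1 301 Moved Permanently",
    "Content-Type: text/html; charset=UTF-8",
    "Location: https://codegolf.stackexchange.com/",
    "Server: GSE",
]

print(location(sample))
```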

Jörg Hülsermann

Posted 2017-05-13T04:13:48.713

Reputation: 13 026

1

Python 2, 43 bytes

Has no dependencies and is currently shorter than the other Python answer. shrug Input must match https?://goo\.gl/.*?/?

lambda s:urlopen(s).url
from urllib import*

totallyhuman

Posted 2017-05-13T04:13:48.713

Reputation: 15 378

0

NodeJS, 60 bytes

u=>require("http").get(u,r=>console.log(r.headers.location))

Input is in the format http://goo.gl/<id>.

Justin Mariner

Posted 2017-05-13T04:13:48.713

Reputation: 4 746