How to analyze a link to figure out the actual link



Sometimes when downloading something, I find the links are not the direct ones to the files.

For example, this is a link to download a PDF file:,ishare&IP=1312761745,68.50.222.

How can I figure out the actual link (i.e. http://*.PDF) to the file?

What are these and similar techniques for hiding direct links called? Are there any references, such as Wikipedia articles?


Posted 2011-08-07T23:56:22.417

Reputation: 12 647

Fixed your link. Click edit to see the markdown source for how I did it. – Joel Coehoorn – 2011-08-08T03:11:49.897



Yes, sometimes.

There are two things that commonly happen. Your link doesn't work anymore, so I'm not sure which scenario applies in this case; I'll illustrate with other links instead.

HTTP Redirection

This is what you see with redirecting services. What they do is respond with an HTTP redirect: when you visit the link, it redirects you to the actual URL. Sometimes one URL redirects to another, which redirects again. You can see this happening if you plug the URL into a tool that shows HTTP response headers, or by running curl -I on it: you will see that it returns a 301 with a Location header pointing to the new URL.

So to deal with HTTP redirection, you just need to repeat an HTTP HEAD request against each new Location until you stop getting responses in the 300s (hopefully ending with a 200). Keep in mind that the redirects may form a loop that never ends, so cap the number of hops. You can do this with curl or any HTTP tool.
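Here is a minimal sketch of that loop in Python, using only the standard library. The function names (http_head, follow_redirects), the hop limit of 10, and the example hosts in the usage note are my own assumptions for illustration, not part of any particular tool. Some servers answer HEAD differently from GET, so treat the result as a hint rather than a guarantee.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes that carry a Location header pointing somewhere else.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_head(url):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def follow_redirects(url, head=http_head, max_hops=10):
    """Return the list of URLs visited, ending at the first non-redirect."""
    chain = [url]
    for _ in range(max_hops):
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            return chain
        url = urljoin(url, location)  # Location may be a relative URL
        if url in chain:
            raise RuntimeError("redirect loop detected at " + url)
        chain.append(url)
    raise RuntimeError("gave up after %d redirects" % max_hops)
```

Calling follow_redirects("http://short.example/abc") (a made-up address) would return every URL in the chain, with the last entry being the one that stopped redirecting. The head parameter is there so the loop can be tried against canned responses without touching the network.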

Downloader Page

This is what most download sites use. You click the download link and it takes you to a page with a bunch of ads that says "Your download will begin shortly" or something similar. With these you can try to parse the actual direct link out of the URL, but that would be site-specific, and most sites will not include it, to prevent you from circumventing the page. The delayed download is triggered either via a meta http-equiv="refresh" tag in the page's head, or via JavaScript (most common). The JavaScript version usually has a meta-tag fallback, though.

There is a solution though. If you look at the source of the download page, you will usually see a <meta http-equiv="refresh"> tag (usually inside a <noscript> tag) whose content attribute includes a URL that points to the actual download. So use curl (or any other HTTP tool) to download the page, parse the tag out, and grab that value. A site may leave this out if it wants to be really nasty, thus requiring JavaScript to download files.
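As a rough sketch of the parsing step, the following Python uses the standard html.parser module to pull the URL out of a meta refresh tag. The class and function names, and the dl.example address in the usage note, are hypothetical; real pages vary, so expect to adjust it per site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Remember the URL of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://..."
        content = attr.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def find_meta_refresh(html, base_url):
    """Return the absolute refresh target found in html, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return urljoin(base_url, finder.target) if finder.target else None
```

Given the downloaded page source as a string, find_meta_refresh(page, "http://dl.example/page") returns the refresh target resolved against the page's own address, or None when the site has left the tag out.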

There is probably a JavaScript block that links to the download as well. It may be obfuscated, or loaded from another URL. Your mileage may vary trying to parse that out. There may also be a "direct link" on the page. You could try a few techniques to find that, but again it could be obfuscated via JavaScript or even missing altogether.

Jim McKeeth


Reputation: 4 907


It might not be possible. The sites could feed you through a hundred redirects before you get to the file.

In addition, JavaScript can be used to give out links based on the URL that was given to the server.



Reputation: 22 744

Thanks! What are the names for such and similar techniques of not showing direct links? Some references, such as wikipedia,...? – Tim – 2011-08-08T00:04:19.960


Just redirection. I don't know of any other techniques. See here – soandos – 2011-08-08T00:05:37.087

When the links are to files and such as opposed to regular pages, it’s usually called anti-leech. – Synetech – 2011-08-08T03:59:33.283


The site could be scripted, and when it gets a certain command (the URL can pass a command), it might then return a PDF file (or some other file) without redirecting. In that case it's a server-side matter and depends on how the site is coded. Without actually requesting that link from the server, it's unlikely you could figure out how to get the file. And sometimes, even if you knew the direct URL, you might not have permission to access it. Some sites are coded so that direct links won't work.

Ben Richards


Reputation: 11 662

This is correct. Small addition: when a client accesses an obscure link like that and the server wants to tell the client that this is actually a PDF file, it does so via the Content-Type HTTP header (and it can suggest saving the response as a file via the Content-Disposition header). – vtest – 2011-08-08T09:59:39.953


This is pretty much the "true" URL. On well-protected websites, you have to submit the complete URL so that the server can authenticate your request. You may be redirected to another URL afterwards, but it will normally be a one-time URL. In other words, these file download websites will never give you a leech-able direct link.

In this particular URL, the parameters, which are protected by a digital signature, clearly list time and IP restrictions of the downloader. For a website with this level of competence, it is unlikely there will be leaked direct links.
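To illustrate why such a link can't simply be reused, here is a toy sketch of a signed download URL check in Python. Everything in it is an assumption for illustration: the parameter layout, the secret, and the use of an HMAC (a keyed hash, which strictly speaking is not a public-key digital signature). The actual site's scheme is unknown.

```python
import hashlib
import hmac

# Hypothetical secret known only to the server.
SECRET = b"server-side secret"


def sign(path, expires, ip):
    """Compute the signature the server would embed in the link."""
    msg = f"{path}|{expires}|{ip}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def is_valid(path, expires, ip, signature, now, client_ip):
    """Accept the request only if the link is untampered, unexpired,
    and used from the IP address it was issued to."""
    expected = sign(path, expires, ip)
    return (hmac.compare_digest(expected, signature)
            and now <= expires
            and client_ip == ip)
```

With a scheme like this, changing the expiry time or the IP in the URL invalidates the signature, and copying the whole URL to another machine fails the IP check, which is why a leaked link is of little use.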


Reputation: 6 821


These redirect links are also often associated with session state. They'll do some privs checking, based on your session login, before providing you with the link - no access to the resource = no link access to the resource. It can be used to provide access to files/resources that are outside the web root, and streamed back to the requestor by the app, rather than being available via direct URL. But only if your privs allow for it.

An example of both is here. It will redirect you to another URL, based on the 'mkoenig' string, and add additional URL parameters. The redirect is done in server-side code, which you wouldn't (or at least shouldn't) be able to see. If you then go to 'Web files', the files listed are only the files the teacher has made public. She may have other files out there that you can't get to, and which won't be listed. That's also handled by the server-side code, which decides what it will and won't return.

Without hacking the server or having access to the server side source code, I don't believe you can get the actual link, and even if you can, it may not be helpful unless your session privs tell the server to give you access to it.




Reputation: 31


As the answers above say, it is impossible. I recommend you fire up a VM or grab a copy of Sandboxie to contain your browser for links like this.


Not knowing which operating system you are using, I will give a general answer here.

VM is short for virtual machine. It's basically a software-created computer running inside your computer. A virtual machine has its own operating system and browser. While it still uses your machine's internet connection and hard drive, it is a separate machine from the OS's point of view. That makes it convenient, like a scratch piece of paper: anything that happens inside the VM does not affect your real machine.

VirtualBox is the software I recommend you get for a VM, unless you are using Windows, in which case I'd recommend Virtual PC.

Sandboxie is sort of like a VM, but it just isolates specific applications. You can tell it to run a copy of your browser, and any files the browser writes or actions it performs are essentially redirected to a temporary folder. So if it tries to download a virus, the download lands in a temporary folder that is quarantined from the rest of your machine. It's not as robust as a VM, but it uses less CPU and memory and is faster and more convenient.

I would try Sandboxie first. It has a gentler learning curve.



Reputation: 21 453

Thanks! What do you mean by "fire up a VM or grab a copy of Sandboxie to contain your browser for links like such"? Some references to explain how to do the two things? – Tim – 2011-08-08T00:06:03.610

Sorry, I'll expand my answer. – surfasb – 2011-08-08T01:02:20.733

Thanks! My OS is Ubuntu. But I don't get how using VM can identify the direct links? – Tim – 2011-08-08T01:34:21.427

It doesn't really identify the link for you. But it creates a safe environment that will help you identify the link. You can't identify the redirects without compiling the page. – surfasb – 2011-08-08T01:45:28.637

Thanks! I wonder what you mean by a safe environment? How can it help you identify the link? – Tim – 2011-08-08T03:15:41.483

Safe environment means that if you get re-directed & "infected", it's the VM that takes the hit, not your primary OS. – Joe Internet – 2011-08-08T03:46:19.570

I don't think his question is concerning security. He's just looking for a way to get the direct file link. – magnattic – 2011-08-08T10:54:46.457

In this day and age, I can't see how you can't address security concerning a) an obscured link and b) a PDF. Two of the most common attack vectors. It's akin to someone asking about web hosting and only limiting the answers to dedicated hosts. It's the 21st century. – surfasb – 2011-08-08T11:16:50.340


It's never possible to figure out the actual link.

The server handles the file requests, possibly using URL rewriting (mod_rewrite on Apache servers, for example). So you could be going to what looks like a page, but in reality you could be accessing some PHP file with a parameter, such as:

Even if you access a PDF file, there might be a redirect on the server side itself.

For file downloads with handlers, it might be a tad trickier, as you may go to a page which is just a download handler. In this case, the response from the script you're accessing might be sent with one header (such as a PDF content type), but in actuality it's a PHP file.

In conclusion: you can never know how the server and the scripts are configured, so you can never know the real, actual address, even if it seems like you do.

Itai Sagi


Reputation: 121


When you send a request to a web server (for example, by clicking a link), the server can send a number of different responses. Common examples are 404 (Not Found), 403 (Forbidden), or 500 (Internal Server Error). Probably the most common response code is 200 (OK), but you'll never see that one because it's generally accompanied by the page you were hoping to see.

There are a couple of other codes at play here: 301 and 302. Both are redirect codes, and they tell your browser that the resource you wanted is at another location. The main difference between them is how the browser caches things. A 301 code means "Moved Permanently", and the next time you try to visit the original link the browser may remember that the page has moved and go directly to the new location. A 302 means "Found" (originally "Moved Temporarily") and provides a location that your browser should use only temporarily.

It should be possible to write a program that checks a link and, as long as it keeps getting 30x responses, follows the Location given in each response until it finally gets a 200. At that point, rather than downloading the content, it should show you the link.

Unfortunately, it's also more complicated than that. An HTML page can also redirect to a new location using a meta tag in the page's head section that looks something like this: <meta http-equiv="refresh" content="0;url=NEW PAGE URL" />. So such a program would already have to completely parse HTML to be sure it reaches the last redirect.

Additionally, a page could redirect you further using JavaScript, and the JavaScript might be obfuscated. So now our hypothetical program also has to understand JavaScript. At this point we have a fully functional web browser. We're missing the small detail of actually rendering a page to the screen, but our program has nearly everything else you need for a complete web browser, including all the accompanying security issues. You're no longer any better off than if you'd just clicked the link normally in the first place.

Joel Coehoorn


Reputation: 26 787


Assuming Windows: Install Fiddler Web Debugging Proxy, enable it. Then navigate to your starting URL and watch all the redirects in Fiddler's left pane. On the right pane change tabs to show "Request headers" and "Response headers". I have successfully used it for exactly that purpose.

On the other hand, the "final" URL alone may not work the same way if you browse to it directly, because the request may not carry the right referrer or may fail some other restriction.

But you can even send custom requests with custom headers in Fiddler. See the "Request Builder" tab for that.
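If you'd rather script it than use Fiddler, the same idea can be sketched with Python's standard urllib. The function name, the default User-Agent string, and the addresses in the usage note are placeholders I made up; which headers a given site actually checks is something you'd have to read off the captured traffic.

```python
import urllib.request


def build_request(final_url, referring_page, user_agent="Mozilla/5.0"):
    """Build a GET request that claims to come from the download page."""
    return urllib.request.Request(
        final_url,
        headers={
            # Note the HTTP header is spelled "Referer" (one r).
            "Referer": referring_page,
            "User-Agent": user_agent,
        },
    )
```

For example, req = build_request("http://files.example/doc.pdf", "http://dl.example/page") prepares the request, and passing req to urllib.request.urlopen would actually send it. If the site also depends on cookies or a one-time token, this alone won't be enough.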



Reputation: 1 075


(Meanwhile) there are some great online tools to help with tracing redirects.

And there is a nice extension for Google Chrome.



Reputation: 1 075


This is largely site dependent.

To do this, you have to study each site individually and have a separate mechanism (or code) for each site to return the direct URI.

You can also study some open-source browser add-ons which provide similar functionality.



Reputation: 103


There are online tools that check where a link redirects you to. They are especially useful for shortened URLs, but of no use for scripted downloads and the like.

Omar Kohl


Reputation: 101


If you're using Firefox, you can use an addon called RequestPolicy which, among other things, will pause and ask you for permission whenever you are redirected onto a different domain. It won't work if you want to find a redirect that doesn't go to a different domain, but I'm sure there's a different Firefox Addon for that which I don't know of =)

Note that it will break a lot of sites that use a CDN (Content Distribution Network), since by default it blocks all cross-domain images, scripts, CSS, and redirects. So it's not the best choice if you want to always know where a redirect is going, unless you're prepared to go through a few extra steps every time you visit a new website.

William Lawn Stewart


Reputation: 1 889


I'm not really sure, but if you are using cURL, can you not just obtain the URL contents (file_get_contents($url) in PHP) and then check the MIME type?
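A rough sketch of that check in Python, assuming you already have the response's Content-Type header and the first few bytes of the body (the function name and return labels are made up for illustration). It relies on the fact that PDF files start with the bytes %PDF-, so it can spot a PDF even when the declared type is something generic.

```python
def classify_response(content_type, first_bytes):
    """Guess what a response really is from its header and leading bytes."""
    # Drop parameters such as "; charset=utf-8" from the declared type.
    declared = (content_type or "").split(";")[0].strip().lower()
    if first_bytes.startswith(b"%PDF-"):
        return "pdf"
    start = first_bytes.lstrip().lower()
    if declared == "text/html" or start.startswith((b"<!doctype html", b"<html")):
        return "html"
    return declared or "unknown"
```

If the answer comes back as "html" when you expected a PDF, you have most likely fetched a downloader page rather than the file itself.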



Reputation: 101