Get all links of a website

0

Is there a way/tool to get all the links of a website? Just the links; I'm not looking to create a local copy or download the website. Example: the links to all questions posted on Super User. Platform: Windows 7, Ubuntu 14.04

Renuka

Posted 2014-10-26T12:27:30.557

Reputation: 151

http://www.iwebtool.com/link_extractor – something like this? – hagubear – 2014-10-26T12:33:40.940

I tried it. It only gives the links of a single page, not the whole website. – Renuka – 2014-10-26T12:35:43.347

Read about this then :) @Renuka

– hagubear – 2014-10-26T12:36:57.287

possible duplicate of How can I download an entire website?

– Ƭᴇcʜιᴇ007 – 2014-10-26T16:37:33.123

I realize you said you want 'links only' and not the whole site (i.e. the one I marked as a possible dupe), but the #1 answer (HTTrack) for the duplicate has the ability to do what you ask, and I would guess at least one of the other suggestions would do it as well. If you have tried them, please let us know why they didn't work for you in your case. – Ƭᴇcʜιᴇ007 – 2014-10-26T16:40:32.240

Funnily enough, I'm building an email extractor, but it can already grab all the links from a website. If you give me a few hours, I'll make some amendments (so it displays the links) and post the link to my repo on GitHub, and you can grab it from there – benscabbia – 2014-10-26T16:57:16.313

@Ƭᴇcʜιᴇ007 I installed HTTrack but couldn't find the "links only" option. Can you tell me where it is? – Renuka – 2014-10-26T18:29:37.383

@hagubear I tried it, but it didn't work fully; it didn't extract all the links. Also, my PC/IP is now banned from the sites I tried the software on :( – Renuka – 2014-10-26T18:30:59.820

http://forum.httrack.com/readmsg/24984/24983/index.html – Ƭᴇcʜιᴇ007 – 2014-10-26T18:32:34.513

@gudthing No problemo :) – Renuka – 2014-10-26T18:41:45.753

@Ƭᴇcʜιᴇ007 It started downloading all the files, and the hts-cache folder is empty for now. The post doesn't mention any setting that makes it just crawl the site without downloading. – Renuka – 2014-10-26T18:57:34.423

Answers

1

Sorry for keeping you waiting. I have uploaded my program here.

The program is still in a very early phase, so most features do not work yet, but it does grab all the links to other pages on the website.

It needs Java to run; you should be able to double-click the file and a UI should load. Type the website address into the SearchW box (in the GUI), e.g. http://google.com or http://bbc.co.uk

Then you can copy and paste all the links as they are printed. (I still need to implement an export feature, but you'll be able to copy the links for the moment.)

Let me know if you have any issues! And if you like it, I will (once it's in a decent state) post a link to my repo, where you'll be able to download newer versions.
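For anyone curious how a link grabber like this works under the hood, here is a minimal sketch. It is not the program above (which is Java and closed here); it is just an illustration, using only Python's standard library, of pulling every `<a href>` from one page's HTML:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links like "/questions" to full URLs
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links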

benscabbia

Posted 2014-10-26T12:27:30.557

Reputation: 370

Hello. I tried it first on filehippo.com; it gave me this output, which looks OK, since the site isn't that big: http://pastebin.com/CfSe4RgD. Then I tried it on a bigger site, 9gag.com. It gave an output of only 61 lines: http://pastebin.com/VE30u8DE. FileHippo was 208 lines. That's impossible; 9gag has millions of posts.

– Renuka – 2014-10-27T13:02:24.297

BTW, thank you. Keep updating it if you can; I will keep testing it for you. :) – Renuka – 2014-10-27T13:12:16.337

@Renuka Sorry, I should have mentioned: the parser only scans one level, i.e. it grabs all the links from the home page and then checks each of those pages for an email address (which I know is not of interest to you). If you want it to grab literally every single link on a website, I will have to change a bit of code. The only problem is that if there is a link (say, on the homepage) which points to another website, it will also start grabbing links from that other website; it doesn't know whether a link belongs to the current website or a different one. But I'll see what I can do – benscabbia – 2014-10-27T13:13:57.410
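(For reference, the "is this link part of the current website?" check described above is usually done by comparing hostnames. A minimal sketch in Python; the actual program is Java, so this is illustrative only:

```python
from urllib.parse import urlparse

def same_site(link, start_url):
    # Treat a link as internal only when its hostname matches the start page's
    return urlparse(link).netloc == urlparse(start_url).netloc

# A crawler would follow internal links and only record external ones:
same_site("http://9gag.com/hot", "http://9gag.com")       # internal
same_site("http://facebook.com/9gag", "http://9gag.com")  # external
```

Note this simple check treats subdomains such as m.9gag.com as external; a real crawler may want a looser rule.)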

@Renuka You're most welcome :). I won't be able to work on it for a few days, but now that I have someone waiting for updated versions, I will do my best to keep developing it ASAP! – benscabbia – 2014-10-27T13:15:26.850

No problem :) BTW, I just double-clicked the jar file and it gave me a UI; I didn't go to cmd to type that command you mentioned. – Renuka – 2014-10-27T13:20:55.643

@Renuka I was hoping it would run directly (those weren't actually instructions, but I have reworded it now). It can be a real pain to get it to run through the console (it would probably require configuring system variables, etc.) – benscabbia – 2014-10-27T13:25:37.153