Spider/crawl a website and get each URL and page title in a CSV file

1

I am moving from an old ASP shopping cart site to a Drupal/Ubercart site. Part of this move is to ensure that old links will redirect to the new ones. To do that, all I need is some way to get a list of all the links on the old site.

Preferably the results would include the page title, and ideally I could give it some way to return other data from the page (e.g., via a CSS selector).

I would prefer something that runs on OS X, but I can use Windows apps too.

I have tried Integrity, but its output is nearly impossible to decipher, and it doesn't seem to work well anyway.

Tyler Clendenin

Posted 2012-08-02T05:54:31.787

Reputation: 251

R can handle this, but I'm not sure how to do it for an entire website. Here's an example of parsing one page: http://stackoverflow.com/questions/3746256/extract-links-from-webpage-using-r

– Brandon Bertelsen – 2012-08-02T06:44:05.027

Answers

0

If you don't mind writing Perl scripts ...

This module implements a configurable web traversal engine, for a robot or other web agent. Given an initial web page (URL), the Robot will get the contents of that page, and extract all links on the page, adding them to a list of URLs to visit.
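That description appears to come from the module's CPAN documentation. As a rough sketch of the same idea, here is a minimal same-host crawler that emits the URL/title CSV the question asks for, built on the common LWP::UserAgent, HTML::LinkExtor, and URI modules rather than the traversal engine quoted above; the starting URL is a placeholder you would replace:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI;

    # Placeholder: replace with the old site's root URL.
    my $start = 'http://www.example.com/';
    my $host  = URI->new($start)->host;

    my $ua    = LWP::UserAgent->new;
    my %seen;
    my @queue = ($start);

    print qq{"URL","Title"\n};

    while (my $url = shift @queue) {
        next if $seen{$url}++;

        # Fetch the page; skip failures and non-HTML responses.
        my $resp = $ua->get($url);
        next unless $resp->is_success && $resp->content_type eq 'text/html';
        my $html = $resp->decoded_content;

        # Pull the <title> and escape it for CSV output.
        my ($title) = $html =~ m{<title[^>]*>(.*?)</title>}si;
        $title = '' unless defined $title;
        $title =~ s/\s+/ /g;
        $title =~ s/"/""/g;
        print qq{"$url","$title"\n};

        # Queue every <a href> that stays on the same host.
        my $extor = HTML::LinkExtor->new(sub {
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' && defined $attr{href};
            my $abs = URI->new_abs($attr{href}, $url);
            $abs->fragment(undef);    # drop #anchors so pages aren't revisited
            push @queue, $abs->as_string
                if $abs->scheme =~ /^https?$/ && $abs->host eq $host;
        });
        $extor->parse($html);
    }

Saved as crawl.pl and run as perl crawl.pl > links.csv, it writes the CSV to a file. LWP::UserAgent and HTML::LinkExtor ship with many Perl installs; if yours lacks them, running cpan LWP::UserAgent HTML::LinkExtor URI from a terminal should install them.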

RedGrittyBrick

Posted 2012-08-02T05:54:31.787

Reputation: 70 632

I am horrible with Perl, and I cannot figure out how to install a module from CPAN =p – Tyler Clendenin – 2012-08-02T15:38:18.880