Mirror a full website1 and all links to a specific website2

2

0

I want to mirror a website that will be taken down soon. The problem I'm facing is simple: I need to mirror the whole of website1 plus everything it links to on website2 (files, images, and similar), so that I end up with a single "merged" mirror.

-Therefore, the question is:

How can I do this with wget? Are there other ways to solve this problem (if it is not possible with wget)?

-Example:

The website is http://example.org, and it will be mirrored with wget -mk. wget should also mirror any content hosted on http://foo.bar, but nothing else.

K1773R

Posted 2012-12-20T11:10:42.940

Reputation: 123

Answers

1

Something like this is what you are looking for:

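wget -mk -H -w 20 --exclude-domains sunsite.foo.edu --domains example.com,yahoo.com,google.com http://www.example.com/

For the example in the question, that becomes: wget -mk -H -D example.org,foo.bar http://example.org/
  • -m: Turn on options suitable for mirroring. This enables recursion and time-stamping, sets infinite recursion depth, and keeps FTP directory listings (equivalent to -r -N -l inf --no-remove-listing).
  • -k: After the download is complete, convert the links in the documents to make them suitable for local viewing.
  • -w SECONDS: Wait the given number of seconds between retrievals, to reduce the load on the server.
  • -H: Enable spanning across hosts. Without it, wget never leaves the starting host, so --domains has no effect.
  • --exclude-domains DOMAIN-LIST: Domains that are not to be followed.
  • --domains DOMAIN-LIST (short form -D): A comma-separated list of domains to be followed. Include the starting domain in the list as well.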
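A minimal sketch of the same idea as a reusable shell function, assuming a POSIX shell and GNU wget. The function name mirror_merged and the DRY_RUN switch are hypothetical additions for illustration. The domain whitelist always starts with the host of the starting URL, and --span-hosts (-H) is what allows wget to leave that host at all, while --domains (-D) restricts where it may go.

```shell
#!/bin/sh
# Hypothetical helper: mirror_merged START_URL EXTRA_DOMAIN...
# Mirrors START_URL and also follows links into the EXTRA_DOMAIN(s), nothing else.
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_merged() {
  start_url=$1
  shift
  # Reduce the URL to its host: drop the scheme, then the path, then any port
  start_host=$(printf '%s\n' "$start_url" |
    sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' -e 's|/.*$||' -e 's|:.*$||')
  # Allowed domains: the start host plus every extra domain, comma-separated
  domains=$start_host
  for d in "$@"; do
    domains="$domains,$d"
  done
  # Long forms of -m, -k, -H, -D and -w 20
  set -- wget --mirror --convert-links --span-hosts \
    --domains="$domains" --wait=20 "$start_url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

Usage for the question's example: mirror_merged http://example.org/ foo.bar (prefix with DRY_RUN=1 to see the command first).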

X.Jacobs

Posted 2012-12-20T11:10:42.940

Reputation: 145

I'm looking for a way to do --include-domain <LIST>, not to exclude specific sites. – K1773R – 2012-12-20T23:00:13.097

@K1773R Check out my updated answer; you can also include a list of domains. – X.Jacobs – 2012-12-20T23:11:05.137

--domains was what I was looking for, thanks! – K1773R – 2012-12-20T23:22:06.560

0

rsync will mirror the files:

rsync -auvz source destination

The -u flag skips files that are newer on the destination, so this is probably what you want. (-a is archive mode, -v is verbose, and -z compresses data during the transfer.)

Sam Doidge

Posted 2012-12-20T11:10:42.940

Reputation: 101

The second website has no links or directory listings. – K1773R – 2012-12-20T11:35:51.073

Why does that matter? :) Download the first site with wget -m, then merge it with the second using rsync? – None – 2012-12-20T11:49:26.043

The first site has tons of links to other sites, but I want only specific ones to be followed (to foo.bar, for example). – K1773R – 2012-12-20T12:38:00.173

0

wget -p -k http://example.org

The -p option downloads all the elements required to display the page correctly (CSS, images, etc.). The -k option converts all links (including those for CSS and images) so that you can view the page offline as it appeared online.
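If some of the page's images or stylesheets are served from a second host, such as foo.bar in the question, -p on its own only fetches requisites from the page's own host; GNU wget needs -H to span hosts, and -D is meant to keep that spanning to the listed domains. A minimal sketch, assuming a POSIX shell and GNU wget; the helper name fetch_page and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fetch_page URL DOMAIN...
# Downloads one page plus its requisites (CSS, images, ...), allowing
# requisites from any of the listed DOMAINs. Set DRY_RUN=1 to only print
# the wget command.
fetch_page() {
  url=$1
  shift
  # Join the allowed domains into a comma-separated list for -D
  domains=$(printf '%s,' "$@")
  domains=${domains%,}
  # -p: page requisites, -k: convert links, -H: span hosts, -D: allowed domains
  set -- wget -p -k -H -D "$domains" "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

Usage: fetch_page http://example.org/ example.org foo.bar. Note this still fetches a single page, not the whole site.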

serk

Posted 2012-12-20T11:10:42.940

Reputation: