I'm trying to download two sites for inclusion on a CD:
http://boinc.berkeley.edu/trac/wiki
http://www.boinc-wiki.info
The problem I'm having is that these are both wikis. So when downloading with e.g.:
wget -r -k -np -nv -R jpg,jpeg,gif,png,tif http://www.boinc-wiki.info/
I do get a lot of unwanted files, because wget also follows links like ...?action=edit and ...?action=diff&version=...
Does somebody know a way to get around this?
I just want the current pages, without images and without diffs, etc.
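One idea, assuming my wget is new enough (1.14+) to support --reject-regex, would be to reject every URL that carries a query string; this is only a sketch and I haven't verified it:
wget -r -k -np -nv -R jpg,jpeg,gif,png,tif --reject-regex '(.*)\?(.*)' http://www.boinc-wiki.info/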
P.S.:
wget -r -k -np -nv -l 1 -R jpg,jpeg,png,gif,tif,pdf,ppt http://boinc.berkeley.edu/trac/wiki/TitleIndex
This worked for the Berkeley site, but boinc-wiki.info is still giving me trouble :/
P.P.S:
I got what appear to be the most relevant pages with:
wget -r -k -nv -l 2 -R jpg,jpeg,png,gif,tif,pdf,ppt http://www.boinc-wiki.info
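If --reject-regex is available, it might also be possible to keep the depth-2 crawl but explicitly skip the wiki action pages (edit, diff, history); again just a sketch, not something I have verified:
wget -r -k -nv -l 2 -R jpg,jpeg,png,gif,tif,pdf,ppt --reject-regex 'action=' http://www.boinc-wiki.info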
No need to cross-post between Super User and Server Fault: http://serverfault.com/questions/156045/how-to-download-with-wget-without-following-links-with-parameters – Bryan – 2010-06-29T22:07:23.870
Where should I have posted it? – Tie-fighter – 2010-06-29T22:20:19.690
This is the right place. It's not a server question. – David Z – 2010-06-30T00:42:04.400
Still, I got the better answers at Server Fault ;) – Tie-fighter – 2010-06-30T00:56:56.853