How can I download a phpbb forum with wget including password protected sections?

4

1

I want to download a copy of a forum I moderate before it closes for good. It holds some useful information that I want to keep for myself. I don't want to export the data to another web server, I just want the pages. Mind you, I'm a user on the forum, not the admin. I searched for this and found that it can be done easily with wget: see "How can I download an entire (active) phpbb forum?"

I used:

wget -k -m -E -p -np -R 'viewtopic.php*p=*,memberlist.php*,faq.php*,posting.php*,search.php*,ucp.php*,viewonline.php*,*sid*,*view=print*,*start=0*' -o log.txt http://www.example.com/forum/

I experimented with this, but I can only download the publicly visible sections, not the sections you have to log in for. I tried using a Firefox plugin to export a cookies.txt (while my session was logged in to the forum) and adding the --load-cookies option with that cookies.txt file to the command, but I still get only the publicly visible sections.

Any suggestions to make this work?

Rocky84

Posted 2010-12-29T10:50:23.200

Reputation: 41

Do you know any Python? – paradroid – 2010-12-30T13:48:41.540

One thing I assume is happening is that wget follows all the links on the pages it finds. If you start at the index page of a forum, I assume it goes through the code top to bottom. One of the links on the index page is the 'logout' link. Whenever wget hits that, it logs the session out. I've tried adding an exclude for login.php* or something like that, but so far I keep getting the same results.

Does anyone know how I can tell wget to exclude any URL that starts with http://www.example.com/forum/login.php, so that it doesn't log itself out while it is busy? – None – 2011-02-04T07:17:13.327
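One way to keep wget away from the logout link, sketched here under assumptions: wget 1.14 and later have a --reject-regex option that is matched against the complete URL, query string included. The logout URL shapes in the regex below (login.php?logout=... for phpBB 2, ucp.php?mode=logout for phpBB 3) are assumptions about the forum software, and the function names and cookies.txt path are hypothetical.

```shell
#!/bin/sh
# Assumed logout URL shapes; adjust after checking the real logout link.
REJECT_REGEX='(login\.php\?logout|[?&]mode=logout)'

# Preview helper: succeeds if the URL matches REJECT_REGEX.
# grep -E only approximates wget's default POSIX regex matching.
would_skip() {
    printf '%s\n' "$1" | grep -Eq "$REJECT_REGEX"
}

# Mirror the forum with the exported cookies, skipping logout URLs.
# cookies.txt is a hypothetical path to the exported cookie file.
mirror_without_logout() {
    wget --load-cookies cookies.txt -k -m -E -p -np \
         --reject-regex "$REJECT_REGEX" \
         -o log.txt "$1"
}

# Example (not run here):
# mirror_without_logout 'http://www.example.com/forum/'
```

Running would_skip against the logout link copied from the forum's index page is a cheap way to confirm the pattern before starting a long mirror.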

Answers

1

You may need to set up cookies for the session because many web sites use cookies to make the login and logout functions work.

The "--load-cookies" option might help you here.
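Before blaming wget, it may be worth checking that the exported file really contains the forum's session cookie. The sketch below assumes the file is in the Netscape cookies.txt format that wget expects (seven tab-separated fields per cookie, name in field 6, value in field 7) and that the phpBB session cookie name ends in _sid. The function name is made up for illustration.

```shell
#!/bin/sh
# Print the value of the first cookie whose name ends in "_sid".
# Comment and header lines have fewer than 7 fields and are ignored.
session_id_from_cookies() {
    awk -F '\t' 'NF == 7 && $6 ~ /_sid$/ { print $7; exit }' "$1"
}

# Example (not run here):
# sid=$(session_id_from_cookies cookies.txt)
# if [ -n "$sid" ]; then
#     wget --load-cookies cookies.txt -k -m -E -p -np http://www.example.com/forum/
# else
#     echo "no session cookie found in cookies.txt" >&2
# fi
```

If the function prints nothing, the plugin probably exported the file while logged out, or for a different domain.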

Randolf Richardson

Posted 2010-12-29T10:50:23.200

Reputation: 14 002

0

See my answer here: https://superuser.com/a/1371654/216033

You first need to get the SID and then use it in the next request.

Example with login:

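# Requires bash (for the $'\011' tab literal used below).
PHPBB_URL=http://www.someserver.com/phpbb
USER=MyUser
PASS=MyPass

# Step 1: request the login page once so that phpBB issues a session cookie.
wget --save-cookies="./session-cookies-$USER" "$PHPBB_URL/ucp.php?mode=login" -O - 1> /dev/null 2> /dev/null

# Step 2: read the session ID, the 7th tab-separated field of the *_sid cookie.
SID=$(grep _sid "./session-cookies-$USER" | cut -d$'\011' -f7)

echo "Login $USER --> $PHPBB_URL SID=$SID"

# Step 3: post the credentials together with the SID and save the logged-in cookies.
wget --save-cookies="./session-cookies-$USER" \
 --post-data="username=$USER&password=$PASS&redirect=index.php&sid=$SID&login=Login" \
 "$PHPBB_URL/ucp.php?mode=login" --referer="$PHPBB_URL/ucp.php?mode=login" \
 -O - 1> /dev/null 2> /dev/null

# Step 4: mirror with the logged-in cookies. The reject list is quoted so the shell cannot expand it.
wget --load-cookies "./session-cookies-$USER" -k -m -E -p -np -R 'memberlist.php*,faq.php*,viewtopic.php*p=*,posting.php*,search.php*,ucp.php*,viewonline.php*,*sid*,*view=print*,*start=0*' "$PHPBB_URL/viewtopic.php?t=27704"

######## Alternative: loop through the topics by ID (the command above should already get most of them with these options).
# The dangling -R was dropped here: without a reject list it would swallow the URL.
#wget --load-cookies "./session-cookies-$USER" -k -m -E -p -np "$PHPBB_URL/viewtopic.php?t="{1..29700}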
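
The topic loop can also be written without bash brace expansion, which would put thousands of URLs on one command line. This is only a sketch: topic_urls and fetch_topics are hypothetical helper names, the ID range is whatever fits the forum, and it assumes the login steps above have already written the cookie file. wget's "-i -" reads the URL list from standard input, and --wait=1 pauses one second between requests.

```shell
#!/bin/sh
# Print one viewtopic.php URL per topic ID in the range FIRST..LAST.
topic_urls() {
    base=$1 first=$2 last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s/viewtopic.php?t=%s\n' "$base" "$i"
        i=$((i + 1))
    done
}

# Fetch each topic page with the saved login cookies (file given as $4).
fetch_topics() {
    topic_urls "$1" "$2" "$3" |
        wget --load-cookies "$4" -k -E -p -np --wait=1 -i -
}

# Example (not run here):
# fetch_topics "$PHPBB_URL" 1 29700 "./session-cookies-$USER"
```

IDs that do not exist on the forum will just produce error pages or failed requests, so the range can be generous.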

Tilo

Posted 2010-12-29T10:50:23.200

Reputation: 181