How to back up a blog running on posterous.com

I'd like to back up the content of my blog, which is powered by posterous.com. I'd like to save all texts and images to the local disk. The ability to browse it offline would be a plus.

What I've already tried:

wget

wget -mk http://myblogurl

It downloads the first page with the list of posts, then stops with a "20 redirections exceeded" message.
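
One thing I haven't tried yet is raising wget's redirect limit; the default cap is 20, so if the redirect chain is legitimate, a higher limit might get further. A sketch, with an arbitrarily chosen value:

# 50 is an arbitrary guess; untested against Posterous
wget --max-redirect=50 -mk http://myblogurl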

WinHTTrack

It downloads the first page, which redirects to the www.posterous.com home page, instead of the real page content.

Edit: The URL of the site I'm trying to back up is blog.safabyte.net

Martin Vobr

Posted 2010-01-22T18:25:46.147

Reputation: 366

I tried on a random user on Posterous, and it worked without any problems. How about giving us the actual site URL? – gorilla – 2010-01-23T00:45:48.970

Link added. See bottom of the question. – Martin Vobr – 2010-01-23T01:21:03.287

Just tried; wget picked up your full blog contents – Sathyajith Bhat – 2010-01-23T05:33:38.013

Could you post the command line? In my case 'wget -mk http://blog.safabyte.com' gets index.html only. No images are downloaded. No pages with posts are downloaded. I'm using wget 1.11.3 from Cygwin, running on WinXP.

– Martin Vobr – 2010-01-23T10:37:37.883

@Martin Vobr: wget -mk http://blog.safabyte.net, with GNU Wget 1.11.1 on openSUSE 11.0 – Sathyajith Bhat – 2010-01-23T17:03:05.593

Added a 'windows' tag as it seems to be OS-specific. After trying a few things I've found a solution. It looks like wget -mk http://blog.safabyte.net does not work on Windows; however, wget -mk http://blog.safabyte.net/* DOES work. – Martin Vobr – 2010-01-23T18:50:43.163

Thanks @Sathya and @gorilla. Your proof that it works for others made me fiddle with the parameters again and figure out how to get it working. – Martin Vobr – 2010-01-23T18:52:20.710

@Martin: Glad to hear it worked out. You might want to post your comment as an answer and mark it as accepted; it would help others in the future. – Sathyajith Bhat – 2010-01-24T06:03:09.373

Answers

Posterous.com maintains an API that might help you. In particular, their http://posterous.com/api/reading API might be of use. You can use it to obtain an XML file containing all of your posts and their content.

For example, http://posterous.com/api/readposts?hostname=jasonpearce retrieves all 12 posts that I've made to Posterous.
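
To keep a local copy, the feed can be saved straight to disk; a minimal sketch (the output filename is arbitrary, and you'd substitute your own hostname):

# -O writes the API's XML response to a local file; hostname reused from the example above
wget -O posts.xml "http://posterous.com/api/readposts?hostname=jasonpearce"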

Jason Pearce

Posted 2010-01-22T18:25:46.147

Reputation: 206

This worked for me:

wget -r -l inf -k -E -p -nc http://blog.safabyte.net/

It seems that using -m turns on -N (timestamping), and Posterous is not sending Last-Modified headers, which upsets wget, so instead I just used -r -l inf directly.

The options used are:

-r recursive download
-l inf infinite recursion depth
-k convert links in the saved files to point to local copies
-E save HTML documents with an .html suffix
-p download page requisites (images, stylesheets, and so on)
-nc don't download the same URL more than once

This command still doesn't download resources from other domains, which means it doesn't fetch the images, as they're hosted on a different CDN.
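
A possible extension (untested) is to let wget span hosts with -H, whitelisting the domains to follow with -D; this assumes the images live on posterous.com and files.posterous.com, as noted in the answer below:

# -H spans hosts; -D limits spanning to the listed domains (the list is a guess)
wget -r -l inf -k -E -p -nc -H -Dblog.safabyte.net,posterous.com,files.posterous.com http://blog.safabyte.net/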

Paul Sowden

Posted 2010-01-22T18:25:46.147

Reputation: 11

Managed to download at least all HTML content. The following command seems to download all pages from the blog (using Wget 1.11.3 on Windows XP):

wget -mk http://blog.safabyte.net/*

Post images are still not downloaded, probably because they are stored on different domains.

HTML content is on blog.safabyte.com/* while the images are at http://posterous.com/getfile/files.posterous.com/cheated-by-safabyte/* and files.posterous.com.

Martin Vobr

Posted 2010-01-22T18:25:46.147

Reputation: 366