How do you find the size of a website?


Suppose I want to download some wiki site. It looks like it has only 2000 articles, but with wiki technology they keep several versions of each article... So it could still be terabytes!

So my question is: how can you find the size of a given website WITHOUT DOWNLOADING THE SITE?

Adobe

Posted 2011-08-06T04:25:11.740

Reputation: 1,883

Question was closed 2011-08-06T10:35:13.857

You can't. It's like asking "How long is a rope?" And the answer is "It depends." – TFM – 2011-08-06T07:14:17.223

BTW: You could make a guess: "estimated bytes per page" x "number of pages" x "estimated number of revisions per page". But what about the pictures? – TFM – 2011-08-06T07:19:07.233
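In code, that back-of-envelope estimate might look like the Python sketch below; every number in it is a guess you would substitute yourself, and images and other media are ignored:

```python
# Rough estimate along the lines of the comment above.
# All three inputs are assumptions -- plug in your own guesses.
bytes_per_revision = 30 * 1024      # assume ~30 KB of wikitext per revision
number_of_pages = 2000              # the ~2000 articles from the question
revisions_per_page = 50             # assume ~50 stored revisions per article

estimated_total = bytes_per_revision * number_of_pages * revisions_per_page
print(f"Roughly {estimated_total / 1024**3:.1f} GiB, ignoring images and media")
```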

It totally depends on the website in question, therefore much too broad. You'll only know the size if you fully (recursively) download every page there is and then look at the size. By the way: Some Wikis allow content to be downloaded in one huge dump. – slhck – 2011-08-06T10:37:17.393
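If the wiki does publish such a dump at a known URL, you can at least ask the server how big that one file is without downloading it, by sending an HTTP HEAD request and reading the Content-Length header. A minimal Python sketch, with a hypothetical dump URL:

```python
import urllib.request

# Hypothetical URL -- substitute the dump location the wiki actually publishes.
dump_url = "https://example.org/dumps/wiki-pages-history.xml.gz"

req = urllib.request.Request(dump_url, method="HEAD")
with urllib.request.urlopen(req) as resp:
    size = resp.headers.get("Content-Length")

if size is not None:
    print(f"Dump is about {int(size) / 1024**2:.1f} MiB")
else:
    print("The server did not report a Content-Length for this file.")
```

Not every server reports a Content-Length, and a compressed dump understates the expanded size of the wiki, so treat the result as a lower bound.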

I'm not asking "how long is a rope" - I'm asking, in a way, "how to find the given rope's length". – Adobe – 2011-08-06T11:13:23.603

Answers


Most wikis store their data in a database. These are not static pages that you can simply download from the web server; they are generated dynamically at the time you request them, using a number of queries to that database.

Finding out the size would be tricky... You would need the total size of the database, plus any supporting files in the web-accessible directory.

I suppose if you wanted to download all 2000 articles as they stand today, you could write a script that queries the database for each article and downloads it to your machine. But to get at the revisions of each article, and at any deleted articles, you would need to understand the URL scheme of the wiki software in question. Then you could measure the size of all of those files... but that may not give you an accurate idea of the size as it is stored on the web and database servers.
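As a rough illustration of that idea: if the site happens to run MediaWiki and exposes the standard api.php endpoint (the URL below is hypothetical), the API will report the size of every stored revision, so you can total up the text size without downloading the articles themselves. A sketch in Python:

```python
import requests  # third-party: pip install requests

# Hypothetical endpoint -- replace with the wiki's real api.php URL.
API = "https://example.org/w/api.php"

def all_page_titles(api_url):
    """Yield every page title on the wiki via list=allpages."""
    params = {"action": "query", "list": "allpages",
              "aplimit": "500", "format": "json"}
    while True:
        data = requests.get(api_url, params=params).json()
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow the continuation cursor

def revision_bytes(api_url, title):
    """Sum the reported size (in bytes) of every stored revision of one page."""
    params = {"action": "query", "prop": "revisions", "titles": title,
              "rvprop": "size", "rvlimit": "max", "format": "json"}
    total = 0
    while True:
        data = requests.get(api_url, params=params).json()
        for page in data["query"]["pages"].values():
            total += sum(rev.get("size", 0) for rev in page.get("revisions", []))
        if "continue" not in data:
            break
        params.update(data["continue"])
    return total

grand_total = sum(revision_bytes(API, t) for t in all_page_titles(API))
print(f"Article text only, excluding images: ~{grand_total / 1024**2:.1f} MiB")
```

This counts wikitext only; images, thumbnails, and database overhead are not included, so the real on-server footprint will be larger.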

TheWellington

Posted 2011-08-06T04:25:11.740

Reputation: 176

I know Python and Perl to some extent... But I can't even imagine a script that would query the wiki database for each article and download it. I'm currently trying to download scholarpedia.com with wget, and it seems to be downloading the database instead of the articles. Can you give me a clue about the script you've mentioned? – Adobe – 2011-08-06T07:24:59.057

I've asked a question at Stack Overflow for such a script. – Adobe – 2011-08-06T07:49:04.857

Sorry... I should check back more often... No, I can't give you an example of a script... It was a very hypothetical "I suppose..." As in "It's probably not a good idea..." – TheWellington – 2011-09-01T22:12:40.483