Mirroring a WordPress website with wget

0

I'm trying to download a WordPress website, my blog actually, and to get the PHP files as well. So far I've tried:

wget -rkp -l3 -np -nH --cut-dirs=1 http://www.sharons.org.uk/
wget -r http://www.sharons.org.uk
wget --user-agent=Mozilla --content-disposition --mirror --convert-links -A php -E -K -p http://www.sharons.org.uk/

but I can't get past the first index.html page.

How can I do it please?

boudiccas

Posted 2014-02-26T13:31:43.433

Reputation: 131

Answers

2

Short answer: you can't; that's how the internet works.

Long answer:

Two factors make what you want impossible, and that's by design.

1) PHP files aren't sent to the client; they are evaluated server-side to produce the HTML documents that are then sent to the client. That allows the developer to keep the source code of the website private, which improves security. (Even though WordPress, here, is open source.)

2) Most of the website's content is stored in a database, which is no more accessible to you than the PHP files (if it is, that's a severe security flaw), since it is also the server that queries it to produce the HTML result.

All you can do is get a static version of the website. HTTrack (WinHTTrack on Windows), for example, allows you to do that, and wget itself serves as the equivalent on Linux.
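For example, a minimal static-mirror sketch with wget (the URL is the one from the question; the exact flag combination is only a suggestion and may need tuning):

wget --mirror --page-requisites --convert-links --adjust-extension \
     --no-parent http://www.sharons.org.uk/

That still only saves the rendered HTML, CSS, JavaScript and images, never the PHP source or the database.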

mveroone

Posted 2014-02-26T13:31:43.433

Reputation: 1 752

2

It's a common misconception that PHP files can be grabbed with wget. When you run

wget -rkp -l3 -np -nH --cut-dirs=1 http://www.sharons.org.uk/
wget -r http://www.sharons.org.uk
wget --user-agent=Mozilla --content-disposition --mirror --convert-links -A php -E -K -p http://www.sharons.org.uk/

or anything like that, a lot of things happen on the server side:

  • The web server receives the request from you / wget
  • The web server then runs PHP against index.php or whatever other file was requested
  • PHP queries MySQL as instructed by the WordPress PHP files
  • PHP then returns HTML-only data to the web server
  • That data is sent back to you and is what you see as the home page.

The correct approach to your problem is:

  • SSH into your server, or log in to the administration interface (cPanel, WHM, etc.)
  • Archive or grab the whole public_html or root directory of your site
  • Connect to your MySQL server and back up the WordPress database with mysqldump or phpMyAdmin (see the sketch after this list)
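A rough shell sketch of those three steps (hostname, paths, database name and user below are placeholders; mysqldump is the usual command-line tool for dumping a database):

# 1. Log in to the server; the following commands run there
ssh user@example.com

# 2. Archive the document root (adjust the path to your layout)
tar czf ~/site-backup.tar.gz -C /var/www public_html

# 3. Dump the WordPress database (name and credentials are in wp-config.php)
mysqldump -u dbuser -p wordpress_db > ~/wordpress_db.sql

Afterwards, copy the archive and the .sql dump back to your own machine with scp or rsync.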

P.S.: if this is your own blog, as you state, credentials/logins should not be a problem.

P.S. 2: if, as I suspect, you are trying to mirror someone else's site without explicit permission, that is out of superuser.com's scope.

Sorry if I misunderstood.

Sir.pOpE

Posted 2014-02-26T13:31:43.433

Reputation: 233

Sorry, it is my website and blog, and I rsync to it, but I'm just trying to learn how to get the PHP files as well, and get past just the one index.html. – boudiccas – 2014-02-26T13:46:43.723

OK, I understand. As I tried to explain, .php files are never sent to the user in raw form; they are processed by the PHP interpreter itself, and the output is then returned to the user. Using wget, you behave as an ordinary site visitor. – Sir.pOpE – 2014-02-26T13:48:36.967

1

I've just done something similar on my Ubuntu server, so see if my steps can help you with your issue. OK, let's go.

I have a standard LAMP stack on my server and had to mirror a site to GoDaddy; the easiest way was with wget. I did it like this:

  • stopped my Apache service => /etc/init.d/apache2 stop
  • moved to the root folder of my website => cd /var/www/webroot
  • ran a local Python server on the HTTP port => python -m SimpleHTTPServer 80
  • on my GoDaddy server, pulled the whole site over SSH => wget -m http://web-site.com

The -m flag is for mirror (a perfect mirror), and it works :)
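Put together, the steps look roughly like this (paths and hostnames are the placeholders from the list above, so substitute your own; binding to port 80 needs root):

# on the source server: stop Apache and serve the webroot statically
sudo /etc/init.d/apache2 stop
cd /var/www/webroot
sudo python -m SimpleHTTPServer 80    # Python 2; on Python 3 use: python3 -m http.server 80

# on the destination (GoDaddy) server: pull the whole site
wget -m http://web-site.com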

Do not forget to change the password in your wp-config.php in case someone else also pulled your site down in the meantime, connection parameters included :)

that's it :)

hth, krex

Kresimir Pendic

Posted 2014-02-26T13:31:43.433

Reputation: 111

0

I used a wget command to download a mirror of a local takeaway food store's site that I'm creating a mock-up for. It is WordPress, and I got the whole site, including all the pages and detailed menu items, viewable locally with working links, using the following:

wget \
      --recursive \
      --no-clobber \
      --page-requisites \
      --html-extension \
      --convert-links \
      --restrict-file-names=windows \
      "$url_of_site"

I have the whole site of some 200 pages in readable HTML format, so it does seem to be doable.

minimallinux

Posted 2014-02-26T13:31:43.433

Reputation: 1

My wget says "Both --no-clobber and --convert-links were specified, only --convert-links will be used." I guess the command is not optimal then. – Kamil Maciorowski – 2017-10-27T21:00:21.793

Did you still get the whole site with wget using only --convert-links? – minimallinux – 2017-10-31T11:39:58.510

0

You say in the above comment that you rsync to it. Then use rsync to download the site. It's the best method I know. Make sure you don't miss the .htaccess files! If your download command uses /* for the source, it will probably miss the hidden files, so use / only.
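A sketch of such an rsync pull (host and paths are placeholders; note -a to preserve permissions and timestamps, and the trailing / on the source rather than /*, so hidden files like .htaccess are included):

rsync -avz user@example.com:/var/www/public_html/ ./site-backup/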

Other methods: FileZilla, or any other FTP program.

SPRBRN

Posted 2014-02-26T13:31:43.433

Reputation: 5 185