
I know that wget can fetch a remote page and its dependencies and rewrite the HTML so that image src attributes reference the newly downloaded images.

I'm trying to do the same for local HTML files that reference images on the Internet. I'm using:

wget --mirror --page-requisites --convert-links \
     --directory-prefix=foo \
     --force-html \
     --input-file=my_file.html

All of the referenced images are downloaded to the appropriate places in foo/, but the src attributes in my_file.html aren't being changed.

Kevin L.

1 Answer


Try this:

wget --recursive --page-requisites --convert-links --span-hosts http://localhost/some.html
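Since this approach needs a web server, here is a minimal sketch that starts one only for the duration of the wget run. It assumes Python 3 (whose `http.server` module serves the current working directory) and wget are installed; the function name `mirror_via_local_server` and the default port 8000 are hypothetical choices, not part of the original answer.

```shell
#!/bin/sh
# Sketch only: serve the current directory over HTTP just long enough for
# wget to fetch one page and rewrite its links. Assumes python3 and wget
# are on PATH; the function name and default port are hypothetical.
mirror_via_local_server() {
  page=$1            # path of the HTML file, relative to the current directory
  port=${2:-8000}    # any free local port

  # python3 -m http.server serves the current working directory.
  python3 -m http.server "$port" --bind 127.0.0.1 >/dev/null 2>&1 &
  server_pid=$!
  sleep 1            # crude wait for the server to start listening

  # Same flags as the answer; 127.0.0.1 matches the address the server is bound to.
  wget --recursive --page-requisites --convert-links --span-hosts \
       "http://127.0.0.1:$port/$page"
  status=$?

  kill "$server_pid"  # stop the server whether or not wget succeeded
  return "$status"
}
```

Usage: `cd` into the directory that holds the page (and any images it references by relative path), then run `mirror_via_local_server some.html`. Note that `--recursive` combined with `--span-hosts` follows links onto other hosts, up to wget's default depth of 5; if only the page's images are wanted, dropping `--recursive` and relying on `--page-requisites` alone may be enough.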
Casual Coder
  • This still requires running an HTTP server on my local machine just for wget. Is there no way to do this using `--input-file`? – Kevin L. Jul 08 '11 at 14:19
  • I'm afraid not. Wget does not support the `file://` scheme. `--input-file` is only for conveniently reading many URLs in a batch. – Casual Coder Jul 08 '11 at 17:13
  • Many of the images in the file have a relative `src`, so if I serve it from localhost, wget gets 404 errors. Apparently the `--base` flag can only be used with `--input-file` and `--force-html`. – Kevin L. Jul 10 '11 at 04:27
  • Can you also serve the images? Using symlinks, you can link to a `foo/` directory with the images so that your localhost path mimics the relative `src` paths. Or you can edit the HTML files and place a `base` tag with an appropriate `href` attribute in the `<head>` section. – Casual Coder Jul 10 '11 at 05:47
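The last suggestion in that comment, adding a `base` tag, can be scripted. This is a minimal sketch, assuming the opening tag appears literally as `<head>` (no attributes) and that the base URL contains no `|` or `&` characters, which are special in the `sed` command used here; the function name `add_base_tag` and the example URL are hypothetical.

```shell
#!/bin/sh
# Sketch only: insert <base href="..."> right after the opening <head> tag
# so relative src attributes resolve against the given URL.
# Assumes the file contains a literal "<head>" (no attributes) and that the
# URL has no "|" or "&" characters, which are special in this sed command.
add_base_tag() {
  file=$1
  base_url=$2
  # Write to a temporary file rather than using sed -i, whose syntax
  # differs between GNU and BSD sed. Without the g flag, only the first
  # "<head>" on each line is replaced.
  sed "s|<head>|<head><base href=\"$base_url\">|" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

Usage: `add_base_tag my_file.html "http://example.com/pages/"`, where the URL is a placeholder for wherever the relative `src` paths should resolve. The edited file can then be served and fetched with the command from the answer.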