save web page with all the related content


I am trying to figure out how to save a web page with all of its related files, for example: http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd/

I want to save all the files in that directory, somewhat like a crawler but more limited, and if possible from within Firefox.

maazza

Posted 2015-11-30T09:36:20.197

Reputation: 336

Answers


Oddly enough, the original answer was deleted somehow.

Here it is again:

wget -r -l2 http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd

or

wget -r -np http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd

see https://www.gnu.org/software/wget/manual/html_node/Directory_002dBased-Limits.html

‘-np’ ‘--no-parent’ ‘no_parent = on’

The simplest, and often very useful, way of limiting directories is disallowing retrieval of the links that refer to the hierarchy above the beginning directory, i.e. disallowing ascent to the parent directory/directories.

The ‘--no-parent’ option (short ‘-np’) is useful in this case. Using it guarantees that you will never leave the existing hierarchy.

Supposing you issue Wget with:

wget -r --no-parent http://somehost/~luzer/my-archive/

You may rest assured that none of the references to /~his-girls-homepage/ or /~luzer/all-my-mpegs/ will be followed. Only the archive you are interested in will be downloaded. Essentially, ‘--no-parent’ is similar to ‘-I/~luzer/my-archive’, only it handles redirections in a more intelligent fashion.

Note that, for HTTP (and HTTPS), the trailing slash is very important to ‘--no-parent’. HTTP has no concept of a “directory”—Wget relies on you to indicate what’s a directory and what isn’t. In ‘http://foo/bar/’, Wget will consider ‘bar’ to be a directory, while in ‘http://foo/bar’ (no trailing slash), ‘bar’ will be considered a filename (so ‘--no-parent’ would be meaningless, as its parent is ‘/’).
