save web page with all the related content


I am trying to figure out how to save a web page with all of its related files, for example: http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd/

I want to save all the files in that directory, somewhat like a crawler but more limited, and if possible from within Firefox.

maazza

Posted 2015-11-30T09:36:20.197

Reputation: 336

Answers


Oddly enough, the original answer was deleted somehow.

Here it is again:

wget -r -l2 http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd

or

wget -r -np http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd

see https://www.gnu.org/software/wget/manual/html_node/Directory_002dBased-Limits.html

‘-np’ ‘--no-parent’ ‘no_parent = on’

The simplest, and often very useful, way of limiting directories is disallowing retrieval of the links that refer to the hierarchy above the beginning directory, i.e. disallowing ascent to the parent directory/directories.

The ‘--no-parent’ option (short ‘-np’) is useful in this case. Using it guarantees that you will never leave the existing hierarchy.

Supposing you issue Wget with:

wget -r --no-parent http://somehost/~luzer/my-archive/

You may rest assured that none of the references to /~his-girls-homepage/ or /~luzer/all-my-mpegs/ will be followed. Only the archive you are interested in will be downloaded. Essentially, ‘--no-parent’ is similar to ‘-I/~luzer/my-archive’, only it handles redirections in a more intelligent fashion.

Note that, for HTTP (and HTTPS), the trailing slash is very important to ‘--no-parent’. HTTP has no concept of a “directory”—Wget relies on you to indicate what’s a directory and what isn’t. In ‘http://foo/bar/’, Wget will consider ‘bar’ to be a directory, while in ‘http://foo/bar’ (no trailing slash), ‘bar’ will be considered a filename (so ‘--no-parent’ would be meaningless, as its parent is ‘/’).
