I want to mirror a site that has a dynamic sitemap in XML form.
Naturally, I want that sitemap to be downloaded and processed as if it were an HTML file.
I tried the -F flag for this file, but it didn't work: wget reported that it didn't find any URLs inside the file.
I currently assume it can't work this way (because wget isn't made for XML), but I wanted to ask to make sure I'm not overlooking something.
The content of the XML looks like this:
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="http://MY_SITE/wp-content/plugins/google-sitemap-generator/sitemap.xsl"?><!-- sitemap-generator-url="http://www.arnebrachhold.de" sitemap-generator-version="4.0.8" -->
<!-- generated-on="June 11, 2017 6:05 pm" -->
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://MY_SITE/sitemap-misc.xml</loc>
<lastmod>2017-05-31T20:49:06+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://MY_SITE/sitemap-pt-post-2017-04.xml</loc>
<lastmod>2017-04-12T16:27:52+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://MY_SITE/sitemap-pt-post-2017-02.xml</loc>
<lastmod>2017-02-10T17:50:14+00:00</lastmod>
</sitemap>
[...]
</sitemapindex>
Each sub-sitemap then looks like this:
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="http://MY_SITE/wp-content/plugins/google-sitemap-generator/sitemap.xsl"?><!-- sitemap-generator-url="http://www.arnebrachhold.de" sitemap-generator-version="4.0.8" -->
<!-- generated-on="June 11, 2017 6:07 pm" -->
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://MY_SITE/32017-SOME_CONTENT/</loc>
<lastmod>2017-04-12T16:27:52+00:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://MY_SITE/32017-SOME_OTHER_CONTENT/</loc>
<lastmod>2017-04-12T16:24:25+00:00</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
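In case wget really cannot read these files itself, one workaround I can imagine is to extract the <loc> entries with a small script and hand the resulting list to wget. The following is only a rough sketch under assumptions, not something wget provides: the names extract_locs and collect_page_urls, and the fetch callable that returns the raw bytes of a URL, are made up for illustration. It uses Python's standard xml.etree.ElementTree and assumes every file uses the sitemaps.org 0.9 namespace shown in the samples above.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 schema, as declared in the samples above.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_bytes):
    """Return (kind, urls) for one sitemap document.

    kind is "index" for a <sitemapindex> root and "urlset" otherwise;
    urls is the text of every <loc> element, in document order.
    """
    root = ET.fromstring(xml_bytes)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    urls = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return kind, urls


def collect_page_urls(sitemap_url, fetch):
    """Follow a sitemap index recursively and return all page URLs.

    fetch is a hypothetical callable (an assumption of this sketch) that
    takes a URL and returns the document body as bytes.
    """
    kind, locs = extract_locs(fetch(sitemap_url))
    if kind == "urlset":
        return locs
    pages = []
    for sub_sitemap_url in locs:
        pages.extend(collect_page_urls(sub_sitemap_url, fetch))
    return pages
```

With something like lambda u: urllib.request.urlopen(u).read() as fetch, the returned URLs could be written one per line to a text file and passed to wget via -i, together with whatever mirroring options are wanted. I have not checked how well this scales to large sitemaps.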
No, there is no HTML display. It is an XML format that lets Google index your pages faster. I'll edit an example into my question. – Angelo Fuchs – 2017-06-11T18:05:06.297