Questions tagged [sitemap]

19 questions
11
votes
4 answers

How does Wikipedia generate its Sitemap?

The topic interests me because of Wikipedia's size. It may be easy to set up a few cron jobs to update the sitemaps periodically on a small site, but what about a big one? So: How does Wikipedia generate its sitemap?
user10608
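
For context on the "big site" part of this question, here is a minimal sketch of the generic approach: split the URL list into several sitemap files and point a sitemap index at them. It is not a description of how Wikipedia does it. The 50,000-URL cap comes from the sitemaps.org protocol; the function names and the example.com URLs are hypothetical.

```python
from xml.sax.saxutils import escape

# The sitemaps.org protocol caps a single sitemap file at 50,000 URLs.
MAX_URLS_PER_SITEMAP = 50_000
SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Render one <urlset> document for a single chunk of page URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_XMLNS}">\n{entries}</urlset>\n'
    )


def build_sitemap_index(sitemap_urls):
    """Render the <sitemapindex> document that points at each chunk file."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_XMLNS}">\n{entries}</sitemapindex>\n'
    )
```

A periodic job could write each chunk with `build_sitemap` and then regenerate only the index, so a change in one part of the site does not force every file to be rewritten.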
6
votes
4 answers

How can I protect my sitemap index file and sitemap.xml files from leechers?

I have a "content" website that some leechers and 419 scammers love to crawl aggressively, which also generates costs and performance issues. :( I have no choice: I need to prevent them from accessing the sitemap files and index. :( I am doing the same as…
Toto
  • 283
  • 1
  • 4
  • 12
4
votes
1 answer

How to force Nginx to override a header?

I'm trying to display my sitemaps. Browsers display my sitemap index as XML but treat post sitemaps as plain text. I tried to override the content type with the configuration below, but it didn't help. location ~ \.xml$ { proxy_hide_header…
Hasan Tıngır
  • 153
  • 1
  • 3
  • 9
3
votes
2 answers

How many types of sitemaps are there?

I was perplexed to find two different sitemaps in Google Sites: http://sites.google.com/site/(name of the site)/system/feeds/sitemap http://sites.google.com/site/(name of the site)/system/app/pages/sitemap/hierarchy Now, I am ready to ask the…
user10608
2
votes
0 answers

How to use sitemap.xml to create a static mirror of a CMS

Is there a tool to create a static mirror of a content management system (CMS) that provides a sitemap.xml file? Ideally, I would point a tool like wget or curl to a sitemap.xml file and have it automatically sync the static directories using the…
Lee Joramo
  • 21
  • 1
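
As a minimal sketch of the first step this question describes, the snippet below pulls the page URLs out of a sitemap.xml document so they can be handed to a download tool. It assumes a plain `<urlset>` sitemap (not a sitemap index); the function name `extract_urls` and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemap files that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml):
    """Return the text of every <loc> inside a <urlset> sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    )
    # One URL per line, a shape that wget's --input-file option can read.
    print("\n".join(extract_urls(sample)))
```

Printing one URL per line keeps the script independent of any particular downloader; the list can be saved to a file and passed to whichever mirroring tool is preferred.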
2
votes
1 answer

How do I exclude my sitemap from an htaccess redirect?

I want all URLs to be redirected, except my sitemap XML file in the root directory. The htaccess should allow https://old-domain/xml.xml to resolve with HTTP 200, but it is still redirecting to the new domain at the moment. How can I exclude the…
Till Noah
  • 21
  • 2
1
vote
3 answers

Can I protect my sitemap.xml so that only search engines can download it?

I'm planning to add a bunch of aggregated lists of pages to my sitemaps, and I don't want to make them too easy for outsiders to screen-scrape. Can I protect my sitemap.xml so that only search engines can download it? Install a firewall? I'm using IIS6.…
Niels Bosma
  • 243
  • 1
  • 4
  • 15
1
vote
0 answers

Google Sitemap Generator: only the default hostname is listed

I successfully installed the Google Sitemap Generator on a Kimsufi server running Debian with Apache 2.2. But when I go to http://example.com:8181, only the default hostname is listed, so I can't configure the other hosted websites. I installed…
Snyf
  • 111
  • 1
1
vote
1 answer

in sitemap after moving from apache to nginx

I have a sitemap named http://www.domain.com/sitemap1.php. It starts with this code:
1
vote
1 answer

Store sitemaps off-site

We have an Nginx web server and sitemaps that we generate every week or so. We recently migrated to multiple web servers behind a single load balancer, and keeping a copy of the sitemaps on every web server seems kind of silly. As we are on AWS, is there a way to store…
Katafalkas
  • 523
  • 2
  • 8
  • 20
1
vote
1 answer

Multilanguage Google sitemap

Masters, we translated our site into English and I'm a little bit confused about sitemap.xml. Until now, we have had a sitemap like this:
holian
  • 227
  • 1
  • 8
  • 14
1
vote
3 answers

Unable to generate a sitemap with Google's generator

I would like to generate a sitemap from my university account, using a cron job that continuously runs the sitemap_gen.py file. The sitemap is for my site at Google Sites, and particularly for the users of the site, not only for search engines. How…
0
votes
3 answers

Why didn't Google crawl everything in sitemap.xml?

There are 3000 entries in sitemap.xml, but it turns out that Google crawls only 300 of them. What's the problem?
Mask
0
votes
2 answers

Google Mini ignoring sitemap

I'm in the process of setting up a Google Mini device to index our site, which has a lot of dynamically generated content. I've created a dynamic site.map file which lists all of the dynamic URLs. This is currently being indexed by Google but…
Dave Barker
  • 111
  • 2
0
votes
0 answers

Apache sitemap generator for more than 150K URLs and 2K new URLs daily

I have a big WordPress installation with more than 150K posts and about 2K new posts daily. Three months ago I used Google Sitemap Generator, but unfortunately it does not work with CentOS 6. It is really very good because it uses only about 2 MB of RAM &…
adnan
  • 101
  • 1
  • 4