23

I just updated my robots.txt file on a new site; Google Webmaster Tools reports it read my robots.txt 10 minutes before my last update.

Is there any way I can encourage Google to re-read my robots.txt as soon as possible?

UPDATE: Under Site Configuration | Crawler Access | Test robots.txt:

Home Page Access shows:

Googlebot is blocked from http://my.example.com/

FYI: The robots.txt that Google last read looks like this:

User-agent: *
Allow: /<a page>
Allow: /<a folder>
Disallow: /
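
For anyone who wants to reproduce the "blocked" result locally, here is a minimal sketch using Python's standard `urllib.robotparser`. The paths `/public-page.html` and `/public-folder/` are hypothetical stand-ins for the elided `<a page>` and `<a folder>` entries, and Python's parser is not Googlebot (it applies the first matching rule in file order), so treat this as an approximation of what the tester reports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical stand-ins for the elided "<a page>" and "<a folder>" entries.
ROBOTS_TXT = """\
User-agent: *
Allow: /public-page.html
Allow: /public-folder/
Disallow: /
"""


def can_fetch(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://my.example.com" + path)


if __name__ == "__main__":
    for path in ("/", "/public-page.html", "/public-folder/index.html", "/other"):
        print(path, "allowed" if can_fetch(path) else "blocked")
```

With these rules the home page `/` matches only `Disallow: /`, which is consistent with the "Googlebot is blocked" message above.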

Have I shot myself in the foot, or will it eventually read: http:///robots.txt (as it did the last time it read it)?

Any ideas on what I need to do?

qxotk
  • FYI: The site is new, and this message appears in Settings|Crawl Rate: "Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate." – qxotk Aug 18 '10 at 18:22
  • FYI: I found a posting in google groups that said google will read robots.txt "at least once a day" - can anyone confirm that? [google groups posting is here: http://groups.google.com/group/google_webmaster_help-indexing/browse_thread/thread/69e7a2770480bfdf?pli=1 ] – qxotk Aug 18 '10 at 22:08
  • FYI: 1 day has passed, and google has not yet read my updated robots.txt. – qxotk Aug 19 '10 at 15:41
  • Same issue here, this is not a "feature"... – mate64 Feb 12 '13 at 11:54

5 Answers

25

In case anyone else runs into this problem, there is a way to force Googlebot to re-download the robots.txt file.

Go to Health -> Fetch as Google [1] and have it fetch /robots.txt.

That makes Google re-download the file and re-parse it.

[1] In the previous Google UI it was 'Diagnostics -> Fetch as GoogleBot'.

Tom O'Connor
Matt
  • 11
    Unfortunately this will not work if your robots.txt is set to `Disallow: /`. Instead the fetch reports "Denied by robots.txt" :/. – studgeek Aug 02 '12 at 19:14
  • 3
    Next time add this line. Allow: /robots.txt – jrosell Sep 16 '12 at 16:52
  • I can't find 'Diagnostics', maybe the UI has changed? – David Riccitelli Dec 17 '12 at 07:28
  • 2
    Ok, it is now Health > Fetch as Google. – David Riccitelli Dec 17 '12 at 07:36
  • Not working for me when I try to fetch robots.txt. ERROR: "The page could not be crawled at this time because it is blocked by the most recent robots.txt file Googlebot downloaded. Note that if you recently updated the robots.txt file, it may take up to two days before it's refreshed. You can find more information in the Help Center article about robots.txt." – Indrek Feb 18 '13 at 15:16
  • I had the exact same issue and this solution worked perfectly for me. Thanks @Matt – Alexander Holsgrove Sep 02 '14 at 15:45
  • Thanks! This worked for me. **Fetch as Google** is now in the menu under **Crawl**. – rsbarro Oct 03 '14 at 16:13
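
The `Allow: /robots.txt` suggestion from the comments can be checked locally with a rough sketch like the one below. It uses Python's standard `urllib.robotparser`, which applies the first matching rule in file order, so the `Allow` line is placed before `Disallow: /`. The helper name `allowed` and the two rule sets are made up for illustration, and whether Google's fetch tool treats the file the same way is an assumption, not something this verifies.

```python
from urllib.robotparser import RobotFileParser

# Two illustrative rule sets: one blocks everything, the other
# explicitly allows /robots.txt before the blanket Disallow.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
WITH_ALLOW = "User-agent: *\nAllow: /robots.txt\nDisallow: /\n"


def allowed(rules, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given rules text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name, rules in (("block all", BLOCK_ALL), ("with allow", WITH_ALLOW)):
        print(name, "-> /robots.txt allowed:", allowed(rules, "/robots.txt"))
```

Under the first rule set `/robots.txt` itself counts as disallowed, which matches the "Denied by robots.txt" report; with the extra `Allow` line it is permitted while everything else stays blocked.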
4

I know this is very old, but... If you uploaded the wrong robots.txt (disallowing all pages), you can try the following:

  • first correct your robots.txt to allow the correct pages, then
  • upload a sitemap.xml with your pages

When Google tries to read the XML sitemap, it checks it against robots.txt, which forces Google to re-read your robots.txt.
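
If you don't already have a sitemap, a minimal one is easy to generate. The sketch below builds a bare-bones `sitemap.xml` with Python's standard library; the URLs are hypothetical placeholders, and `build_sitemap` is just an illustrative helper name. It only emits the required `<loc>` element for each page.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document (as a string) listing `urls`."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages; replace with the pages robots.txt now allows.
    pages = [
        "http://www.example.com/",
        "http://www.example.com/about.html",
    ]
    print(build_sitemap(pages))
```

Save the output as `sitemap.xml` in the site root (encoded as UTF-8) and submit it in Webmaster Tools.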

Hussam
2

After having the same problem, I successfully made Google re-read my robots.txt file by submitting it at this URL:

https://www.google.com/webmasters/tools/robots-testing-tool

potrodoido
1

OK. Here is what I did, and within a few hours, Google re-read my robots.txt files.

We have 2 sites for every 1 site we run. Let's call them the canonical site (www.mysite.com) and the bare-domain site (mysite.com).

We have our sites set up so that mysite.com always returns a 301 redirecting to www.mysite.com.

Once I set up both sites in Google Webmaster Tools and told it that www.mysite.com is the canonical site, it read the robots.txt file on the canonical site soon afterwards.

I don't really know why, but that's what happened.

qxotk
0

Shorten Google's scan interval for a few days.

Also, I've seen a button there to verify your robots.txt; this might force Google to re-read it, but I am not sure.

BarsMonster
  • Can you be more specific? I see: Site Configuration | Crawler Access | Test robots.txt, but that tests the text you paste in the box, not your live robots.txt file - also, this is where it tells me when it was last downloaded. Where is the "verify" button you speak of? – qxotk Aug 18 '10 at 18:24