I'm hosting a site for a volunteer organization. I've moved the site to WordPress, but it wasn't always that way. I suspect at one point it was hacked badly.

My Apache error log has grown to 122 kB in just the past 18 hours. The vast majority of the logged errors look like the pair below, which has been repeated hundreds of times today alone:

[Mon Nov 12 18:29:27 2012] [error] [client xx.xxx.xx.xxx] File does not exist: /home/*******/public_html/*******.org/calendar.php
[Mon Nov 12 18:29:27 2012] [error] [client xx.xxx.xx.xxx] File does not exist: /home/*******/public_html/*******.org/404.shtml

(I verified that xx.xxx.xx.xxx was a Google server.)

I suspect there was a security hole somewhere before, likely in calendar.php, that was exploited.

The files don't exist anymore, but there may still be many backlinks pointing to them, which would explain why Googlebot keeps trying to crawl them.

How do I fix this gracefully? I would still like Google to index the site. I just want to tell it somehow to stop looking for these files.

John
  • You can edit the file `robots.txt` to exclude these files specifically -- Google seems to respect this file fairly. Else I'd just ignore it and let Google do its thing. [Google Tools for Webmasters](http://www.google.com/webmasters/tools) allows even more fine-grained configuration options. Any reason, save for the log file size, this is a concern for you? – jscott Nov 13 '12 at 02:56
  • While technically on topic here, this question may be a better fit at our sister site [Pro Webmasters](http://webmasters.stackexchange.com/). In fact, I'm pretty sure it's been answered [once or twice](http://webmasters.stackexchange.com/search?q=%2B410+%2Bgone) there, too. – Michael Hampton Nov 13 '12 at 02:57
  • @jscott Sorry I didn't mention this but it's been doing this for a year or more. These errors dominate my error log and it looks like Google hasn't done its thing yet. Would like to give it a nudge. :) – John Nov 13 '12 at 03:15

1 Answer

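This is exactly the kind of situation the HTTP 410 Gone status code is meant for. Unlike a 404, which only says the file was not found, a 410 says the resource was removed deliberately.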

Google and other search engines can use this information to determine that a URL is no longer valid and is expected to never be valid again, and thus remove it from their indexes.
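
As a minimal sketch of how that might look in Apache, the snippet below uses the `Redirect gone` form from mod_alias. It assumes that `calendar.php` sat directly in the site's document root (as the logged path suggests) and that the host allows `.htaccess` overrides; adjust the path if either assumption is wrong.

```apache
# .htaccess in the site's document root
# (the same directive also works inside the matching <VirtualHost> block).
# Assumption: /calendar.php is the URL path Googlebot keeps requesting.

# Answer requests for the removed script with "410 Gone" instead of 404.
# "Redirect gone" is provided by mod_alias and takes no target URL.
Redirect gone /calendar.php
```

If the WordPress rewrite rules in the same `.htaccess` get in the way, the mod_rewrite equivalent should have the same effect: with `RewriteEngine On`, place `RewriteRule ^calendar\.php$ - [G]` above the WordPress block. Either way, the requests will probably continue for a while until the crawler has revisited the URL and dropped it.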

Michael Hampton