I recently installed logwatch, and its report shows me this:
--------------------- httpd Begin ------------------------
0.78 MB transferred in 5864 responses (1xx 0, 2xx 4900, 3xx 0, 4xx 964, 5xx 0)
160 Images (0.16 MB),
857 Content pages (0.62 MB),
4847 Other (0.00 MB)
Requests with error response codes
404 Not Found
/%E2%80%98planeat%E2%80%99-film-explores-l ... greenfudge-org/: 1 Time(s)
/10-foods-to-add-to-the-brain-diet-to-help ... -function/feed/: 1 Time(s)
/10-ways-to-reboot-your-body-with-healthy- ... s-and-exercise/: 1 Time(s)
/bachmann-holds-her-ground-against-raising ... com-blogs/feed/: 1 Time(s)
/behind-conan-the-barbarians-diet/: 1 Time(s)
/tag/dietitian/: 1 Time(s)
/tag/diets/page/10/: 1 Time(s)
/tag/directory-products/feed/: 1 Time(s)
/wp-content/uploads/2011/06/1309268736-49.jpg: 1 Time(s)
/wp-content/uploads/2011/06/1309271430-30.jpg: 1 Time(s)
/wp-content/uploads/2011/06/1309339847-35.jpg: 1 Time(s)
My note: there are a lot more requests like the ones above; I pasted only a few for clarity.
A total of 12 ROBOTS were logged
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 2 Time(s)
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots) 5 Time(s)
Twitterbot/1.0 1 Time(s)
Mozilla/5.0 (compatible; AhrefsBot/2.0; +http://ahrefs.com/robot/) 4 Time(s)
Sosospider+(+http://help.soso.com/webspider.htm) 3 Time(s)
msnbot/2.0b (+http://search.msn.com/msnbot.htm)._ 1 Time(s)
Mozilla/5.0 (compatible; MJ12bot/v1.4.2; http://www.majestic12.co.uk/bot.php?+) 1 Time(s)
msnbot-media/1.1 (+http://search.msn.com/msnbot.htm) 77 Time(s)
Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com) 1 Time(s)
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) 17 Time(s)
Baiduspider+(+http://www.baidu.com/search/spider.htm) 11 Time(s)
Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly/) Gecko/2009032608 Firefox/3.0.8 1 Time(s)
---------------------- httpd End -------------------------
So I'm thinking this is some kind of bot (and potentially one of those listed above). Can you please point me to how I could prevent them from guessing links in the hope of finding content?
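The only idea I have so far is a robots.txt along these lines, with the bot names taken from the report above (and assuming the crawlers actually honor it), though I suspect it doesn't really stop the link guessing:

    # rough robots.txt idea - bot names copied from the logwatch report above
    User-agent: AhrefsBot
    Disallow: /

    User-agent: MJ12bot
    Disallow: /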
Edit: since I own a VPS, there are a lot of domains hosted on it. Can you tell me how I can find out on which domain a particular 404 happened? For example, this line: /tag/dietitian/
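In case it's relevant, I'm guessing something like adding %v to the Apache LogFormat would tag each log entry with the virtual host it was served from, but I'm not sure if that's the right approach (the format name and log path below are just placeholders I made up):

    # prefix each access log line with the virtual host name (%v)
    LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined_sketch
    CustomLog /var/log/httpd/access_log vhost_combined_sketch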