
I run Lighttpd on an Ubuntu server. I just checked the Lighttpd access logs for one particular domain, which serves only a very simple index.html file that basically says "coming soon". Below are the 10 most recent entries. I don't completely understand them: why are search engine bots requesting these weird subdomains and URLs? So far I've found the following bots doing this: Mail.Ru, Bing, and Baidu. Google and Yahoo do not appear in the logs. I've replaced the real domain with example.com to protect it.

217.69.133.239 power-steering-pump-ford.example.com - [31/Dec/2014:05:17:37 -0500] "GET /robots.txt HTTP/1.1" 404 345 "-" "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
217.69.133.240 power-steering-pump-ford.example.com - [31/Dec/2014:05:17:39 -0500] "GET /bedroom-boy-furniture-quality.html/ HTTP/1.1" 404 345 "-" "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
217.69.133.238 power-steering-pump-ford.example.com - [31/Dec/2014:05:17:44 -0500] "GET /10-car-hottest-top.html/ HTTP/1.1" 404 345 "-" "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
157.55.39.173 best-mixed-drink-recipes.example.com - [31/Dec/2014:05:26:43 -0500] "GET / HTTP/1.1" 200 187 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
217.69.133.234 cannon-printer-model-mp450.example.com - [31/Dec/2014:05:31:49 -0500] "GET /robots.txt HTTP/1.1" 404 345 "-" "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
217.69.133.240 cannon-printer-model-mp450.example.com - [31/Dec/2014:05:31:50 -0500] "GET / HTTP/1.1" 200 187 "-" "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
217.69.133.240 smart-car-bike-rack.example.com - [31/Dec/2014:05:31:52 -0500] "GET /robots.txt HTTP/1.1" 404 345 "-" "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
217.69.133.238 smart-car-bike-rack.example.com - [31/Dec/2014:05:31:54 -0500] "GET / HTTP/1.1" 200 187 "-" "Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)"
202.46.53.179 winter-clothing-for-kids.example.com - [31/Dec/2014:05:52:05 -0500] "GET / HTTP/1.1" 200 230 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
180.76.4.195 winter-clothing-for-kids.example.com - [31/Dec/2014:05:52:47 -0500] "GET / HTTP/1.1" 200 230 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
HopelessN00b
Ray Walz
  • As you can see, the domain is set up so that all subdomains resolve. However, that still doesn't explain why search engines would be generating these subdomains. How are they being generated, and by whom? – Ray Walz Dec 31 '14 at 11:28
  • I've owned this domain for 5+ months, but I just found out it was previously registered back in 2010-2011. Perhaps it was a spam site, and search engines are simply finding old links and following them? – Ray Walz Dec 31 '14 at 11:59
  • Can you check the domain's previous content at http://web.archive.org/, or in a search engine's cache? That might show you what sort of site it was. You could also look for it in spam blocklists (e.g. Spamhaus DBL) or web reputation databases (e.g. mywot.com). That's possibly worth doing anyway, as you might run into problems with being blacklisted later on. – Adam Thompson Dec 31 '14 at 12:17
  • @AdamThompson I tried archive.org, but there was only one copy and it was a page saying the site was down for maintenance. I'll try the other stuff you recommended in the morning, thanks. – Ray Walz Dec 31 '14 at 12:23

1 Answer


This problem appears to be caused simply by backlinks left over from when the domain was previously owned. It seems to be made worse by my server configuration: any subdomain returns 200 (no error) for the root page, so crawlers treat every stale hostname as a live site.

To solve this problem, I will change the configuration so that erroneous subdomains return 404, and maybe report the false links to the search engines that indexed them.
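
A minimal sketch of what such a configuration change might look like in lighttpd, not necessarily the exact change made here. It assumes that only example.com and www.example.com should be served, that mod_rewrite is available, and that the path /no-such-page (a hypothetical placeholder) does not exist under the document root:

```
# Assumption: mod_rewrite is not already loaded elsewhere in the config.
server.modules += ( "mod_rewrite" )

# Match any Host header other than example.com or www.example.com
# (optionally followed by a port number).
$HTTP["host"] !~ "^(www\.)?example\.com(:[0-9]+)?$" {
    # Rewrite every request to a path that does not exist, so lighttpd
    # answers with 404 instead of serving index.html.
    # "/no-such-page" is a hypothetical placeholder name.
    url.rewrite-once = ( ".*" => "/no-such-page" )
}
```

If the subdomains resolve because of a wildcard DNS record, removing that record would be an alternative: the stale hostnames would then stop resolving at all.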

Sorry for using ServerFault as a rubber duck and thank you for your downvotes.

Ray Walz