This morning I noticed a single IP address more or less crawling my website, except that it was requesting the same page many times within a few minutes. Then I noticed that it was doing so with different user agents.
I decided to check what was going on by analyzing the Apache httpd logs:
cut -d' ' -f1 /var/log/apache2/*access.log | # Extract all IP-addresses from the server logs
sort -u | # List every IP-address only once
while read -r ip; do # Cycle through the list of IP-addresses
  printf '%s\t' "$ip"; # Print the IP-address
  awk -v ip="$ip" '$1 == ip' /var/log/apache2/*access.log | # Select log entries for exactly this IP-address (no prefix matches, no filename prefixes)
  sed 's/^.*\("[^"]*"\)$/\1/' | # Extract the user-agent
  sort -u | # Create a list of user-agents
  wc -l; # Count the unique user-agents
done |
tee >( cat >&2; echo '=== SORTED ===' ) | # Suspense is killing me, I want to see the progress while the script runs...
sort -nk2 | # Sort list by number of different user agents
cat -n # Add line numbers
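The loop above re-reads the logs once per IP address, which gets slow when there are many addresses. The same count can probably be done in a single pass with awk. This is only a rough sketch: the function name count_agents is made up for illustration, and it assumes the IP address is the first field, the user agent is the last double-quoted field on each line, and the user agent itself contains no escaped double quotes.

```shell
# count_agents (hypothetical helper): print "IP<TAB>number of distinct user agents",
# sorted by that number, for the log files given as arguments.
count_agents() {
  awk -F'"' '
    NF >= 3 {                       # only lines with at least one quoted field
      split($1, head, " ")          # text before the first quote; head[1] is the IP
      ip = head[1]
      ua = $(NF - 1)                # text inside the last pair of quotes
      if (!((ip, ua) in seen)) {    # first time this IP shows this user agent
        seen[ip, ua] = 1
        count[ip]++
      }
    }
    END { for (ip in count) printf "%s\t%d\n", ip, count[ip] }
  ' "$@" | sort -t "$(printf '\t')" -k2,2n
}
```

It would be called as count_agents /var/log/apache2/*access.log | cat -n to get the same three-column layout. There is no progress output here, but a single pass should not need it.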
This results in a long list:
Columns: line number, IP address, number of different user agents used.
...
1285 176.213.0.34 15
1286 176.213.0.59 15
1287 5.158.236.154 15
1288 5.158.238.157 15
1289 5.166.204.48 15
1290 5.166.212.42 15
1291 176.213.28.54 16
1292 5.166.212.10 16
1293 176.213.28.32 17
1294 5.164.236.40 17
1295 5.158.238.6 18
1296 5.158.239.1 18
1297 5.166.208.39 18
1298 176.213.20.0 19
1299 5.164.220.43 19
1300 5.166.208.35 19
So there are dozens of IP addresses that are fiddling with the user agent over a span of a couple of minutes. I checked the top 50 IP addresses against my private little log of known bots, but found no matches.
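For anyone who wants to repeat the known-bot check on their own logs, it could look roughly like this. This is a sketch under assumptions: the function name match_known_bots and the file name known_bots.txt are hypothetical, the bot list is assumed to hold one IP address at the start of each line, and the ranked list is assumed to arrive on standard input with the IP address as its first field.

```shell
# match_known_bots (hypothetical helper): read ranked "IP count" lines on stdin
# and print only those whose IP also appears in the bot list given as $1.
match_known_bots() {
  awk '
    FILENAME == ARGV[1] { known[$1] = 1; next }   # first input: load the known bot IPs
    $1 in known                                   # stdin: print entries with a known IP
  ' "$1" -
}
```

With a ranked list saved as ranked.txt (also a made-up name), the top 50 would be checked with tail -n 50 ranked.txt | match_known_bots known_bots.txt. Empty output means no matches.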
This is what the access log looks like for a single IP address (vertically and horizontally truncated for readability):
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0"
"GET / HTTP/1.0" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36"
Are other people seeing this? Does anyone have a clue what is happening here?