
Every day there are about 3000 or more 404 hits from the Facebook crawler. The log looks like this:

X.X.X.X Y.Y.Y.Y - - [24/May/2017:03:43:35 +0000] "GET /health-and-medicine/trumps-2018-budget-cuts-funding-for-cancer-mental-health-and-hiv-research/ HTTP/1.1" 404 292 "http://m.facebook.com" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Mobile/14E304 [FBAN/FBIOS;FBAV/87.0.0.44.70;FBBV/54482584;FBDV/iPhone8,4;FBMD/iPhone;FBSN/iOS;FBSV/10.3.1;FBSS/2;FBCR/Sprint;FBID/phone;FBLC/en_US;FBOP/5;FBRV/55128799]"
X.X.X.X Y.Y.Y.Y - - [23/May/2017:03:19:40 +0000] "GET /environment/mount-everests-famous-hillary-step-destroyed-by-2015-nepal-earthquake/ HTTP/1.1" 404 280 "http://m.facebook.com/" "Mozilla/5.0 (Linux; Android 5.1.1; LGL82VL Build/LMY47V; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.121 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/111.0.0.18.69;]"

How can these hits be blocked? They do not come from a single IP, a single subnet range, or a single path.
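To confirm how the 404s are spread across paths and addresses before choosing a blocking rule, the log lines can be parsed and tallied. The sketch below is illustrative only. It assumes the layout shown above: two address fields (their exact meaning, for example client versus proxy, is an assumption), then ident, user, time, request, status, size, referer and user agent. The names `LOG_RE`, `parse_line` and `summarize_404s` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for the log layout shown above: two address fields,
# ident, user, [time], "request", status, size, "referer", "user agent".
LOG_RE = re.compile(
    r'(?P<addr1>\S+) (?P<addr2>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def summarize_404s(lines):
    """Count 404s per path and collect the distinct first-field addresses."""
    per_path = Counter()
    addrs = set()
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["status"] != "404":
            continue  # skip unparseable lines and non-404 responses
        per_path[rec["path"]] += 1
        addrs.add(rec["addr1"])
    return per_path, addrs
```

Feeding a day's log through `summarize_404s` and looking at `per_path.most_common(20)` and `len(addrs)` shows whether the misses cluster on a few URLs or really are scattered.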

Also, the application has no Facebook integration of any kind.

Edit: I added a second log example, because some may assume these requests come only from iOS.

YATIN GUPTA
  • Why do you think this is from a Facebook crawler? From the user agent, it's an iPhone. – EEAA May 24 '17 at 12:02
  • Yes, it is the Facebook iOS app; note FBAN/FBIOS, since all these abbreviations are Facebook abbreviations in the user agent. Also note the referer in the log. – YATIN GUPTA May 24 '17 at 12:46
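The distinction being argued here can be checked mechanically from the user-agent string. The sketch below is based only on the two user agents quoted in the question (`[FBAN/FBIOS;...]` for iOS and `[FB_IAB/FB4A;...]` for Android); treating those bracketed tokens as reliable app markers is an assumption. Facebook's link-preview crawler, by contrast, identifies itself as `facebookexternalhit`. The function name `classify_facebook_client` is hypothetical.

```python
import re

# Bracketed tokens seen in the two user agents quoted in the question.
# Reading them as "Facebook app" markers is an assumption based on those samples.
_FB_APP_TOKEN_RE = re.compile(r"\[(FBAN|FB_IAB)/([^;\]]+)")


def classify_facebook_client(user_agent):
    """Return 'crawler', 'ios-app', 'android-app', 'other-app', or None."""
    # Facebook's link-preview crawler identifies itself as facebookexternalhit.
    if "facebookexternalhit" in user_agent:
        return "crawler"
    m = _FB_APP_TOKEN_RE.search(user_agent)
    if m is None:
        return None  # no Facebook marker at all
    key, value = m.groups()
    if key == "FBAN" and value == "FBIOS":
        return "ios-app"
    if key == "FB_IAB" and value == "FB4A":
        return "android-app"
    return "other-app"  # some other Facebook app token
```

Applied to both log lines in the question, this reports an app client rather than the crawler.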

1 Answer


Sorry, but you're mistaken.

This is not a Facebook crawler. Rather, these requests were made by the Facebook mobile application (the logs provided indicate iOS and Android) fetching an article from your server.

EEAA
  • My point is how to eliminate or block them. – YATIN GUPTA May 24 '17 at 13:14
  • Right, and those would be from the Android and iOS applications. You've provided absolutely no evidence that this is in any way related to Facebook's crawler. On the contrary, the one log entry you've provided is clearly from the Facebook iOS app. – EEAA May 24 '17 at 13:16
  • Why do you want to eliminate them? They're just 404s, and at the rate you're describing, they're surely not causing performance problems. All major web servers can block requests based on user agent, so you could research that. Keep in mind, though, that doing this will also block non-404 requests from these clients. – EEAA May 24 '17 at 13:37
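The comment recommends blocking by user agent in the web server itself. If the application happens to be a Python WSGI app, the same idea can be sketched as middleware. This is a minimal illustration under that assumption, not a recommended configuration: the pattern `BLOCKED_UA_RE` and the class `BlockUserAgentMiddleware` are hypothetical, and the pattern simply matches the `[FBAN/` and `[FB_IAB/` tokens seen in the quoted logs.

```python
import re

# Hypothetical pattern: the Facebook app tokens seen in the logs above.
BLOCKED_UA_RE = re.compile(r"\[(?:FBAN|FB_IAB)/")


class BlockUserAgentMiddleware:
    """WSGI middleware that answers 403 Forbidden for matching user agents."""

    def __init__(self, app, pattern=BLOCKED_UA_RE):
        self.app = app
        self.pattern = pattern

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if self.pattern.search(agent):
            # Reject before the wrapped app runs, whatever the requested path.
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)
```

As the comment warns, this rejects every request from these clients, including readers who follow a working link from Facebook, not just the ones that would have been 404s. Doing the equivalent in the web server saves the cost of reaching the application at all.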