
I have a standard apache error log file. I would like to see what URLs are causing 404s, since I have moved this site around and I want to find bad links. Can anyone recommend a bash snippet that will parse this log using awk or something to show me the popular 404s?

I know there are advanced programmes for this sort of thing. I'm just looking for something simple.

Amandasaurus

2 Answers


This should do it:

grep ' 404 ' /var/log/apache2/access.log | cut -d ' ' -f 7 | sort | uniq -c | sort -n
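
For reference, here is the same pipeline broken out with comments (bash permits a comment after a pipe, so this runs as written):

grep ' 404 ' /var/log/apache2/access.log | # keep only lines containing a 404 status
  cut -d ' ' -f 7 |                        # field 7 is the request URL in the default log format
  sort | uniq -c |                         # collapse and count identical URLs
  sort -n                                  # rank by hit count, most popular last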
womble
  • Add a "| sort -n" to get them in order of hits – David Pashley Jun 30 '09 at 10:18
  • I agree with this solution with one important side note: this works with most Apache installations that use the system default logging format. If the admin has changed the logging format, the command above must be altered slightly. – Niels Basjes Jun 30 '09 at 12:35
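
As a sketch of that alteration (the %v vhost prefix here is an assumption for illustration, not something from the question): if your LogFormat prepends the virtual host to each line, every field shifts right by one, so the request URL becomes field 8:

grep ' 404 ' /var/log/apache2/access.log | cut -d ' ' -f 8 | sort | uniq -c | sort -n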

An awk answer:

awk '$9 == 404 {urls[$7]++} END {for (url in urls) print urls[url] "\t" url}' access_log | sort -n

It's just for fun, as it's probably much slower than womble's solution.

radius
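
For either answer, a small usage note (the head count of 20 is an arbitrary choice): reversing the final sort and piping through head shows only the most frequent 404s first:

awk '$9 == 404 {urls[$7]++} END {for (url in urls) print urls[url] "\t" url}' access_log | sort -rn | head -20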