match string in awk

1

1

How can I search a file for the lines that contain SRC=, as in the example below? In other words, how can I extract the source IP address from each line of this file using awk?

Mar 10 03:17:12 ubuntu kernel: [11045.721649] Type=ScanXMASIN=eth0 OUT= MAC=00:0c:29:a1:51:1c:00:0c:29:23:9d:e4:08:00 SRC=192.168.1.28 DST=192.168.1.27 LEN=40 TOS=0x00 PREC=0x00 TTL=47 ID=6603 PROTO=TCP SPT=47301 DPT=53 WINDOW=1024 RES=0x00 URG PSH FIN URGP=0 
Mar 10 03:17:12 ubuntu kernel: [11045.721702] Type=ScanXMASIN=eth0 OUT= MAC=00:0c:29:a1:51:1c:00:0c:29:23:9d:e4:08:00 SRC=192.168.1.30 DST=192.168.1.27 LEN=40 TOS=0x00 PREC=0x00 TTL=42 ID=6802 PROTO=TCP SPT=47301 DPT=5900 WINDOW=1024 RES=0x00 URG PSH FIN URGP=0 
Mar 10 03:17:32 ubuntu kernel: [11065.703937] Type=ScanACKIN=eth0 OUT= MAC=00:0c:29:a1:51:1c:00:0c:29:23:9d:e4:08:00 SRC=192.168.1.31 DST=192.168.1.27 LEN=40 TOS=0x00 PREC=0x00 TTL=40 ID=62992 PROTO=TCP SPT=47301 DPT=1521 WINDOW=1024 RES=0x00 URG PSH FIN URGP=0 
Mar 10 03:17:32 ubuntu kernel: [11065.706729] Type=ScanXMASIN=eth0 OUT= MAC=00:0c:29:a1:51:1c:00:0c:29:23:9d:e4:08:00 SRC=192.168.1.32 DST=192.168.1.27 LEN=40 TOS=0x00 PREC=0x00 TTL=47 ID=15170 PROTO=TCP SPT=47301 DPT=14442 WINDOW=1024 RES=0x00 URG PSH FIN URGP=0

and then I'd like to get this output:

192.168.1.28
192.168.1.30
192.168.1.31
192.168.1.32

There are lots of lines (100,000). I want to search for SRC=, and for each matching line strip everything except the IP address that follows SRC=.

USING AWK

thank you all! :)

Arash

Posted 2013-03-10T11:03:15.267

Reputation: 678

Does it need to be awk or will gawk be ok? – terdon – 2013-03-10T11:55:59.537

awk is preferred, but it's not essential – Arash – 2013-03-10T12:04:21.583

I tried awk '(/SRC=192.168.1.28/) {print $11}' but I want just the IP address – Arash – 2013-03-10T12:05:00.677

Just asking because you can capture matches in gawk with match(). – terdon – 2013-03-10T13:18:59.900

Answers

5

Unfortunately, standard awk gives you no access to regex capture groups. You might want to look for a more modern tool with which to write one-liners, such as Perl.

That being said, the fastest way to do so in your case depends on whether SRC= is always at the same place in the logs.

If it's always in the same place, and the arguments always contain the same number of equal signs, you can just split your lines on both equals and space and take the 15th field:

awk -F'[= ]' '{print $15}'

Otherwise, for a more robust approach, you can substitute away the part leading to SRC= and the part following it:

awk '{sub(/.* SRC=/, ""); sub(/ .*/, ""); print;}'
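One caveat worth sketching: as written, the command also prints something for lines that have no SRC= at all (the first sub() does nothing and the second trims the line to its first word). A minimal variant, assuming you want such lines skipped, adds a pattern in front of the action. The sample lines are shortened, made-up data.

```shell
# Two sample lines: one with SRC= and one without (shortened, hypothetical data).
log='Mar 10 03:17:12 ubuntu kernel: [11045.721649] Type=ScanXMASIN=eth0 OUT= SRC=192.168.1.28 DST=192.168.1.27 LEN=40
Mar 10 03:17:13 ubuntu kernel: [11046.000000] some unrelated message'

# The / SRC=/ pattern restricts the action to lines that contain the key,
# so the unrelated line is skipped instead of being printed as "Mar".
result=$(printf '%s\n' "$log" |
  awk '/ SRC=/ {sub(/.* SRC=/, ""); sub(/ .*/, ""); print}')
printf '%s\n' "$result"
```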

If you need to count the occurrences, you could append the idiomatic | sort | uniq -c | sort -rn to the pipeline, but that's inefficient with hundreds of thousands of lines. You are better off using awk's built-in associative arrays for the first two steps:

awk '{sub(/.* SRC=/, ""); sub(/ .*/, ""); ips[$0]++;}
     END {for (ip in ips) printf("%8d  %s\n", ips[ip], ip);}' | sort -nr

The output of either should look like this:

7513  192.168.1.28
 330  192.168.1.30
 103  192.168.1.31
  19  192.168.1.32
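To try the counting approach on a small input, here is a rough sketch. The six log lines are made up, and min is a hypothetical threshold variable (not part of the command above) that limits the report to addresses seen at least that many times.

```shell
# Hypothetical sample: three source addresses with differing frequencies.
log='x SRC=192.168.1.28 DST=192.168.1.27
x SRC=192.168.1.30 DST=192.168.1.27
x SRC=192.168.1.28 DST=192.168.1.27
x SRC=192.168.1.28 DST=192.168.1.27
x SRC=192.168.1.31 DST=192.168.1.27
x SRC=192.168.1.30 DST=192.168.1.27'

# min is an assumed threshold passed in with -v; only addresses counted
# at least that many times are printed. sort -nr puts the busiest first.
counts=$(printf '%s\n' "$log" |
  awk -v min=2 '/ SRC=/ {sub(/.* SRC=/, ""); sub(/ .*/, ""); ips[$0]++}
       END {for (ip in ips) if (ips[ip] >= min) printf("%8d  %s\n", ips[ip], ip)}' |
  sort -nr)
printf '%s\n' "$counts"
```

With min=1 every address is listed, which matches the behaviour of the original command.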

Tobia

Posted 2013-03-10T11:03:15.267

Reputation: 330

This works, thank you! :D One more question: how can I say that if an IP shows up 3 times, it should be written to a file on a new line? – Arash – 2013-03-10T12:12:02.043

I don't understand this last question – Tobia – 2013-03-11T10:53:53.197

4

While this is certainly possible with awk, it's much more straightforward with grep:

grep -Po "(?<=SRC=)[\d.]+"

How it works:

  • -P enables Perl Compatible Regular Expressions.

  • -o only displays the matched part of the line.

  • (?<=SRC=) is a positive look-behind assertion, i.e., the match must be preceded by SRC=.

  • [\d.]+ is one or more digits and dots.
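Note that -P is a GNU grep feature and depends on PCRE support being compiled in. Where it is missing, a similar result can be sketched without look-behind by matching the key together with the address and cutting the key off afterwards. This is an alternative technique, not the command above, and -o itself is a widely implemented extension rather than part of POSIX. The sample lines are shortened, made-up data.

```shell
# Shortened, hypothetical log lines.
log='kernel: IN=eth0 OUT= SRC=192.168.1.28 DST=192.168.1.27 LEN=40
kernel: IN=eth0 OUT= SRC=192.168.1.30 DST=192.168.1.27 LEN=40'

# grep -o prints only the matched "SRC=<address>" text;
# cut then keeps the part after the "=".
result=$(printf '%s\n' "$log" | grep -o 'SRC=[0-9.]*' | cut -d= -f2)
printf '%s\n' "$result"
```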

Dennis

Posted 2013-03-10T11:03:15.267

Reputation: 42 934

2

A sed solution (sed is as standard as awk in UNIX systems):

sed -n -e 's/.*SRC=\([^ ]*\).*/\1/p' -e 's/.*SRC=\([^ ]*\)$/\1/p' file

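It removes everything before SRC= and everything after the next space, and prints the resulting line whenever a substitution was made. The second substitution is meant for the case where the IP address is the last field of the line, although the first expression already covers it, because its trailing .* can match the empty string.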
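A quick check on made-up lines suggests that the first expression by itself already handles an address at the end of the line, since the trailing .* is allowed to match nothing. The sketch below uses only that single expression; the three sample lines are hypothetical.

```shell
# Hypothetical lines: address mid-line, address at end of line, no address.
log='kernel: OUT= SRC=192.168.1.28 DST=192.168.1.27
kernel: OUT= DST=192.168.1.27 SRC=192.168.1.32
kernel: no address here'

# -n suppresses default output; the p flag prints only lines where
# the substitution succeeded, so the third line is dropped.
result=$(printf '%s\n' "$log" | sed -n 's/.*SRC=\([^ ]*\).*/\1/p')
printf '%s\n' "$result"
```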

jfg956

Posted 2013-03-10T11:03:15.267

Reputation: 1 021

2

I'd do this with awk:

awk -F '[ =]' '{for (i=1; i<NF; i++) if ($i == "SRC") {print $(i+1); next}}'

glenn jackman

Posted 2013-03-10T11:03:15.267

Reputation: 18 546

2

This pure awk works even if the number of fields changes, as long as the desired IP is preceded by SRC= and followed by a space:

awk -F'SRC=' '{print $2}' a | awk '{print $1}'

This might be more straightforward with gawk, whose match() function accepts an optional third argument: an array that receives the captured groups:

gawk 'match($0,/SRC=([0-9.]+)/,k){print k[1]}' a
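If gawk is not available, roughly the same effect can be sketched with the standard two-argument match(), which sets RSTART and RLENGTH instead of filling an array. The sample lines are made up, and the offset of 4 is simply the length of the literal "SRC=".

```shell
# Hypothetical lines: one with an address, one without.
log='kernel: OUT= SRC=192.168.1.31 DST=192.168.1.27 LEN=40
kernel: no address here'

# match() sets RSTART and RLENGTH to the position and length of "SRC=<address>";
# skipping the 4 characters of "SRC=" leaves the address itself.
# Lines without a match make match() return 0, so nothing is printed for them.
result=$(printf '%s\n' "$log" |
  awk 'match($0, /SRC=[0-9.]+/) {print substr($0, RSTART + 4, RLENGTH - 4)}')
printf '%s\n' "$result"
```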

terdon

Posted 2013-03-10T11:03:15.267

Reputation: 45 216

1

Yet another awk to try, which discards the lines that do not contain SRC=:

awk -F'.*SRC=| ' '/SRC=/{print $2}' file

Or try another sed:

sed -n '/.*SRC=/{s///; s/ .*//p;}' file

Scrutinizer

Posted 2013-03-10T11:03:15.267

Reputation: 249