How to extract e-mail addresses or domains from a mixed data file in Linux

1

File content:

17541   From Email      subscription@test.com      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Email      subscription@yahoo.com      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Domain      test.co.uk      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Domain      yahoo.co.uk      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Email      subscription@test.com      Inbound
Policy Manager  Envelope Analysis
Profiler

I use the command below to extract the e-mail addresses and convert them to the new format, but I am not able to extract the domains. I use "sort -u" because some addresses are duplicated in the file.

cat 1| grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"|sed -e 's/^/E,/'|sort -u

Expected output after extracting the domains:

test.co.uk
yahoo.co.uk

Kalin Borisov

Posted 2012-10-16T07:20:39.967

Reputation: 123

What kind of output are you expecting? – Bernhard – 2012-10-16T07:30:00.373

I get this on three separate lines when I try E,subscription@test.com E,subscription@yahoo.com E,subscription@test.com . How does that differ from what you want? – Nifle – 2012-10-16T07:34:39.730

I have edited my question. – Kalin Borisov – 2012-10-16T08:06:26.247

Answers

0

Your grep expression is fine; it's the sed one that doesn't work. Change it to:

< 1 grep -Eo '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b' | sed -e 's/[^@]*@//' | sort -u

This assumes the input file is called 1. You could also do the whole thing with grep:

< 1 grep -Eo '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b' | grep -Eo '[^@]+$' | sort -u
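As a rough illustration of what the sed variant above produces, here is a minimal sketch that runs it against a copy of the sample data from the question. The temporary file and the variable names `sample` and `result` are assumptions made for the example, and `\b` is a GNU grep extension, so other grep implementations may behave differently. Note that this pipeline yields the domain part of each e-mail address; it does not pick up the "From Domain" lines.

```shell
# Write the sample data from the question to a temporary file
# (the file name is arbitrary and only used for this sketch).
sample=$(mktemp)
cat > "$sample" <<'EOF'
17541   From Email      subscription@test.com      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Email      subscription@yahoo.com      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Domain      test.co.uk      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Domain      yahoo.co.uk      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Email      subscription@test.com      Inbound
Policy Manager  Envelope Analysis
Profiler
EOF

# Extract every e-mail address, strip everything up to and including
# the "@", and remove duplicates.
result=$(grep -Eo '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b' "$sample" \
  | sed -e 's/[^@]*@//' \
  | sort -u)

printf '%s\n' "$result"
rm -f "$sample"
```

With GNU grep this should print test.com and yahoo.com, one per line.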

Thor

Posted 2012-10-16T07:20:39.967

Reputation: 5 178

1

This awk one-liner gives the output that you desire

awk '/From Email/ { if( !match($4,"@") ){ print $4 } }' inputfile

It selects the lines containing 'From Email' and checks whether the fourth column contains an @. If you like, you can use match with a regular expression to match a domain rather than an e-mail address.
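For the file layout shown in the question, where domains appear on lines labelled 'From Domain', a variant along these lines might be closer to the expected output. This is only a sketch: it assumes the domain is always the fourth whitespace-separated field, and the temporary file and the variable names `sample` and `result` are made up for the example.

```shell
# Write a shortened copy of the sample data to a temporary file
# (the file name is arbitrary and only used for this sketch).
sample=$(mktemp)
cat > "$sample" <<'EOF'
17541   From Email      subscription@test.com      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Domain      test.co.uk      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Domain      yahoo.co.uk      Inbound
Policy Manager  Envelope Analysis
Profiler
17541   From Domain      test.co.uk      Inbound
Policy Manager  Envelope Analysis
Profiler
EOF

# Select the 'From Domain' lines, print the fourth field (the domain),
# and remove duplicates.
result=$(awk '/From Domain/ { print $4 }' "$sample" | sort -u)

printf '%s\n' "$result"
rm -f "$sample"
```

On this data it should print test.co.uk and yahoo.co.uk, one per line.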

Bernhard

Posted 2012-10-16T07:20:39.967

Reputation: 1 017

Thank you, Bernhard, but that does not work. There is no output. – Kalin Borisov – 2012-10-16T08:30:21.783

I had a mistake in the description of the question; it is now correct. When a domain is present, the line looks like this: "17541 From Domain test.co.uk Inbound" – Kalin Borisov – 2012-10-16T08:33:00.737

I found a solution myself. Thank you, your answer pointed me in the right direction to find my error. – Kalin Borisov – 2012-10-16T08:36:33.337