Sed/Awk save text between patterns if contains string

2

I'm facing an issue with mails. I need to get all messages between 2 people: somebody1@domain.com and person@domain.com.

The file:

From: somebody1@domain.com
to: person@domain.com
<body of the message1>

From: somebody2@domain.com
to: person@domain.com
<body of the message1>

From: somebody1@domain.com
to: person@domain.com
<body of the message1>

From: somebody3@domain.com
to: person@domain.com
<body of the message1>

From: somebody5@domain.com
to: person@domain.com
<body of the message1>

I tried to use the following sed:

sed -n "/From: [Ss]omebody1/,/From: /p" inputfile > test.txt

As a result, test.txt contains all mails from somebody1, whoever the recipient is.

The question is: what should the sed command look like to get only the mails between somebody1 and person?

wtk

Posted 2015-10-13T10:55:30.780

Reputation: 21

Answers

1

With sed:

sed -n '/^From: somebody1@domain.com/{h;n;/^to: person@domain.com/{H;g;p;:x;n;p;s/.//;tx}}' file

  • /^From: somebody1@domain.com/: first search for the From: email-address
    • h; store that line in the hold space.
    • n; load the next line (the to: line).
  • /^to: person@domain.com/: search for the to: email-address
    • H; append that line to the hold space.
    • g; copy the hold space to the pattern space.
    • p; print the pattern space.
    • :x; set a label called x.
    • n; load the next line (the email body)
    • p; print that line.
    • s/.// try a substitution on that line (it just removes the first character)...
    • tx ... so that the t command can check whether that substitution succeeded, which it does whenever the line is not empty. If it succeeded, jump back to the label x and repeat; once an empty line appears (the end of the email body), the substitution fails and the script runs to its end.

The output:

From: somebody1@domain.com
to: person@domain.com
<body of the message1>

From: somebody1@domain.com
to: person@domain.com
<body of the message1>
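The same commands can also be laid out as a commented script file, which may be easier to follow than the one-liner. This is only a sketch, assuming GNU sed: the scratch directory and the file names (mails.txt, between.sed, result.txt) are hypothetical, the message bodies are made up, and the dots in the addresses are escaped here so that they match literally.

```shell
tmp=$(mktemp -d)   # scratch directory; all file names below are hypothetical

cat > "$tmp/mails.txt" <<'EOF'
From: somebody1@domain.com
to: person@domain.com
<body of the message1>

From: somebody2@domain.com
to: person@domain.com
<body of the message2>

From: somebody1@domain.com
to: person@domain.com
<body of the message3>
EOF

cat > "$tmp/between.sed" <<'EOF'
# Act only on a From: line of the wanted sender.
/^From: somebody1@domain\.com/{
  # Keep the From: line in the hold space, then load the next line.
  h
  n
  # Continue only if that next line names the wanted recipient.
  /^to: person@domain\.com/{
    # Join both header lines in the pattern space and print them.
    H
    g
    p
    # Loop: load and print one line at a time; stop after an empty line.
    :x
    n
    p
    s/.//
    tx
  }
}
EOF

sed -n -f "$tmp/between.sed" "$tmp/mails.txt" > "$tmp/result.txt"
cat "$tmp/result.txt"
```

Keeping the script in a file avoids shell quoting issues and leaves room for one comment per step.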

chaos

Posted 2015-10-13T10:55:30.780

Reputation: 3 704

You can probably get cleaner output without the first p;. That avoids a list of isolated From: somebody1@domain.com matches that are not followed by the second person's match and the block of the letter. – Hastur – 2015-10-13T12:58:15.410

@Hastur Good hint, I corrected it; now it's not printing isolated matches anymore. – chaos – 2015-10-13T13:11:26.577

Thanks a lot for that. I would like to ask another question: what I should get in return is the whole message body (which may contain newline characters) up to the next occurrence of "From:". Right now I get more info, but it's not enough:

example output From: somebody1@domain.com To: person@domain.com Date: Mon, 06 Jul 2015 17:41:03 GMT Subject: *************** Content-type: ********************************* X-Scanned-By: **********************

and no body after it – wtk – 2015-10-13T13:31:47.597

Search your file for the point where your chunk stops; you will probably find the keyword From: somebody1@domain.com there again... You have to select a different, unique key that does not occur again in the body of your message. The same applies to the awk answer. Give it a try too. – Hastur – 2015-10-13T15:09:50.557
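For the follow-up case in the comments (several header lines, then a body that may contain empty lines, ending at the next From: line), one possible approach is to buffer each message in awk and print it only when both addresses appear in its header block. This is a sketch and not part of the original answers: the file names, the awk variable and function names, and the sample headers beyond those quoted in the comment are assumptions. It also assumes that no body line starts with "From: " and that the addresses can be matched as plain substrings.

```shell
tmp=$(mktemp -d)   # scratch directory; file and variable names are hypothetical

cat > "$tmp/mbox.txt" <<'EOF'
From: somebody1@domain.com
To: person@domain.com
Date: Mon, 06 Jul 2015 17:41:03 GMT
Subject: first

first body line

second body line

From: somebody2@domain.com
To: person@domain.com
Subject: second

other body

From: somebody1@domain.com
To: someone-else@domain.com
Subject: third

unwanted body
EOF

awk -v from='somebody1@domain.com' -v to='person@domain.com' '
  # Print the buffered message if both addresses were seen in its headers.
  function emit() {
    if (from_ok && to_ok) printf "%s", buf
    buf = ""; from_ok = 0; to_ok = 0
  }
  # A From: line starts a new message (assumes no body line starts with it).
  /^From: / { emit(); in_headers = 1; from_ok = (index($0, from) > 0) }
  # Only header lines are checked for the recipient; To: or to: both count.
  in_headers && /^[Tt]o: / { if (index($0, to) > 0) to_ok = 1 }
  # The first empty line ends the header block.
  in_headers && /^$/ { in_headers = 0 }
  # Every line, header or body, is appended to the current message buffer.
  { buf = buf $0 "\n" }
  END { emit() }
' "$tmp/mbox.txt" > "$tmp/result.txt"
cat "$tmp/result.txt"
```

Because the decision is made only when the next From: line (or the end of the file) is reached, empty lines inside the body no longer cut the message short.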

0

With awk:

awk '/From: [Ss]omebody1/ {flag=1; next}
     /to: person1/ {if (flag>0) {flag=2; print; next} else {flag=0; next}}
     /From/ {flag=0}
     {if (flag==2) {print NR, $0}}' input.txt
  • /From: [Ss]omebody1/ {flag=1; next} sets the flag variable to 1 on a match and skips the line, so the From: line itself is not printed.
  • /to: person1/ if the flag is greater than 0, sets it to 2 and prints the to: line; otherwise resets it to 0.
  • /From/ {flag=0} resets the flag on any other From line.
  • {if (flag==2) {print NR, $0}} if the flag is 2, prints the line number and the line.

Change person1 in the pattern to match a different recipient.

Input file used

From: somebody1@domain.com
to: person2@domain.com
<body of the message1>

From: somebody2@domain.com
to: person1@domain.com
<body of the message2>

From: somebody1@domain.com
to: person1@domain.com
<body of the message3>

From: somebody1@domain.com
to: person1@domain.com
<body of the message4>

From: somebody3@domain.com
to: person@domain.com
<body of the message5>
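For input shaped exactly like the file above, where every message is one block separated by an empty line and the body itself contains no empty lines, awk's paragraph mode is a shorter alternative. This is a different technique from the flag-based script, shown only as a sketch: the scratch directory and the file names are hypothetical, and it prints the From: line as well.

```shell
tmp=$(mktemp -d)   # scratch directory; the file names are hypothetical

cat > "$tmp/input.txt" <<'EOF'
From: somebody1@domain.com
to: person2@domain.com
<body of the message1>

From: somebody2@domain.com
to: person1@domain.com
<body of the message2>

From: somebody1@domain.com
to: person1@domain.com
<body of the message3>

From: somebody1@domain.com
to: person1@domain.com
<body of the message4>

From: somebody3@domain.com
to: person@domain.com
<body of the message5>
EOF

# An empty RS switches awk to paragraph mode: each block separated by empty
# lines becomes one record, so both header lines are tested in the same record.
# ORS puts an empty line back after every printed block.
awk -v RS= -v ORS='\n\n' '
  /^From: [Ss]omebody1@/ && /\nto: person1@/
' "$tmp/input.txt" > "$tmp/result.txt"
cat "$tmp/result.txt"
```

With this input only the blocks for message3 and message4 are printed, since they are the only ones sent by somebody1 to person1.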

Hastur

Posted 2015-10-13T10:55:30.780

Reputation: 15 043