How to extract 4th line from 300 text files?

I have about 300 emails (gmail) that go like this:

Dear [name]

Order ID:123456789 Purchased by: [name I need]

(blah blah another 26 lines of crap (total of 30 lines))

What I need help with is how to save the Gmail messages locally as .txt files and how to extract the 4th line from each file. I can easily run a find-and-replace on 'Purchased by: ' to strip it and keep only the names from that line in a list, but beyond that I have no clue.

Any ideas?

Twoopah

Posted 2014-06-08T20:16:54.503

Reputation: 1

What system do you want this to run on? – private_meta – 2014-06-08T20:54:41.820

Use Google's data tools to export the mails into the MBOX format, then use grep -E 'Order ID:[0-9]* Purchased by' and sed -E 's/Order ID:[0-9]+ Purchased by:[[:space:]]+//' to extract the names. – Eugen Rieck – 2014-06-08T21:35:11.883

Windows 7, I'll give that a shot – Twoopah – 2014-06-08T23:08:04.963

Answers

Export the Gmail messages in MBOX format (hint: https://support.google.com/accounts/answer/3024195?hl=en) and save them as messages.txt.

Grab GNU Awk (gawk.exe) from http://gnuwin32.sourceforge.net/packages/gawk.htm

Save the following as getnames.awk:

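# Runs for each line of the form "Order ID:... Purchased by: NAME"
/^Order ID:.*Purchased by:/ {
  sub("^.+ by: ","");   # strip everything up to and including " by: "
  print;                # what remains is the name
}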

Save the following as names2csv.awk:

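# Turns "Order ID:123 Purchased by: NAME" into "123,NAME"
/^Order ID:.*Purchased by:/ {
  sub("^.*Order ID:[^0-9]*","");      # drop the "Order ID:" prefix
  sub("[^0-9]*Purchased by: ",",");   # replace the middle text with a comma
  print;
}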

Now that you've got the scripts and messages above, this will get you a list of names as a text file:

gawk -f getnames.awk messages.txt > names.txt 

And this will get you order IDs and names as a .CSV file, suitable for opening in your favorite spreadsheet software:

gawk -f names2csv.awk messages.txt > orders.csv
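If gawk isn't an option, roughly the same extraction can be sketched in Python. This is only a sketch: it assumes the export is plain text in which each order line looks like "Order ID:123456789 Purchased by: Some Name", as in the question. The names `extract_orders` and `to_csv` and the sample text are made up for illustration. Messages stored as base64 or quoted-printable would need decoding first (Python's `mailbox` module can help with that).

```python
import csv
import io
import re

# Assumed line layout, taken from the question:
# "Order ID:<digits> Purchased by: <name>"
ORDER_LINE = re.compile(r"^Order ID:\s*(\d+)\s+Purchased by:\s*(.*\S)\s*$")

def extract_orders(lines):
    """Return (order_id, name) pairs for every line matching the assumed layout."""
    orders = []
    for line in lines:
        match = ORDER_LINE.match(line)
        if match:
            orders.append((match.group(1), match.group(2)))
    return orders

def to_csv(orders):
    """Render the pairs as CSV text; csv.writer quotes names that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(orders)
    return buffer.getvalue()

# Hypothetical sample standing in for messages.txt
sample = """Dear Alice

Order ID:123456789 Purchased by: Alice Smith
(more lines)
Dear Bob

Order ID:987654321 Purchased by: Smith, Bob
"""
orders = extract_orders(sample.splitlines())
print(to_csv(orders), end="")
```

To run it against the real export, pass an open file instead of the sample, e.g. `extract_orders(open("messages.txt", encoding="utf-8", errors="replace"))`. Matching on the line's content rather than its position (the "4th line") keeps it working even if some messages have extra header lines.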

Luno

Posted 2014-06-08T20:16:54.503

Reputation: 141

You can extract data directly from Gmail, parse it, and save it to an Excel sheet or to other formats such as XML or CSV using MsgExtract.

In your case you should define a TextPart field and use the following regular expression to extract only the name, between the brackets []:

(?s)(?<=(by:.\[)).+?(?=\])

If, for instance, you want the whole text “Purchased by: [name I need]”, use the following expression:

(?s)Purchased.+?\]
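To sanity-check expressions like these outside MsgExtract, you can try them in Python's `re` module, which accepts the same lookaround syntax as long as the lookbehind has a fixed width. A small sketch follows, with the square brackets escaped and a lazy quantifier so the match stops at the first closing bracket. It assumes the brackets are literally present in the mail text (in the question they may only be placeholders). Whether MsgExtract's regex engine behaves identically is also an assumption.

```python
import re

# Sample text modelled on the question; the literal brackets are an assumption.
text = "Order ID:123456789 Purchased by: [Jane Doe]\nmore text [other]"

# Name only: lookbehind for "by:" + any one character + "[", lookahead for "]".
name_only = re.compile(r"(?s)(?<=by:.\[).+?(?=\])")

# Whole phrase from "Purchased" through the first closing bracket.
whole_phrase = re.compile(r"(?s)Purchased.+?\]")

print(name_only.search(text).group(0))
print(whole_phrase.search(text).group(0))
```

With a greedy `.+` instead of `.+?`, the `(?s)` flag would let the match run on to the last `]` anywhere in the message, which is why the lazy form is used here.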

You can learn more about regular expressions in MsgExtract at the following links:

http://docs.maildev.com/article/69-parse-email-data-using-regular-expressions

http://www.maildev.com/msgextract/

(Disclaimer, I am the author of MsgExtract)

jponce

Posted 2014-06-08T20:16:54.503

Reputation: 601