
I've got a Kerio Connect mail server that saves all of its emails as raw .eml files in a relatively standard mail spool folder structure. I've been tasked with searching some of the users' mailboxes for keywords and email addresses. I then need to copy the matching files to another folder.

The folder structure will be something like:

mail
  example.com
    user1
      INBOX
      Sent Items
      etc
    user2
      INBOX
      etc

The difficulty is that the .eml files in each folder are named with a serial number. If an email called 00000123.eml in user1's INBOX matches, and another email with the same name in their Sent Items also contains one of the keywords, I don't want one to be copied over the top of the other.

I also need some of the keyword searches to be case-insensitive, so that searching for "keyword" matches Keyword, keyword, and KEYWORD.
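For illustration, here is a minimal sketch of case-insensitive matching against several alternatives at once. The sample file names and contents are invented, and it assumes a grep that supports -E (both GNU and BSD grep do). One thing it highlights: in a basic regex, which is what plain -e gives you, "|" is matched literally, so alternation needs -E.

```shell
#!/usr/bin/env bash
# Throwaway sample data; file names and contents are invented for illustration.
demo=$(mktemp -d)
printf 'Subject: Keyword one\n' > "$demo/a.eml"
printf 'Subject: KEYWORD two\n' > "$demo/b.eml"
printf 'Subject: unrelated\n'   > "$demo/c.eml"

# -E: extended regex, so "|" means "or"; -i: ignore case; -l: list file names only.
with_E=$(grep -i -l -E 'keyword|evenmore' "$demo"/*.eml | wc -l | tr -d ' ')

# Without -E the pattern is a basic regex and "|" is an ordinary character.
# grep exits non-zero when nothing matches, hence the "|| true".
without_E=$({ grep -i -l -e 'keyword|evenmore' "$demo"/*.eml || true; } | wc -l | tr -d ' ')

echo "files matched with -E: $with_E, without -E: $without_E"
```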

I think the following command will do what I want it to do, but I'm not 100% sure and I'm running this over ~100 GB of eml files, so I want to make sure it's all correct before leaving it to run.

grep -i -r -l -e "user1@example.com|anotheruser@example.com|keyword1|anotherkeyword|evenmore" /usr/local/kerio/mailserver/store/mail/example.com/user1/ | xargs -I{} rsync -Rv {} /Volumes/Data/Email\ Discovery/201706/user1/

By my count, this will do a case-insensitive search (-i), recursively (-r), print just the filenames (-l), and use the regex (-e), then pass the results to rsync, which will copy them recursively to a destination folder and (hopefully) keep the same folder structure.
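As a point of comparison, here is a hedged sketch of the same idea written as a small bash function. It is not a tested replacement for the command above: copy_matches is a hypothetical helper name, plain mkdir and cp stand in for rsync -R, and it assumes a grep that supports --null (GNU and BSD grep both do). It runs grep from inside the source folder so that the reported paths are relative, which is what keeps INBOX/00000123.eml and Sent Items/00000123.eml apart in the destination.

```shell
#!/usr/bin/env bash
# copy_matches SRC DEST PATTERN
# Copies every file under SRC whose contents match PATTERN (extended regex,
# case-insensitive) into DEST, recreating each file's path relative to SRC.
# Hypothetical helper; mkdir/cp is used here in place of rsync -R.
copy_matches() {
  local src=$1 dest=$2 pattern=$3 rel
  # Running grep from inside SRC makes it report relative paths ("./INBOX/...").
  # --null ends each file name with a NUL byte, so names containing spaces
  # (such as "Sent Items") or newlines survive the pipe intact.
  (cd "$src" && grep -r -i -l -E --null -e "$pattern" .) |
    while IFS= read -r -d '' rel; do
      mkdir -p "$dest/$(dirname "$rel")"   # recreate the sub-folder
      cp -p "$src/$rel" "$dest/$rel"       # -p keeps timestamps and modes
    done
}

# Example call, using the paths from the question:
# copy_matches /usr/local/kerio/mailserver/store/mail/example.com/user1 \
#   "/Volumes/Data/Email Discovery/201706/user1" \
#   'user1@example.com|anotheruser@example.com|keyword1|anotherkeyword|evenmore'
```

Trying it on a small copy of one mailbox first would show whether the folder structure comes out as intended before committing to the full ~100 GB run.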

Is there a more efficient way to do this?

Kai Howells
  • Try search function inside Kerio Webmail client... – Anubioz Jun 29 '17 at 03:47
  • That doesn't really help much - you can only search for one term at a time and then there's no way to export the emails as .eml files for archiving. – Kai Howells Jun 29 '17 at 07:02
  • There is no better way to do what you need, since like 90% of emails are base64 encoded, which makes grep useless... – Anubioz Jun 29 '17 at 08:28
  • @Anubioz: It should be possible to decode before grepping. Of course, that makes the job a little bit more difficult, but not impossible. Time to get rid of the one liner though and write a proper script for this ... – Sven Jun 29 '17 at 08:41
  • I wouldn't say that 90% of emails are base64 encoded - in my testing the vast majority of emails are not encoded. Sure the attachments and embedded images are base64 encoded, but there's nearly always a plain text and an html text representation of the email. Most attachments (for this law firm at least) are scanned documents, as opposed to pdfs with actual text objects inside them, but I'm not concerned about looking into them... – Kai Howells Jun 29 '17 at 10:33
  • Try Kerio Outlook Connector, it should allow easy searching and Microsoft Outlook supports saving EMLs... – Anubioz Jun 29 '17 at 13:12
  • I'm also aware of KOC. Yes, it does allow searching and Outlook does allow you to save .eml files but this also doesn't really help as it's still a manual solution. Thanks for your suggestions. – Kai Howells Jun 29 '17 at 22:02

0 Answers