I've got a Kerio Connect mail server that saves all of its emails in a relatively standard mail spool folder structure as raw .eml files. I've been tasked with searching some of the users' mailboxes for keywords and email addresses, and then copying the matching files to another folder.
The folder structure will be something like:
    mail
        example.com
            user1
                INBOX
                Sent Items
                etc
            user2
                INBOX
                etc
The difficulty is that the .eml files in each folder are named with a serial number. If an email in user1's INBOX called 00000123.eml matches, and another email in their Sent Items with the same name also contains one of the keywords, I don't want one to be copied over the top of the other.
I also need some of the keyword searches to be case-insensitive, so that searching for "keyword" matches Keyword, keyword and KEYWORD.
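As a quick sanity check of the matching behaviour I'm after, something like the following on a throwaway file should count every spelling. The sample lines and the pattern are made up for the test, and -E is there on the assumption that extended regex syntax is what makes | act as "or".

```shell
#!/bin/sh
# Throwaway sample file; the lines below are made-up test data.
sample=$(mktemp)
printf '%s\n' 'Keyword here' 'a keyword there' 'KEYWORD' \
    'From: User1@Example.com' 'unrelated' > "$sample"

# -i: ignore case, -E: extended regex so | separates alternatives,
# -c: print the number of matching lines instead of the lines themselves
matches=$(grep -i -c -E 'user1@example\.com|keyword' "$sample")
echo "$matches"

rm -f "$sample"
```

Four of the five sample lines contain either the address or the keyword in some capitalisation, so that is the count I'd expect to see.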
I think the following command will do what I want it to do, but I'm not 100% sure and I'm running this over ~100 GB of eml files, so I want to make sure it's all correct before leaving it to run.
    grep -i -r -l -E "user1@example.com|anotheruser@example.com|keyword1|anotherkeyword|evenmore" /usr/local/kerio/mailserver/store/mail/example.com/user1/ | xargs -I{} rsync -Rv {} /Volumes/Data/Email\ Discovery/201706/user1/
By my count, this will do a case-insensitive search (-i) recursively (-r), print just the names of matching files (-l), and treat the pattern as an extended regex (-E) so that | separates the alternatives. It then passes the results to rsync, which with -R (relative paths) should recreate each file's source path under the destination folder and so (hopefully) keep the same folder structure.
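Before pointing this at the real store, a dry run on a tiny throwaway tree along these lines might show whether same-named files in different folders survive the copy. This is only a sketch: the test tree, the keyword list and the copy_matches helper are made up, and it uses mkdir and cp in place of rsync so that it runs without rsync installed. Running grep from inside the store root to get relative paths back, and passing NUL-terminated names to xargs -0, are my assumptions for handling folder names with spaces.

```shell
#!/bin/sh
# Hypothetical helper: copy every file under $1 that matches the extended
# regex $3 into $2, recreating each file's relative path so that equal
# file names in different folders cannot overwrite each other.
# $2 must be an absolute path because the search runs from inside $1.
copy_matches() {
    src=$1; dest=$2; pattern=$3
    (
        cd "$src" || exit 1
        # --null ends each file name with a NUL byte, so names such as
        # "Sent Items" reach xargs -0 as a single item.
        grep -r -i -l -E --null "$pattern" . |
        xargs -0 -I{} sh -c \
            'mkdir -p "$1/$(dirname "$2")" && cp -p "$2" "$1/$2"' _ "$dest" {}
    )
}

# Made-up test tree with the same file name in two folders.
work=$(mktemp -d)
mkdir -p "$work/store/user1/INBOX" "$work/store/user1/Sent Items" "$work/out"
echo 'Subject: KEYWORD1 in inbox' > "$work/store/user1/INBOX/00000123.eml"
echo 'Subject: keyword1 in sent'  > "$work/store/user1/Sent Items/00000123.eml"
echo 'Subject: nothing relevant'  > "$work/store/user1/INBOX/00000124.eml"

copy_matches "$work/store" "$work/out" 'keyword1|anotherkeyword'

# List what was copied; both 00000123.eml files should appear, each under
# its own folder, and 00000124.eml should not.
find "$work/out" -type f | sort
```

With rsync in place of mkdir and cp, I'd expect -R to play the same role of recreating the relative path, though this sketch doesn't check that.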
Is there a more efficient way to do this?