How to convert mbox mail files, as found in Thunderbird dir, to Maildir?

Several ruminations on this topic can be found on the Internet. None (that is easily found) answers the question though, especially for those not familiar with both formats in detail.

The relevant article on Mozilla wiki notes in its first paragraph: “Thunderbird's maildir implementation allows a single unique filename per email (EML). HOWEVER, note this is NOT full maildir in the sense that most people, particularly linux users or mail administrators, know as maildir.” So, presumably, Thunderbird's stock converter does not offer the proper solution.

Dovecot recommends dsync but it is presumably developed for dovecot specifically. I want to quit using Thunderbird but I don't intend to use Dovecot right now, with its wiki mentioning some “Maildir++”. Dovecot also recommends (ibid.) mb2md.pl with some patches of Dovecot's own. mb2md seems to be what's recommended in general.

However, there are two implementations of mb2md: a sh+Python script and a Perl script. The former is the original implementation, and its page says literally the following about the Perl version: “if you encounter this particular [seemingly absurdly trivial environment-related] issue with my script, give it [the Perl implementation] a try”, and nothing else. It's not clear whether those two scripts operate the same way or even support the same syntax. (Brief inspection suggests they don't; why use the same name then?) The Python version is reportedly from 2006, which makes it more than 10 years old as of today. It also so happens that the distribution I use (Gentoo) does not seem to have mb2md in its repository. I could install Dovecot and use its converter, but this doesn't feel right.

Even though the matter might be trivial (mbox is nothing but string concatenation of eml's, right?), all the above is alarming: for a format more than 10 years old (Maildir), widely considered superior to mbox, there apparently is no standard migration mechanism. I don't want to convert blindly, only to find out later that some data was lost, as the mb2md page warns, or that I can't add more messages to the new Maildir storage without losing consistency, as mentioned in the mbsync man page (search for the phrase “native scheme is faster”).
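On the "string concatenation" point: not quite, and the difference is where converters tend to go wrong. The sketch below uses Python's standard mailbox module to write two made-up messages (the addresses and subjects are placeholders) into a throwaway mbox and then inspects the raw file. Each message is preceded by a "From " separator line, and a body line that happens to start with "From " gets escaped as ">From ". This shows what Python's writer does; other mbox producers, Thunderbird included, may differ in details.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_demo_mbox(path):
    """Write two small, made-up messages into a new mbox file at `path`."""
    box = mailbox.mbox(path, create=True)
    try:
        bodies = ["Hello.\n", "From here on, lines need care.\n"]
        for n, body in enumerate(bodies, start=1):
            msg = EmailMessage()
            msg["From"] = "alice@example.org"  # placeholder address
            msg["To"] = "bob@example.org"      # placeholder address
            msg["Subject"] = f"demo {n}"
            msg.set_content(body)
            box.add(msg)
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.mbox")
    build_demo_mbox(demo_path)
    with open(demo_path, "rb") as fh:
        raw_lines = fh.read().splitlines()

# Separator lines start with "From " (note the space, unlike the "From:" header).
separator_lines = [line for line in raw_lines if line.startswith(b"From ")]
# Body lines that began with "From " are written with a leading ">".
escaped_lines = [line for line in raw_lines if line.startswith(b">From ")]

print(len(separator_lines), "separator line(s)")
print(len(escaped_lines), "escaped body line(s)")
```

A converter that splits on the wrong pattern, or that does not undo the escaping, will change either the message count or the message bodies.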

For the record, I intend to use mbsync with the new Maildir storage. Hopefully, the answer would not depend on this.

  1. Will the 10-year-old sh+Python mb2md converter work as well with modern mboxen as it did with 2006 ones?
  2. The Thunderbird directory contains Mail, ImapMail, News and Feeds subdirectories, which, in turn, contain other files. Are INBOX files the only mbox files, or might I miss some others?
  3. Should I convert each mbox file with mb2md separately and do I have to somehow manually connect or group them in Maildir storage?
  4. In the past, Thunderbird offered “compacting” folders, whatever that means, and the user said yes. Does it affect the conversion process?
  5. What should I take into account when choosing between different mb2md versions? Assume, for the sake of completeness, that tags, PGP encryption and signatures in various forms were heavily used in Thunderbird.
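For question 2, a rough way to see which files in a profile even look like mbox is to check whether they begin with a "From " line. The sketch below is only a heuristic under stated assumptions: that files ending in .msf are index files rather than mail storage, and that a non-empty mbox starts with "From ". The function name find_mbox_candidates is made up for this example, and empty folders (zero-byte files) will not be reported.

```python
import os


def find_mbox_candidates(profile_dir):
    """Yield files under profile_dir whose first five bytes are b'From '."""
    for dirpath, _dirnames, filenames in os.walk(profile_dir):
        for name in sorted(filenames):
            if name.endswith(".msf"):
                continue  # assumed to be index files, not mail storage
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    head = fh.read(5)
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            if head == b"From ":
                yield path
```

Running `for p in find_mbox_candidates(profile_path): print(p)` on a copy of the profile gives a list to review by hand; it is a starting point, not proof that nothing was missed.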

akater

Posted 2017-01-19T19:38:00.603

"Python version is reportedly from 2006 which makes it more than 10 years old as of today." - I can point to code that exists that is 30 years old and in production use. Does code expire like milk? I am pointing this out because the age of the program shouldn't matter, unless it flat out doesn't work anymore. The fact it doesn't work also wouldn't have anything to do with its age, because to be honest, it likely never worked or, better stated, never "worked well". You will have to answer most of these questions for us. – Ramhound – 2017-01-19T19:57:27.950

Script did not expire. Mail could change though. That's why I said “work with mboxen” and not “run”. – akater – 2017-01-19T22:42:45.077

Answers


The answer by wbob is useful and detailed. However, I had used a different solution before wbob suggested dovecot conversion. Besides, I ended up with notmuch storage. I have to accept my own answer because that's what got used after all.

I employed a simple Python script making use of the standard mailbox library. (Thanks to the notmuch IRC channel on freenode.)

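#!/usr/bin/python3
# Usage: <script> MBOX_FILE MAILDIR_ROOT
# Converts one mbox file into a Maildir named after it under MAILDIR_ROOT.
import mailbox
import os
import sys

mbox_filename = sys.argv[1]
maildir_root_dir_name = sys.argv[2]

# Open the existing mbox; create=False raises if the file is missing.
mbox = mailbox.mbox(mbox_filename, create=False)

# The Maildir is named after the mbox file, e.g. <root>/INBOX.
mailbox_name = os.path.basename(mbox_filename)
maildir_dir_name = os.path.join(maildir_root_dir_name, mailbox_name)
os.mkdir(maildir_dir_name, mode=0o750)
mdir = mailbox.Maildir(maildir_dir_name, create=True)

# Make sure the three standard Maildir subdirectories exist.
for sub in ("cur", "new", "tmp"):
    os.makedirs(os.path.join(maildir_dir_name, sub), mode=0o750, exist_ok=True)

count = 0
for x in mbox:
    print(x.get_from())  # the mbox separator line, minus its "From " prefix
    count += 1
    if count % 1000 == 0:
        print(count)  # progress marker every 1000 messages
    mdir.add(x)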

Some messages were broken; the script stopped with an error and a line number, so I had to use emacs (with vlf to open large files, I believe) to fix the problematic messages in the mbox file.
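One way to avoid stopping at the first broken message is to catch the error per message and collect the failures for later inspection. This is a sketch, not what I actually ran: convert_tolerantly is a name made up here, it assumes the target Maildir path does not exist yet, and it catches every Exception, which is deliberately broad for a one-off migration. Like the script above, it does not try to carry over read/replied flags.

```python
import mailbox


def convert_tolerantly(mbox_path, maildir_path):
    """Copy messages from an mbox into a new Maildir, recording failures.

    Returns (copied, failed), where failed is a list of (key, error) pairs.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    failed = []
    try:
        for key in src.keys():
            try:
                # Parsing and re-serializing can both fail on damaged mail.
                dst.add(src.get_message(key))
            except Exception as exc:
                failed.append((key, repr(exc)))
                continue
            copied += 1
    finally:
        src.close()
        dst.close()
    return copied, failed
```

The keys in the returned failure list are positions in the mbox as Python parsed it, so `mailbox.mbox(path).get_bytes(key)` can be used afterwards to look at the raw text of each message that did not convert.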

The question deserves a more elaborate answer, as the task is quite troublesome for most users. Hopefully, I'll expand this in the future.

akater

Posted 2017-01-19T19:38:00.603


For Thunderbird users, version 60 brought experimental bidirectional mbox to Maildir conversion support. See the meta ticket for open issues. Personally, I can recommend the Dovecot dsync method.

Having recently converted largish Thunderbird mbox folders to Maildir and evaluated the links you mentioned, I cannot recommend any of the helper scripts. One script missed a 'From:' split, and the message count before and after migration didn't match; other conversion scripts found on GitHub had issues with text encoding or timestamps.
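The before/after comparison can be done with Python's standard mailbox module, independently of whichever tool did the conversion. This is a minimal sketch with made-up function names. It counts messages the way Python's mbox parser splits them, which may not match Thunderbird's own view (for example, for messages deleted but not yet compacted), and it only looks at one Maildir folder, not its subfolders. Messages without a Message-ID header all collapse into one empty entry.

```python
import mailbox


def message_counts(mbox_path, maildir_path):
    """Return (messages in the mbox, messages in the Maildir folder)."""
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=False)
    try:
        return len(src), len(dst)
    finally:
        src.close()
        dst.close()


def missing_message_ids(mbox_path, maildir_path):
    """Return Message-ID values present in the mbox but absent from the Maildir."""
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=False)
    try:
        # str() guards against header objects that are not plain strings.
        src_ids = {str(msg.get("Message-ID", "")).strip() for msg in src}
        dst_ids = {str(msg.get("Message-ID", "")).strip() for msg in dst}
        return src_ids - dst_ids
    finally:
        src.close()
        dst.close()
```

If the two counts differ, the set returned by missing_message_ids narrows down which messages to look for in the original mbox.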

Instead, dsync gave fast (1-2 min on 25k messages) and consistent results; see the Migration/MailFormat wiki page you mentioned: dsync -Dv mirror mbox:~/.thunderbird/<profile/popMail/Account>:INBOX=Inbox. As noted there, configure mail_location=maildir:~/Maildir beforehand. Start with an empty folder and later make it the account folder for the Maildir-enabled Thunderbird, with some manual cleanup: {cur,new,tmp} in the base directory have to move into "Inbox", and the dot prefix on folder names can be removed. Having a second Maildir-enabled profile will show what Thunderbird expects.
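The manual cleanup can also be scripted. The sketch below is one reading of the steps just described, not a tested recipe for every Thunderbird version: the function name regroup_for_thunderbird and the default folder name "Inbox" are assumptions made for this example, and it should be run on a copy of the Maildir first. It moves cur, new and tmp from the base directory into an "Inbox" subdirectory and strips the leading dot from the other folder directories, skipping anything whose target already exists.

```python
import os
import shutil


def regroup_for_thunderbird(maildir_root, inbox_name="Inbox"):
    """Move top-level cur/new/tmp into an inbox folder and un-dot folder names."""
    inbox = os.path.join(maildir_root, inbox_name)
    os.makedirs(inbox, exist_ok=True)

    # Step 1: the base directory's own cur/new/tmp become Inbox/cur etc.
    for sub in ("cur", "new", "tmp"):
        src = os.path.join(maildir_root, sub)
        dst = os.path.join(inbox, sub)
        if os.path.isdir(src) and not os.path.exists(dst):
            shutil.move(src, dst)

    # Step 2: ".Sent" -> "Sent" and so on, never overwriting an existing name.
    for entry in sorted(os.listdir(maildir_root)):
        path = os.path.join(maildir_root, entry)
        if entry.startswith(".") and len(entry) > 1 and os.path.isdir(path):
            target = os.path.join(maildir_root, entry[1:])
            if not os.path.exists(target):
                os.rename(path, target)
```

Comparing the result against a fresh Maildir-enabled profile, as suggested above, is still the best check that the layout is what Thunderbird expects.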

wbob

Posted 2017-01-19T19:38:00.603


While this is a programming question, the practical way to get data exported from Thunderbird is to use the add-on "alternative import/export", which can export to eml format, among others.

A Learner

Posted 2017-01-19T19:38:00.603