How to convert mboxo/mboxrd to mboxcl/mboxcl2


I am trying to export email from Thunderbird so that I can read it in mutt. I started by exporting to mbox format using the ImportExportTools Thunderbird add-on. I then copied the file to the server, but mutt told me there were no messages in the file.

After some more research it appears that there are several variants of mbox. The exported file appears to be either mboxo or mboxrd: I found a >From in the text of the file, and there are no Content-Length headers (as there would be in an mboxcl/mboxcl2 file).
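A rough way to check those two symptoms is to count Content-Length headers and quoted From lines. The sketch below is only a heuristic: the name `summarize_mbox` and the result keys are made up for illustration, and it assumes that every line starting with "From " begins a new message (which does not hold for mboxcl2, where body lines are not quoted).

```python
import re


def summarize_mbox(data):
    """Count messages, Content-Length headers and quoted 'From ' body lines.

    `data` is the raw mbox content as bytes.
    """
    messages = 0
    content_length = 0
    quoted_from = 0
    in_headers = False
    for line in data.splitlines():
        if line.startswith(b"From "):
            # Assumed to be a message separator (true for mboxo/mboxrd).
            messages += 1
            in_headers = True
        elif not line:
            # The first blank line ends the header block.
            in_headers = False
        elif in_headers and line.lower().startswith(b"content-length:"):
            content_length += 1
        elif not in_headers and re.match(rb">+From ", line):
            quoted_from += 1
    return {
        "messages": messages,
        "content_length": content_length,
        "quoted_from": quoted_from,
    }
```

No Content-Length headers together with at least one quoted From line points to mboxo or mboxrd; a Content-Length header on every message points to mboxcl or mboxcl2.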

Now, according to the link above on the variants of mbox: "The mutt MUA attempts to convert 'mboxo' and 'mboxrd' mailboxes to 'mboxcl' format." But this has obviously not happened in this case.

So does anyone know how to convert mboxo/mboxrd into mboxcl? Are there any tools available, or am I going to have to write some code to do this?

Edited to add: I exported from Thunderbird 3.0 using ImportExportTools 2.3.1.1. I have tried using mutt 1.5.20 (on Ubuntu 9.04) and 1.5.18 (on Debian Lenny).

Hamish Downer

Posted 2010-02-06T19:22:55.320

Reputation: 3 064

Answers


You could try this script. I found I needed to massage some mbox files I downloaded from Mailman-type mailing list archives to get them into a format mutt recognized. I think it's pickiest about the date format. I've not yet encountered an easier fix. But this works for me.

#!/usr/bin/env python
"""
Usage:   ./mailman2mbox.py  infile outfile [default-list-id]
"""
import sys
from time import strftime,strptime,mktime,asctime
from email.utils import parseaddr,formatdate

if len(sys.argv) not in (3,4):
    print(__doc__)
    sys.exit()

out = open(sys.argv[2],"w")
listid = None
if len(sys.argv)==4:
    listid = sys.argv[3]

date_patterns = ("%b %d %H:%M:%S %Y", "%d %b %H:%M:%S %Y", "%d %b %Y %H:%M:%S", "%d %b %H:%M:%S",  "%d %b %y %H:%M:%S", "%d %b %Y %H.%M.%S",'%m/%d/%y %H:%M:%S %p')

class HeaderError(TypeError):
    pass


def finish(headers, body):
    body.append("\n")
    for n,ln in enumerate(headers):
        if ln.startswith("Date:"):
            break
    else:
        raise HeaderError("No 'Date:' header:\n" + "".join(headers)+"\n")
    if listid is not None:
        for ln2 in headers:
            if ln2.lower().startswith("list-id:"):
                break
        else:
            headers.append("List-Id: <%s>\n" % (listid,))
    date_line = ln[5:].strip()
    if date_line.endswith(')'):
        date_line = date_line[:date_line.rfind('(')].rstrip()
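    # Strip a trailing numeric zone such as "+0100". Note that tz is parsed but
    # never used below: the time of day is later treated as local time.
    if len(date_line) >= 5 and date_line[-5] in "+-":
        date_line, tz = date_line[:-5].rstrip(), int(date_line[-5:])//100
    else:
        tz = -5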
    if date_line[:3] in ("Mon","Tue","Wed","Thu","Fri","Sat","Sun"):
        if date_line[3:5] == ', ':
            prefix = "%a, "
        elif date_line[3] == ',':
            prefix = "%a,"
        else:
            prefix = "%a "
    else:
        prefix = ""
    while True:
        for p in date_patterns:
            try:
                date_struct = strptime(date_line, prefix+p)
            except ValueError:
                pass
            else:
                break
        else:
            if not date_line:
                raise ValueError(headers[n])
            date_line = date_line[:date_line.rfind(' ')]
            continue
        break

    # date_struct is left as a struct_time: on Python 3, mktime() and asctime()
    # raise TypeError when given a list.
    try:
        headers[n] = 'Date: %s\n' % (formatdate(mktime(date_struct),True))
        headers[0] = "%s %s\n" % (headers[0][:-25].rstrip(), asctime(date_struct), )
    except ValueError:
        raise ValueError(headers[n])

    for w in headers, body:
        for s in w:
            out.write(s)


message = 0
headers, body = None, None
for line in open(sys.argv[1]):
    if line.startswith("From "):
        message+=1
        header = True
        if headers is not None:
            try:
                finish(headers, body)
            except HeaderError:
                message -= 1
                out.write('>')
                for w in headers, body:
                    for s in w:
                        out.write(s)
        headers, body = [], []
        line = line.replace(" at ", "@")
    elif line == '\n':
        header = False
    elif header and line.startswith('From:'):
        line = line.replace(" at ","@")
    (headers if header else body).append(line)

try:
    finish(headers, body)
except HeaderError:
    out.write('>')
    for w in headers, body:
        for s in w:
            out.write(s)

out.close()
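
The script rewrites each "From " separator line so that it ends in an asctime-style date. To see which separator lines in a file are not already in that shape, something like the following sketch could help. The helper name `has_asctime_date` is hypothetical, and treating the asctime layout as what mutt wants is an assumption based on what the script above produces, not on mutt's documentation.

```python
from time import strptime


def has_asctime_date(from_line):
    """Return True if a 'From ' separator line ends in an asctime-style date,
    e.g. 'From user@example.com Sat Feb  6 19:22:55 2010'.
    """
    parts = from_line.split()
    # Need at least: From, sender, weekday, month, day, time, year.
    if len(parts) < 7 or parts[0] != "From":
        return False
    try:
        # Month and weekday names are matched in the current locale.
        strptime(" ".join(parts[-5:]), "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return False
    return True
```

Running this over every line that starts with "From " and printing the ones that return False shows which messages the script would need to fix.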

dubiousjim

Posted 2010-02-06T19:22:55.320

Reputation: 1 128

Thanks for trying, but I'm afraid it has not fixed my problem :( – Hamish Downer – 2010-02-17T17:26:12.017

Pity. You could try manufacturing Content-Length headers. I agree this should be more straightforward though. – dubiousjim – 2010-02-17T18:42:00.053
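
Manufacturing the Content-Length headers could look roughly like the sketch below. It is untested against mutt and rests on several assumptions: the input is mboxo or mboxrd (so every line starting with "From " is a message separator), each message ends with one blank separator line that is not counted in the length, and any existing >From quoting is left untouched. The function name `add_content_length` is made up for illustration.

```python
def add_content_length(data):
    """Return mbox bytes with a Content-Length header added to each message."""
    blank = (b"\n", b"\r\n")

    # Split the file into messages at each "From " separator line.
    messages = []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") or not messages:
            messages.append([])
        messages[-1].append(line)

    out = []
    for msg in messages:
        # The first blank line separates the headers from the body.
        split = next((i for i, ln in enumerate(msg) if ln in blank), len(msg))
        # Drop any stale Content-Length header so it is not duplicated.
        headers = [h for h in msg[:split]
                   if not h.lower().startswith(b"content-length:")]
        body = msg[split + 1:]
        # Exclude the blank line that precedes the next "From " line.
        if body and body[-1] in blank:
            body = body[:-1]
        length = sum(len(ln) for ln in body)

        out.extend(headers)
        out.append(b"Content-Length: %d\n" % length)
        out.append(b"\n")
        out.extend(body)
        out.append(b"\n")
    return b"".join(out)
```

To use it, read the exported file in binary mode (`open(path, "rb")`), pass the bytes through `add_content_length`, and write the result to a new file in binary mode. Keep the original export until the converted file has been checked.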