Tool to fix file names starting with =_iso-8859-1... in .eml file names?

2

0

I have a folderful of E-Mails saved from an IMAP account that I dissolved.

The file name is the subject line of each E-Mail.

Unfortunately, when a subject line uses a non-ASCII encoding, the file name looks the way the subject is stored internally: it is prefixed with =_ and the name of the encoding used:

=_UTF-8_Q_Auftragsbest=C3=A4tigung_(Kundennummer__)_=_20100819_150312_37.eml

=_windows-1252_Q_Best=E4tigung=3A_Wir_haben_Ihre_=_20100819_150310_28.eml 

Does anybody know a tool that could be used to mass fix this on filesystem level?

A solution would have to 1. remove the =_ENCODING prefix and 2. if at all possible, convert the encoded characters in the file name to their proper filesystem equivalent Umlauts.

I'm on Windows 7 or XP, but I'd be ready to take this to a Linux VM because it is a big folder and an automated solution would be great.

Pekka

Posted 2010-09-12T21:19:40.953

Reputation: 2 239

Do you want to remove the reference to the encoding from the file name? I'm not sure what the eventual goal is. Perhaps use Bulk Rename Utility (http://www.bulkrenameutility.co.uk/Main_Intro.php), as mentioned in "Mass Rename Files". @Pekka

– Sathyajith Bhat – 2010-09-12T21:52:12.010

Basically you need something like "convmv" with support for quoted-printable. – None – 2010-09-12T22:16:06.997

Answers

1

I built myself a PHP script. I thought I'd share it in the event somebody else ends up with a similar problem. It works for me and the encodings I needed (you may have to extend the encodings array).

The script converts MIME encoded file names recursively throughout the specified directory structure into UTF-8.

It doesn't produce entirely perfect results: There are several special characters that get doubly converted, or not at all. As far as I can see, this is the fault of the IMAP exporter or incorrect encoding info inside the E-Mail itself.

mb_decode_mimeheader() is the heart of the whole thing.

Released to the public domain; no warranty whatsoever. PHP 5.2 or later is required.

It should run on both CLI and through the web; I tested it in the browser.

Make backups before running scripts like this on your data.

<?php

 /* Directory to parse */
 $dir = "D:/IMAP";
 /* Extensions to parse. Leave empty to parse all files */
 $extensions = array("eml");
 /* Set to true to actually run the renaming */
 define ("GO", true);

 /* No need to change past this point */  

 /* Output content type header if not in CLI */
 if (strtolower(php_sapi_name()) != "cli")
  header("Content-type: text/plain; charset=utf-8");


 $FixNames = new FixEmlNames($dir, $extensions);
 $FixNames->fixAll();



  class FixEmlNames
   {

     /* List of possible encodings here */
     private $encodings = array("iso-8859-1", "iso-8859-15", "windows-1252", "utf-8");
     /* Encoding Prefix. The exporter exports e.g. =_iso-8859-1_ with underscores 
        instead of question marks */
     private $encoding_prefix = "=_";
     /* Encoding postfix */
     private $encoding_postfix = "_";
     /* Temporary storage for files */
     private $files;
     /* Array of file extensions to process. Leave empty to parse all files and directories */
     private $extensions = array(); 
     /* Count of renamed files */
     private $count = 0;
     /* Count of failed renames */
     private $failed = 0;
     /* Count of skipped renames */
     private $skipped = 0;
     /* Transform forbidden characters in host OS */
     private $transform_characters = array(":" => "_", "?" => "_", ">" => "_", "<" => "_", "\"" => "_", "|" => "_", "*" => "_", "/" => "_", "\\" => "_");

     function __construct($dir, $extensions = array("eml"))
       { 

        $this->files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
        $this->extensions = $extensions;
       }

     function fixAll()
      {
        echo "Starting....\n";

        while($this->files->valid())
        {
         if (!$this->files->isDot())
          {

           $path = $this->files->key();
           $ext  = pathinfo($path, PATHINFO_EXTENSION);

           if ((count($this->extensions) == 0 ) or (in_array($ext, $this->extensions)))
            $this->renameOne($path);

          }
         $this->files->next();
        }

        echo "Done. ";

        /* Show stats */
        $status = array();

        if ($this->count > 0) $status[] = $this->count." OK";
        if ($this->failed > 0) $status[] = $this->failed." failed";
        if ($this->skipped > 0) $status[] = $this->skipped." skipped";

        echo implode(", ", $status);


      }

      function  renameOne($fullPath)
       {


          $filename = pathinfo($fullPath, PATHINFO_BASENAME); 
          $is_mime = false;

          // See whether file name is MIME encoded or not
          foreach ($this->encodings as $encoding)
           { if (stristr($filename, $this->encoding_prefix.$encoding.$this->encoding_postfix))
              $is_mime = true;
           }

           // No MIME encoding? Skip.
           if (!$is_mime)
            {
              # uncomment to see skipped files
              # echo "Skipped: $filename\n";
              $this->skipped++;
              return true;
            }

           mb_internal_encoding("UTF-8"); 
           $filename = str_replace("_", "?", $filename);  // Question marks were converted to underscores
           $filename = mb_decode_mimeheader($filename);
           $filename = str_replace("?", "_", $filename);  


           // Remove forbidden characters
           $filename = strtr($filename, $this->transform_characters);

          // Rename
          if (constant("GO") == true)
           {
             // We catch the error manually
             $old = error_reporting(0);
             $success = rename($fullPath, realpath(dirname($fullPath)).DIRECTORY_SEPARATOR.$filename);
             error_reporting($old);

             if ($success)
              {
                echo "OK: $filename\n";
                $this->count++;
                return true;
              }
             else
              {
                $error = error_get_last();
                $message = $error["message"];
                $this->failed++;
                echo "Failed renaming $fullPath. Error message: ".$message."\n";
                return false;
              }
           }
          else
           {
             $this->count++;
             echo "Simulation: $filename\n";
             return true;
           }
       }


   }
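
The decoding step can also be looked at in isolation. What follows is a minimal sketch and not part of the script above: instead of turning every underscore back into a question mark, it matches the exported =_CHARSET_Q_..._= pattern with a regular expression and decodes the Q-encoded text by hand. The function names decodeExportedName() and sanitizeForWindows() are made up for this illustration. It assumes the mbstring extension is loaded, handles only Q-encoding (not Base64), accepts only the charsets listed in the script, and needs PHP 5.3 or later because it uses a closure. Names whose closing _= was cut off by the exporter are left untouched.

```php
<?php

/**
 * Decode an exported file name such as
 * "=_windows-1252_Q_Best=E4tigung=3A_Wir_haben_Ihre_=_20100819_150310_28.eml"
 * into UTF-8. Hypothetical helper, shown for illustration only.
 */
function decodeExportedName($name)
{
    // Assumption: only these charsets occur (same list as in the script).
    $charsets = 'iso-8859-15|iso-8859-1|windows-1252|utf-8';

    // "=_CHARSET_Q_TEXT_=": the exporter wrote "_" where the header had "?".
    // The lookahead keeps "_=E4" (a space followed by an encoded byte)
    // from being mistaken for the closing "_=".
    $pattern = '/=_(' . $charsets . ')_Q_(.*?)_=(?![0-9A-Fa-f]{2})/i';

    return preg_replace_callback($pattern, function ($m) {
        // In Q-encoding "_" stands for a space and "=XX" for one byte.
        $bytes = quoted_printable_decode(str_replace('_', ' ', $m[2]));
        // Convert from the declared charset to UTF-8.
        return mb_convert_encoding($bytes, 'UTF-8', $m[1]);
    }, $name);
}

/**
 * Replace characters that Windows does not allow in file names.
 * Hypothetical helper; works on bytes, so UTF-8 sequences pass through.
 */
function sanitizeForWindows($name)
{
    return preg_replace('~[<>:"/\\\\|?*\x00-\x1F]~', '_', $name);
}

$exported = '=_windows-1252_Q_Best=E4tigung=3A_Wir_haben_Ihre_=_20100819_150310_28.eml';
$decoded  = decodeExportedName($exported);

echo $decoded, "\n";                     // Bestätigung: Wir haben Ihre_20100819_150310_28.eml
echo sanitizeForWindows($decoded), "\n"; // Bestätigung_ Wir haben Ihre_20100819_150310_28.eml
```

The sanitizing step matters on Windows because a decoded subject can contain characters such as ":" that are not allowed in file names there.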

Pekka

Posted 2010-09-12T21:19:40.953

Reputation: 2 239

1

Since you're willing to move to Linux, you could install a PHP server on it and write a fairly simple script to re-encode the files. How difficult that is depends on whether you've done any programming before. You can look up these functions on php.net.

These are the functions you would need

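<?php

// Rough sketch only: $path and the ISO-8859-1 source encoding are placeholders.
$path = "/path/to/eml";
$dir_handle = opendir($path);
while (($name = readdir($dir_handle)) !== false) {
    if (pathinfo($name, PATHINFO_EXTENSION) != "eml") continue;
    // Read the mail, using the file name passed from readdir()
    $contents = file_get_contents($path . "/" . $name);
    // Regular expression to remove the =_ENCODING part of the file name
    $new_name = preg_replace('/^=_[A-Za-z0-9-]+_[QB]_/i', '', $name);
    // mb_convert_encoding(string $str, string $to_encoding [, mixed $from_encoding])
    $new_name = mb_convert_encoding($new_name, "UTF-8", "ISO-8859-1");
    // Write a copy of the mail under the new file name
    file_put_contents($path . "/" . $new_name, $contents);
}
closedir($dir_handle);

?>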

JMC

Posted 2010-09-12T21:19:40.953

Reputation: 133

Cheers, I actually am a PHP programmer and will go down that route. However, decoding the characters is trickier than just running mb_convert_encoding on them: it will take something like mb_decode_mimeheader. I'm still working on understanding how this stuff works. I'll post the solution as an answer when I'm done. – Pekka – 2010-09-13T09:04:29.563

No upvote, just because one needs neither Linux nor a "PHP server" to run PHP scripts. – user1686 – 2010-09-13T13:16:28.933

@Pekka - I don't have enough rep to comment on your answer. Have you tried iconv() with the TRANSLIT option to fix the characters that aren't being converted properly? $output = iconv("ISO-8859-1", "UTF-8//TRANSLIT", $input); It might work. – JMC – 2010-09-13T18:35:53.187

@grawity - I've never run PHP scripts without a PHP server. How do you do it? I tried googling, but nothing came up in the first few results. I mentioned Linux because he said in his question that he was willing to move to it, and Apache + PHP is easy to install on most Linux distros. – JMC – 2010-09-13T18:40:50.570

@JMC the problem is garbled encodings that were introduced long before the export process. Therefore, it's impossible to know what the original encoding was. You could use mb_detect_encoding, but it's not 100% reliable. That would be the next step for the script if somebody wants to take it further; I'm happy with how it works now. – Pekka – 2010-09-14T09:40:56.363
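
One cautious step in that direction, sketched here under stated assumptions, is to avoid converting text twice: if the bytes are already valid UTF-8, leave them alone; otherwise convert from a fallback encoding. This is not real detection. It cannot tell the single-byte encodings apart, and a short Windows-1252 string can happen to be valid UTF-8 by coincidence. The function name toUtf8() and the Windows-1252 fallback are assumptions made for this illustration, and the mbstring extension is required.

```php
<?php

/**
 * Return $bytes as UTF-8 without converting text that is already UTF-8.
 * Hypothetical helper; $fallback is a guess, not a detected value.
 */
function toUtf8($bytes, $fallback = 'Windows-1252')
{
    // Already valid UTF-8 (this includes plain ASCII): keep as-is,
    // so the text is not converted a second time.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise assume the fallback single-byte encoding.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

$fromLatin = toUtf8("Best\xE4tigung");     // single byte 0xE4, not valid UTF-8
$fromUtf8  = toUtf8("Best\xC3\xA4tigung"); // already UTF-8, returned unchanged

var_dump($fromLatin === $fromUtf8);        // bool(true)
```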

@JMC ...you run php myscript.php or ./myscript.php on a command line, exactly the same way one would run Perl or Python programs. – user1686 – 2010-09-14T15:42:15.227

@JMC Also, there is no such thing as a "PHP server". There are only web (HTTP) servers that can execute CGI programs (PHP or whatever). – user1686 – 2010-09-14T15:43:24.773

@JMC Also, http://windows.php.net/

– user1686 – 2010-09-15T11:30:37.820