File names garbled in rar archive, possibly double utf8 encoded? Can't figure out how to reverse

I'm trying to restore the uploads folder of a WordPress installation. The folder went through some combination of FTP or SFTP transfers and was then compressed with RAR. (I don't know the exact process; it wasn't done by me.) Now the file names are garbled in the archive.

For example, the following file

gerendás.jpg

is named

gerendăľs.jpg

in the archive.

I can't figure out the exact process that took place. I suspect the names were UTF-8 encoded more than once. The closest I got to reproducing it was

~ $ convmv --nosmart -f "iso-8859-2" -t "utf8" gerendás.txt
Starting a dry run without changes...
mv "./gerendás.txt" "./gerendĂĄs.txt"
No changes to your files done. Use --notest to finally rename the files.

that is, the first character comes out as Ă, the uppercase version of the ă in the archive, and the second character does not match either. I'm out of ideas here.
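As a cross-check of the dry run above, the same substitution can be sketched in Python: encoding the name as UTF-8 and then decoding those bytes as ISO-8859-2 gives the convmv result, and the reverse pair of calls undoes it. The helper names `misdecode` and `undo` are made up for illustration, and this only reproduces the convmv output, not the 'ăľ' seen in the archive.

```python
def misdecode(name, actual="utf-8", assumed="iso-8859-2"):
    """Encode with the real encoding, then decode with the wrong one."""
    return name.encode(actual).decode(assumed)


def undo(name, actual="utf-8", assumed="iso-8859-2"):
    """Reverse misdecode(): recover the bytes, then decode them correctly."""
    return name.encode(assumed).decode(actual)


original = "gerend\u00e1s.txt"      # gerendás.txt
garbled = misdecode(original)       # same name as in the convmv dry run
print(garbled)
print(undo(garbled))                # back to the original name
```

If the real chain of encodings were known, passing it as `actual` and `assumed` should reverse it in the same way.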

What could have caused this and how can I revert it?

proto-n

Posted 2014-12-21T21:46:42.030

Reputation: 111

The first assumption would be a corrupted archive, or corrupted transfers. If you have a bunch of .jpgs, did you first check whether the pictures all display properly? Did you ask what method (and program) was used to compress, and attempt to decompress using the exact same program? Were any checksums or MD5s made, or can they be made now, to verify the transfer of the archive? – Psycogeek – 2014-12-21T22:35:58.033

Yes, the contents of the files are intact, the JPEGs display perfectly. Only the filenames are garbled, and only the accented characters. Thing is, I can't attempt to decompress using the original program, which is probably WinRAR, since I only have access to Linux for now. I have tried both unrar and p7zip, though. – proto-n – 2014-12-21T22:45:42.677

Also, the weird characters seem to be consistent. Every instance of 'ăľ' was an 'á' originally. That of course leaves me with the possibility of trying to figure out the mapping and doing a simple replace, but I'd like to avoid that if possible. – proto-n – 2014-12-21T23:02:53.653
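For reference, the replace-based fallback mentioned in the comment above could look roughly like the sketch below. Only the 'ăľ' to 'á' pair comes from this thread; any other accented letters would have to be added to the table by hand, and the names `REPLACEMENTS` and `fix_name` are hypothetical.

```python
# Garbled sequence -> intended character.
# Only this one pair is known from the thread; extend as more are identified.
REPLACEMENTS = {
    "\u0103\u013e": "\u00e1",   # 'ăľ' -> 'á'
}


def fix_name(name, table=REPLACEMENTS):
    """Return name with every known garbled sequence replaced."""
    for bad, good in table.items():
        name = name.replace(bad, good)
    return name


print(fix_name("gerend\u0103\u013es.jpg"))
```

The fixed name could then be passed to `os.rename` for each extracted file, ideally after a dry run that only prints the old and new names.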

How are you examining the contents of the archive? If the archive contains a UTF-8 file name but you are looking at it in a legacy encoding, that's exactly the result you'll get. – tripleee – 2015-03-02T08:01:02.073
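One way to take the terminal and the archive tool's display out of the picture is to look at the raw bytes of the extracted names. The sketch below assumes the files have already been extracted on Linux and Python 3.8 or later; `describe` and `list_raw` are hypothetical helper names.

```python
import os


def describe(raw: bytes) -> str:
    """Show a file name's bytes as hex and say whether they are valid UTF-8."""
    try:
        text = raw.decode("utf-8")
        status = "valid UTF-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never fails
        text = raw.decode("latin-1")
        status = "not UTF-8"
    return f"{raw.hex(' ')}  {status}  {text!r}"


def list_raw(directory="."):
    """Print one description per entry, using the undecoded byte names."""
    for raw in os.listdir(os.fsencode(directory)):
        print(describe(raw))


if __name__ == "__main__":
    list_raw(".")
```

A correctly stored 'á' should show up as the byte pair `c3 a1`; a longer run of bytes in its place would suggest the name was re-encoded before or during archiving rather than just displayed wrongly.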

No answers