How to correctly extract files from a RAR archive that contains non-ASCII characters in folder names or filenames on Linux

I used rar to list and extract the archive, but the non-ASCII characters came out as unreadable text.

7z could list the folder names and filenames with readable characters, but it reported an "unsupported method" error when extracting the RAR archive.

Kane

Posted 2011-02-11T06:03:17.367

Reputation: 305

RAR has no designated encoding for filenames. Have fun! runs away laughing maniacally – Ignacio Vazquez-Abrams – 2011-02-11T06:09:59.607

What a pity! Do you know any alternative software for a workaround? – Kane – 2011-02-11T06:14:40.740

Nope. You just have to find something that will extract the raw filenames and then mash on it with convmv. – Ignacio Vazquez-Abrams – 2011-02-11T06:18:07.023

rar shows the non-ASCII characters as garbage and cannot even create the files on disk when trying to extract the archive. – Kane – 2011-02-12T05:08:30.510

I've never had problems with unrar and unknown encodings. Maybe try that instead. – Ignacio Vazquez-Abrams – 2011-02-12T12:40:20.193

Wow! unrar works like a charm! – Kane – 2011-02-14T03:40:42.740

Answers

Use unrar instead of rar.

This answer comes from @Ignacio Vazquez-Abrams.
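A minimal sketch of what that looks like on the command line. The archive name archive.rar is a placeholder (not from the question), and the guard simply skips the commands when unrar or the archive is missing:

```shell
#!/bin/bash
# Hypothetical archive name; replace it with your own file.
archive="archive.rar"
result="not run"

if command -v unrar >/dev/null 2>&1 && [ -f "$archive" ]; then
    # "l" lists the archive contents, so the names can be checked first.
    unrar l "$archive"
    # "x" extracts with full paths into the current directory.
    if unrar x "$archive"; then
        result="extracted"
    else
        result="failed"
    fi
fi

echo "unrar: $result"
```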

Kane

Posted 2011-02-11T06:03:17.367

Reputation: 305

I had the same problem and worked out a somewhat complicated solution. You will need a couple of programs installed, such as unrar and hexedit (or any other hex editor), and then a simple bash script to perform the extraction. In my case the content of the script is:

#!/bin/bash
unrar e -v diccionario-arabe-espanol.rar "Diccionario Arabe espaNol.pdf"

where I replaced the real name of the file I wanted to extract with an "ASCII version", in which á (accented a, hex code A0) and ñ (n with tilde, hex code A4) are replaced with ordinary ASCII characters (ones that your editor will not turn into multi-byte UTF-8 characters). Strictly speaking, A0 and A4 are not ASCII codes; they are the byte values these letters have in DOS code pages such as CP437. You can use hexedit to inspect the file header in the archive and verify the name of the file you are interested in; the hex pane shows the codes used for the troublesome characters.
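If you want to double-check those two byte values without opening the archive, a small sketch like the following may help. It assumes the archive stored its names in CP437 (an assumption; other DOS code pages are possible) and that your iconv knows that code page. The helper name cp437_hex is made up for this example.

```shell
#!/bin/bash
# Print, as hex, the byte a character occupies in CP437.
# The argument is a printf format string holding the character's UTF-8 bytes
# as octal escapes, so no editor can silently re-encode it.
cp437_hex() {
    printf "$1" | iconv -f UTF-8 -t CP437 | od -An -tx1 | tr -d ' \n'
}

a_acute=$(cp437_hex '\303\241')   # UTF-8 bytes C3 A1 = á
n_tilde=$(cp437_hex '\303\261')   # UTF-8 bytes C3 B1 = ñ

echo "a-acute: $a_acute"
echo "n-tilde: $n_tilde"
```

If the values printed here differ from what hexedit shows in the archive header, the archive probably uses another code page, and the bytes from the header are the ones to trust.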

Save your script and make it executable as usual, then open it in hexedit (or the binary or hexadecimal editor of your choice) and change the characters of the file name accordingly. In my example, replace the A of "Arabe" with the byte A0 and the N of "espaNol" with the byte A4. Save it and run it, and that's it: you end up with the extracted file. On my system (Ubuntu Linux 9.10), my unrar version (UNRAR 4.00 beta 3 freeware) created the extracted file with its name properly converted to UTF-8.

In the future I will create another script to perform all the above steps automatically. Hope it works for you.
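As a possible shortcut, the hex-editing step can likely be avoided by letting printf emit the raw bytes. This is only a sketch of the same idea, not something tested against the original archive: it reuses the archive name, file name, and unrar command from the script above, and writes A0 and A4 as the octal escapes \240 and \244.

```shell
#!/bin/bash
archive="diccionario-arabe-espanol.rar"

# Build the stored file name with the raw bytes in place:
# \240 is octal for hex A0, \244 is octal for hex A4.
name=$(printf 'Diccionario \240rabe espa\244ol.pdf')

# Show the bytes that will be passed to unrar, for comparison with hexedit.
hex=$(printf '%s' "$name" | od -An -tx1 | tr -d ' \n')
echo "name bytes: $hex"

# Run the extraction only if unrar and the archive are actually present.
if command -v unrar >/dev/null 2>&1 && [ -f "$archive" ]; then
    unrar e -v "$archive" "$name"
fi
```

Because the escapes are plain ASCII text in the script, an editor that saves in UTF-8 cannot alter them, which is the problem the hex editor was working around.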

艾也白

Posted 2011-02-11T06:03:17.367

Reputation: 11