Vim shows strange characters <91>,<92>

27

8

While using Vim over SSH I copied some content from a webpage to my SSH/Vim session and got the following result:

SIZE=`df -h|grep $DISC|awk <91>{print $2}<92>`

Apparently <91> and <92> stand for ', but how can I search for and replace them? And what do 91 and 92 mean? How is this encoded, given that 91 and 92 in ASCII are [ and \?

Jeremy S.

Posted 2010-10-15T13:33:05.217

Reputation: 419

Answers

24

The content on your source web page was overzealously reformatted. The text was undoubtedly supposed to use straight single quotes (ASCII 39/0x27, U+0027) instead of curly single quotes (U+2018 and U+2019). Those curly quotes are 0x91 and 0x92 in CP1252, a common 8-bit encoding on Windows that is also known as MS-ANSI and WINDOWS-1252.

Vim is showing you the hex codes because the bytes 0x91 and 0x92 are not valid in whatever encoding Vim is using (probably UTF-8). If you are editing text that has already been saved in a file, you can reload the file as CP1252 with :e ++enc=cp1252; this should make the curly quotes visible. But there is no real reason to reload it as CP1252; just delete the 0x91 and 0x92 characters and replace them with straight single quotes.
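The encoding mismatch described above can be reproduced outside Vim. The following is a minimal sketch, assuming an iconv that knows the names CP1252 and UTF-8 (true for glibc and libiconv, as far as I know); the scratch file and variable names are only for illustration.

```shell
# Write the two CP1252 "smart quote" bytes (octal 221 = 0x91, 222 = 0x92) to a scratch file.
tmp=$(mktemp)
printf '\221\222' > "$tmp"

# Decoded as CP1252 and re-encoded as UTF-8, the bytes become U+2018 and U+2019.
utf8_hex=$(iconv -f CP1252 -t UTF-8 "$tmp" | od -An -tx1 | tr -d ' \n')
echo "$utf8_hex"    # prints e28098e28099

# Read as UTF-8, the same bytes are not a valid sequence, so iconv exits non-zero.
if iconv -f UTF-8 -t UTF-8 "$tmp" >/dev/null 2>&1; then
  valid_utf8=yes
else
  valid_utf8=no
fi
echo "$valid_utf8"  # prints no
rm -f "$tmp"
```

This is the same situation Vim is in: with a UTF-8 encoding it cannot decode the lone bytes, so it falls back to showing them as <91> and <92>.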

Chris Johnsen

Posted 2010-10-15T13:33:05.217

Reputation: 31 786

@ChrisJohnsen, Is there any way to call vi with a flag that accomplishes the same thing as :e ++enc=cp1252? If I want to vi from the command line a file containing MS word characters, it would be nice to be able to do it in one step, rather than opening vi and then loading the file with the :e command – Leo Simon – 2016-07-11T00:17:33.297

@LeoSimon: vim --cmd 'set fileencodings=cp1252' /path/to/file — The command runs before the normal .vimrc and sets the fileencodings option (note the ending s; you can also use the shorter name fencs) so that Vim will only try CP1252 when loading files. This should work for one-off editing of such files, but it may cause complications if you want to use that instance of Vim to edit files with other encodings. – Chris Johnsen – 2016-07-11T05:40:16.277

Thanks!, to be explicit, I'm now using vim -c"set fencs" /path/to/file – Leo Simon – 2016-07-11T19:17:54.067

You often get the curly quotes/apostrophe from content copied from MS Word which auto inserts the curly quotes/apostrophe as part of the "Smart Quotes" feature. If your font does not support those characters, you will just get an empty space instead of the character. – lambacck – 2010-12-21T15:31:04.043

+1 for :e ++enc=cp1252 – wfaulk – 2012-11-30T02:27:37.343

27

91 and 92 are the hex codes for open and close curly apostrophe (single quote) in the MS Windows default version of the latin1/ISO-8859-1 encoding, which is more specifically called cp1252/Windows-1252 (where cp stands for code page).

These characters are most often inserted by people copying content from Word documents or Outlook emails, where the "Smart Quotes" feature adds them. Other problem characters in this code page include hex 93/94 (open and close double quotes), the bullet point (•), and the OE ligatures (œ and Œ). The Wikipedia page for cp1252 highlights in green the full list of "problem characters", the ones that don't map directly into ISO-8859-1 or UTF-8 with the same code.

If all you want is to open the file in the correct encoding then use the ++enc=cp1252 option to the :e command:

:e ++enc=cp1252 filename.txt

You can replace a particular bad hex code in Vim with the substitute command (:s) and one of the code substitutions:

\d123   decimal number of character
\o40    octal number of character up to 0377
\x20    hexadecimal number of character up to 0xff
\u20AC  hex. number of multibyte character up to 0xffff
\U1234  hex. number of multibyte character up to 0xffffffff

To change the hex 91/92 characters to straight single quotes, run:

:%s/[\x91\x92]/'/g
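The same byte-for-byte replacement can be done from the shell before the file ever reaches Vim. This is a rough sketch using tr with octal escapes (221 to 224 are hex 91 to 94); it also maps the double quotes 93/94, which goes beyond the Vim command above, and the sample line is made up for the demonstration.

```shell
# Sample line with CP1252 bytes: 0x91 ... 0x92 around the awk program and
# 0x93 ... 0x94 around a word (octal 221-224).
input=$(mktemp)
printf 'awk \221{print $2}\222 \223quoted\224\n' > "$input"

# tr maps each byte in the first set to the byte at the same position in the second:
# 0x91 and 0x92 become '   and   0x93 and 0x94 become "
# LC_ALL=C makes tr treat the input as plain bytes rather than multibyte text.
fixed=$(LC_ALL=C tr '\221\222\223\224' "''\"\"" < "$input")
echo "$fixed"    # prints awk '{print $2}' "quoted"
rm -f "$input"
```

Because tr works on single bytes, this only suits files that really are CP1252; in a UTF-8 file those byte values can appear inside multibyte characters and would be corrupted.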

lambacck

Posted 2010-10-15T13:33:05.217

Reputation: 381

sed -i "s/\x92/'/g" worked for me. – Karoly Horvath – 2015-01-30T13:04:18.297

It would be great to have a bash command to replace those characters in all files in the directory. I came up with this from a quick google search, sed -i "s/[\x91\x92]/\'/g" *.txt but it didn't work. – Buttle Butkus – 2013-03-13T07:39:37.523

I just found something that seemed to work for the command line. This does find/replace for all .txt files in the current folder. Research perl before using this, though, because I have no idea what the switches do. perl -p -i -e "s/[\x91\x92]/'/g" *.txt – Buttle Butkus – 2013-03-13T07:48:14.467

3

Use iconv to convert the text file from CP1252 to UTF-8 before opening.

iconv -f cp1252 -t utf8 inputfile.csv > outputfile.csv

On Mac OS use this:

iconv -f cp1252 -t UTF8-MAC inputfile.csv  > outputfile.csv
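One caveat: running the conversion on a file that is already UTF-8 would double-encode it. The sketch below guards against that with a hypothetical helper, to_utf8, which is not part of iconv. It assumes that anything which fails to decode as UTF-8 is CP1252, which may not hold for your files.

```shell
# Hypothetical helper (not part of iconv): print FILE as UTF-8.
# If FILE already decodes cleanly as UTF-8 it is passed through unchanged;
# otherwise it is assumed to be CP1252 and converted.
to_utf8() {
  if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
    cat "$1"
  else
    iconv -f CP1252 -t UTF-8 "$1"
  fi
}

# Two scratch files with the same text: one in CP1252 (byte 0x92, octal 222),
# one already in UTF-8 (bytes e2 80 99, octal 342 200 231).
legacy=$(mktemp)
modern=$(mktemp)
printf 'it\222s\n' > "$legacy"
printf 'it\342\200\231s\n' > "$modern"

# Dump the results as hex so the bytes can be compared directly.
from_legacy=$(to_utf8 "$legacy" | od -An -tx1 | tr -d ' \n')
from_modern=$(to_utf8 "$modern" | od -An -tx1 | tr -d ' \n')
echo "$from_legacy"    # prints 6974e28099730a
echo "$from_modern"    # prints 6974e28099730a
rm -f "$legacy" "$modern"
```

Both inputs end up as the same UTF-8 bytes, so the helper can be run repeatedly on the same file without stacking conversions.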

Ignacio Vazquez-Abrams

Posted 2010-10-15T13:33:05.217

Reputation: 100 516

-3

They actually stand for hex 91 and 92, which in the Windows codepage are curly opening and closing single quotes (‘ and ’ - Alt-0145 and Alt-0146).

Try the following search/replace:

:%s/\<9[12]\>/'/g

Alex

Posted 2010-10-15T13:33:05.217

Reputation: 1 848

I can't downvote due to lack of points, but this substitution command is so wrong I don't know where to begin :( – lambacck – 2010-12-21T15:48:16.873

This doesn't work for me: http://stackoverflow.com/questions/2798398/how-to-search-and-replace-an-unprintable-character/2801132#2801132 gives a solution that does work. – Confusion – 2011-03-02T07:32:13.320

@lambacck: I was assuming that the file contains the literal strings "91" and "92", and in that case this command is correct. If these are hex characters, then you're right, you'd need your substitution command or something similar. – Alex – 2011-03-02T14:43:10.370