ISO 2033

The ISO 2033:1983 standard ("Coding of machine readable characters (MICR and OCR)")[1] defines character sets for use with Optical Character Recognition or Magnetic Ink Character Recognition systems. The Japanese standard JIS X 9010:1984 ("Coding of machine readable characters (OCR and MICR)", originally designated JIS C 6229-1984) is closely related.[2]

Character set for OCR-A

The version of the encoding for the OCR-A font registered with the ISO-IR registry as ISO-IR-91 is the Japanese (JIS X 9010 / JIS C 6229) version, which differs from the encoding defined by ISO 2033 only in the addition of a Yen sign.[2]

ISO 2033 and JIS C 6229 OCR-A set
(Each cell shows the character above its Unicode code point; "--" marks an undefined position. The number in parentheses is the decimal value of the first code in the row. The Yen sign is present only in the JIS version.)

          _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_ (0)    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
          0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F
1_ (16)   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
          0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B 001C 001D 001E 001F
2_ (32)   SP   --   "    £    $    %    &    '    {    }    *    +    ,    -    .    /
          0020 --   0022 00A3 0024 0025 0026 0027 007B 007D 002A 002B 002C 002D 002E 002F
3_ (48)   0    1    2    3    4    5    6    7    8    9    :    ;    ⑀    =    ⑁    ?
          0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 003A 003B 2440 003D 2441 003F
4_ (64)   --   A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
          --   0041 0042 0043 0044 0045 0046 0047 0048 0049 004A 004B 004C 004D 004E 004F
5_ (80)   P    Q    R    S    T    U    V    W    X    Y    Z    --   ¥    ⑂    --   --
          0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 005A --   00A5 2442 --   --
6_ (96)   (all positions undefined)
7_ (112)  --   --   --   --   --   --   --   --   --   --   --   --   |    --   --   DEL
          --   --   --   --   --   --   --   --   --   --   --   --   007C --   --   007F


Character set for OCR-B

The version of the G0 set for the OCR-B font registered with the ISO-IR registry as ISO-IR-92 is the Japanese (JIS X 9010 / JIS C 6229) version, which differs from the encoding defined by ISO 2033 only in being based on JIS-Roman (with a dollar sign at 0x24 and a Yen sign at 0x5C) rather than on the ISO 646 IRV (with a backslash at 0x5C and, at the time, a universal currency sign (¤) at 0x24).[3] Besides those code points, it differs from ASCII only in omitting the at sign (@) and tilde (~).[3] An additional supplementary set registered as ISO-IR-93 assigns the pound sign (£), universal currency sign (¤) and section sign (§) to their ISO-8859-1 codepoints, and the backslash to the ISO-8859-1 codepoint for the Yen sign.[4]
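The relationship between the two OCR-B variants and the ISO-IR-93 supplement can be sketched in Python as follows. This is a minimal illustration built only from the differences described above, not a transcription of the registration documents; the names `ocr_b_g0` and `ISO_IR_93` are hypothetical, and treating the space at 0x20 as outside the G0 set is an assumption of this sketch.

```python
# Hypothetical helper names; the tables are derived only from the
# differences stated in the text, not from the registration documents.

def ocr_b_g0(variant: str = "jis") -> dict[int, str]:
    """Return a code point -> character map for an OCR-B G0 set."""
    # Start from the 94 ASCII graphic characters (0x21-0x7E). Assumption:
    # the space at 0x20 is treated as lying outside the G0 set.
    table = {cp: chr(cp) for cp in range(0x21, 0x7F)}
    # Both variants omit the at sign and the tilde.
    del table[0x40]
    del table[0x7E]
    if variant == "jis":        # ISO-IR-92, based on JIS-Roman
        table[0x5C] = "\u00a5"  # Yen sign in place of the backslash
    elif variant == "iso":      # ISO 2033, based on the ISO 646 IRV of the time
        table[0x24] = "\u00a4"  # universal currency sign in place of the dollar sign
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    return table


# ISO-IR-93 supplementary set, at the ISO-8859-1 code points named in the text.
# The backslash sits at 0xA5, the ISO-8859-1 code point of the Yen sign.
ISO_IR_93 = {0xA3: "\u00a3", 0xA4: "\u00a4", 0xA7: "\u00a7", 0xA5: "\\"}

if __name__ == "__main__":
    jis, iso = ocr_b_g0("jis"), ocr_b_g0("iso")
    differing = sorted(cp for cp in jis if jis[cp] != iso[cp])
    print([hex(cp) for cp in differing])  # prints ['0x24', '0x5c']
```

Keeping each variant as a plain dictionary makes the two differing code points easy to compare directly.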

Character set for JIS X 9008 (JIS C 6257)

JIS X 9010 (JIS C 6229) also defines character sets for the JIS X 9008:1981 (formerly JIS C 6257-1981) "hand-printed" OCR font.[5]:fn1 These include subsets of the JIS X 0201 Roman set (registered as ISO-IR-94 and omitting the at sign (@), lowercase letters, curly braces ({, }) and overline (‾)),[5] and kana set (registered as ISO-IR-96 and omitting the East Asian style comma (、) and full stop (。), the interpunct (・) and the small kana),[6] in addition to a set (registered as ISO-IR-95) containing only the backslash, which is assigned to the same code point as in ISO-IR-93.[7]
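As a rough illustration, the Roman subset described above can be derived from JIS X 0201 Roman by filtering out the omitted characters. The Python sketch below assumes that the omissions listed in the text are exhaustive; the function names and the `ISO_IR_95` constant are hypothetical, and the kana set is not modelled.

```python
import string


def jis_x_0201_roman() -> dict[int, str]:
    """Code point -> character map for the JIS X 0201 Roman graphic set."""
    table = {cp: chr(cp) for cp in range(0x21, 0x7F)}
    table[0x5C] = "\u00a5"  # Yen sign where ASCII has the backslash
    table[0x7E] = "\u203e"  # overline where ASCII has the tilde
    return table


def hand_printed_roman() -> dict[int, str]:
    """Subset registered as ISO-IR-94, assuming the listed omissions are complete."""
    omitted = set("@{}\u203e") | set(string.ascii_lowercase)
    return {cp: ch for cp, ch in jis_x_0201_roman().items() if ch not in omitted}


# ISO-IR-95 contains only the backslash, at the same code point as in ISO-IR-93
# (0xA5, the ISO-8859-1 code point of the Yen sign).
ISO_IR_95 = {0xA5: "\\"}

if __name__ == "__main__":
    subset = hand_printed_roman()
    print(len(subset), subset[0x5C])
```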

The JIS C 6257 font stylises the slash[5] and backslash[7] characters with a doubled appearance. The character names given are "Solidus"[5] and "Reverse Solidus",[7] matching the Unicode character names for the ASCII slash and backslash.[8] The Unicode Optical Character Recognition block nevertheless includes a separate code point for an "OCR Double Backslash" (⑊), but none for a double (forward) slash.[9]

Character set for E-13B

The MICR E-13B font, showing the ISO-IR-98 character repertoire.

The ISO-IR-98 encoding defined by ISO 2033 encodes the character repertoire of the E-13B font, as used with magnetic ink character recognition.[10] Although ISO 2033 also specifies other encodings, the E-13B encoding is the one referred to as ISO_2033_1983 by Perl libintl,[11] and as ISO_2033-1983 or csISO2033 by the IANA.[12] Other registered labels include iso-ir-98 (its ISO-IR registration number) and simply e13b.[12]

The digits keep their ASCII locations. Letters and symbols unavailable in the E-13B font are omitted, while the specialised bank-cheque punctuation of the E-13B font is added. The same symbols are available in Unicode in the Optical Character Recognition block.
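A decoder for this set is small enough to write by hand. The following Python sketch assumes that the four cheque symbols occupy 0x3A to 0x3D and map to U+2446 to U+2449, as laid out in the code chart in this section. The names `E13B_DECODE` and `decode_e13b` are hypothetical, and the error handling is a choice made for this sketch rather than anything the standard prescribes.

```python
def _build_table() -> dict[int, str]:
    """Byte value -> Unicode character for the E-13B set (ISO-IR-98)."""
    table = {cp: chr(cp) for cp in range(0x00, 0x21)}        # C0 controls and space
    table[0x7F] = "\x7f"                                     # DEL
    table.update({cp: chr(cp) for cp in range(0x30, 0x3A)})  # digits 0-9
    # Cheque symbols, mapped to the Unicode Optical Character Recognition block.
    table.update({0x3A: "\u2446", 0x3B: "\u2447", 0x3C: "\u2448", 0x3D: "\u2449"})
    return table


E13B_DECODE = _build_table()


def decode_e13b(data: bytes, errors: str = "strict") -> str:
    """Decode E-13B bytes; undefined bytes raise ValueError unless errors != 'strict'."""
    out = []
    for offset, byte in enumerate(data):
        ch = E13B_DECODE.get(byte)
        if ch is None:
            if errors == "strict":
                raise ValueError(
                    f"byte 0x{byte:02X} at offset {offset} is undefined in E-13B"
                )
            ch = "\ufffd"  # substitute the Unicode replacement character
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # 0x3A is the ASCII colon, so b":" decodes to U+2446 under this table.
    print(decode_e13b(b":123456789:"))
```

Because letters are undefined in this set, passing ordinary ASCII text through `decode_e13b` in strict mode fails on the first letter, which is a useful check that input really is E-13B data.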

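ISO 2033:1983 E-13B set[11]
(Each cell shows the character above its Unicode code point; "--" marks an undefined position.)

      _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0_    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
      0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F
1_    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
      0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B 001C 001D 001E 001F
2_    SP   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
      0020 --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
3_    0    1    2    3    4    5    6    7    8    9    ⑆    ⑇    ⑈    ⑉    --   --
      0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 2446 2447 2448 2449 --   --
4_    (all positions undefined)
5_    (all positions undefined)
6_    (all positions undefined)
7_    --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   DEL
      --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   007F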



References

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.