CJK Unified Ideographs Extension B

CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese.

CJK Unified Ideographs Extension B
Range: U+20000..U+2A6DF (42,720 code points)
Plane: SIP
Scripts: Han
Assigned: 42,718 code points
Unused: 2 reserved code points
Unicode version history:
  3.1: 42,711 (+42,711)
  13.0: 42,718 (+7)
Note: [1][2]
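
The range and counts in the summary above can be checked with simple arithmetic. The following is a minimal Python sketch; the constant names and the helper `in_extension_b` are illustrative and do not come from any library. It assumes, per the version history above, that the last assigned code point as of Unicode 13.0 is U+2A6DD.

```python
# Block boundaries as given in the summary above.
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6DF
# Assumption: last assigned code point as of Unicode 13.0 (U+2A6DD).
EXT_B_LAST_ASSIGNED = 0x2A6DD


def in_extension_b(ch: str) -> bool:
    """Return True if the single character ch lies in the Extension B block."""
    return EXT_B_FIRST <= ord(ch) <= EXT_B_LAST


block_size = EXT_B_LAST - EXT_B_FIRST + 1
assigned = EXT_B_LAST_ASSIGNED - EXT_B_FIRST + 1

print(block_size, assigned, block_size - assigned)  # 42720 42718 2
print(in_extension_b("\U00020000"))  # True
print(in_extension_b("\u4e00"))      # False (BMP ideograph)

# The block lies outside the BMP, so each character takes
# 4 bytes in UTF-8 and 2 code units (a surrogate pair) in UTF-16.
sample = "\U00020000"
print(len(sample.encode("utf-8")))           # 4
print(len(sample.encode("utf-16-be")) // 2)  # 2
```

Software that assumes one UTF-16 code unit per character will mishandle every character in this block, which is a common source of bugs with Extension B text.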

The block has dozens of variation sequences defined for standardized variants.[3]

It also has thousands of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD).[4][5] These sequences specify the desired glyph variant for a given Unicode character.
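
An ideographic variation sequence is a base ideograph followed by one of the variation selectors VARIATION SELECTOR-17 through VARIATION SELECTOR-256 (U+E0100..U+E01EF). The Python sketch below shows how such a sequence is formed and how selectors can be stripped again; the function names are hypothetical, and the code does not check whether a sequence is actually registered in the IVD. The base character U+20000 is chosen arbitrarily for illustration.

```python
IVS_SELECTOR_FIRST = 0xE0100  # VARIATION SELECTOR-17
IVS_SELECTOR_LAST = 0xE01EF   # VARIATION SELECTOR-256


def make_ivs(base: str, selector_number: int) -> str:
    """Append VARIATION SELECTOR-<selector_number> (17..256) to one base character.

    This only builds the sequence; whether it is registered must be
    looked up in the IVD data, which this sketch does not do.
    """
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= selector_number <= 256:
        raise ValueError("IVS selectors are numbered 17 through 256")
    return base + chr(IVS_SELECTOR_FIRST + selector_number - 17)


def strip_variation_selectors(text: str) -> str:
    """Remove VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 from text."""
    return "".join(
        ch for ch in text
        if not (0xFE00 <= ord(ch) <= 0xFE0F
                or IVS_SELECTOR_FIRST <= ord(ch) <= IVS_SELECTOR_LAST)
    )


seq = make_ivs("\U00020000", 17)
print([f"U+{ord(c):04X}" for c in seq])  # ['U+20000', 'U+E0100']
print(strip_variation_selectors(seq) == "\U00020000")  # True
```

Stripping selectors is useful when comparing or searching text, since a renderer that does not support a sequence is expected to fall back to the base character's default glyph.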

It is the only CJK Unified Ideographs Extension block with a UCS2003 source identifier. Because Extension B contained too many characters, the original code charts were produced with a single glyph for all regions. The glyphs were designed by Beijing Zhongyi Electronic Ltd. After the introduction of multi-column code charts, the original glyphs were retained under the UCS2003 source identifier. The glyphs are packaged in the "SimSun-ExtB" font distributed with the Simplified Chinese versions of Windows, and do not adhere to the glyphs for the Mainland China region.

Known issues

Three incorrectly unified characters in Extension B

In CJK Unified Ideographs Extension B, some characters are incorrectly unified with others: U+2017B (𠅻), U+204AF (𠒯) and U+24CB2 (𤲲). The first two wrongly unify their Chinese Mainland and Vietnamese source glyphs, while the last wrongly unifies its Chinese Mainland and Taiwanese source glyphs.[6]

Unifiable variants and exact duplicates in Extension B

Extension B also encodes hundreds of glyph variants.[7] In addition to the deliberate encoding of close glyph variants, six exact duplicates (where the same character has inadvertently been encoded twice) and two semi-duplicates (where the CJK-B character represents a de facto disunification of two glyph forms unified in the corresponding BMP character) were encoded by mistake:[8]

  • U+34A8 㒨 = U+20457 𠑗 : U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8
  • U+3DB7 㶷 = U+2420E 𤈎 : same glyph shapes
  • U+8641 虁 = U+27144 𧅄 : U+27144 is the same as the Korean-source glyph for U+8641, but it is significantly different from the Chinese Mainland-, Taiwan- and Japan-source glyphs for U+8641
  • U+204F2 𠓲 = U+23515 𣔕 : same glyph shapes, but ordered under different radicals
  • U+249BC 𤦼 = U+249E9 𤧩 : same glyph shapes
  • U+24BD2 𤯒 = U+2A415 𪐕 : same glyph shapes, but ordered under different radicals
  • U+26842 𦡂 = U+26866 𦡦 : same glyph shapes
  • U+FA23 﨣 = U+27EAF 𧺯 : same glyph shapes (U+FA23 﨣 is a unified CJK ideograph, despite its name "CJK COMPATIBILITY IDEOGRAPH-FA23.")
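
Because these pairs are separate code points, plain string comparison treats them as different characters, and Unicode normalization does not merge them. One possible way to handle this in search or matching code is a small folding table built from the list above. The Python sketch below is only an illustration: the names `DUPLICATE_PAIRS` and `fold_duplicates` are hypothetical, the choice to fold the second code point of each pair onto the first is arbitrary, and for the two pairs whose glyphs match only in some regions the folding may be too aggressive for some uses.

```python
import unicodedata

# Pairs from the list above: (code point kept, Extension B duplicate folded onto it).
DUPLICATE_PAIRS = [
    (0x34A8, 0x20457),   # matches only the China-source glyph of U+34A8
    (0x3DB7, 0x2420E),
    (0x8641, 0x27144),   # matches only the Korean-source glyph of U+8641
    (0x204F2, 0x23515),
    (0x249BC, 0x249E9),
    (0x24BD2, 0x2A415),
    (0x26842, 0x26866),
    (0xFA23, 0x27EAF),
]

# str.translate accepts a dict mapping code points to code points.
FOLD_TABLE = {dup: kept for kept, dup in DUPLICATE_PAIRS}


def fold_duplicates(text: str) -> str:
    """Replace each listed duplicate with its counterpart."""
    return text.translate(FOLD_TABLE)


# Normalization alone leaves the duplicate untouched.
print(unicodedata.normalize("NFKC", "\U0002420E") == "\u3DB7")  # False
# Folding maps it onto its counterpart.
print(fold_duplicates("\U0002420E") == "\u3DB7")  # True
```

Folding should be applied to both the query and the indexed text, and only for comparison purposes, so that the original code points are preserved in stored data.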

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Unified Ideographs Extension B block:

Version 3.1 | Final code points[note 1]: U+20000..2A6D6 | Count: 42,711
L2 ID | WG2 ID | IRG ID | Document
L2/98-260 | - | - | Ng, Nelson; Kung, Michael (1998-05-26), "CJK UNIFIED IDEOGRAPHS EXTENSION B", Report on IRG meeting #11
L2/99-239 | - | - | Addition of three hundred and fourteen KANJIs (from JIS X0213), 1999-07-15
L2/99-310 | - | - | Addition of three hundred and thirteen KANJIs (from JIS X0213), 1999-08-23
L2/99-335 | N2109 | N674 | Zhang, Zhoucai (1999-09-03), SuperCJK, version 9.0 with Kangxi and HYD data
L2/99-336 | N2105 | N675 | CJK Unified Ideographs Extension B WD 6.0, 1999-09-03
L2/99-316 | - | - | Whistler, Ken (1999-09-13), Comments on JCS proposal
L2/99-312 | - | - | excerpt of usages and sources of proposed KANJIs in contemporary Japanese, 1999-10-06
L2/99-366 | - | - | Suignard, Michel (1999-11-24), Text for CD ballot of ISO/IEC 10646 part 2
L2/99-366.1 | - | - | Cover page for N3393, 1999-11-24
L2/99-366.2 | - | - | Suignard, Michel (1999-11-24), Text of CD 10646-2
L2/99-366.3 | - | - | Suignard, Michel (1999-11-24), CJK Ext. B pages 001-100
L2/99-366.4 | - | - | Suignard, Michel (1999-11-24), CJK Ext. B pages 101-200
L2/99-366.5 | - | - | Suignard, Michel (1999-11-24), CJK Ext. B pages 201-300
L2/99-366.6 | - | - | Suignard, Michel (1999-11-24), CJK Ext. B pages 301-335
L2/99-366.7 | - | - | Suignard, Michel (1999-11-24), Special Purpose Plane and Annexes
L2/99-366.8 | - | - | Suignard, Michel (1999-11-24), Mapping of CJK Ext. B characters
L2/99-385 | N2144 | N713R | Jenkins, John (1999-12-08), Clarification of the Non-Cognate Rule
L2/00-010 | N2103 | - | Umamaheswaran, V. S. (2000-01-05), "10.3", Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13--16
L2/00-021R (pdf, rtf) | - | - | ISO CD 10646 Part-2 vote -- A proposal to move JIS X 0213 Kanji characters on Extension-B into BMP, 2000-01-21
L2/00-030 | - | - | Enomoto, Yoshi (2000-01-31), Background of the proposal (for encoding of 302 ideographs from JIS X 0213)
L2/00-036 | - | - | Umamaheswaran, V. S.; Sargent, Murray (2000-02-03), Expert contribution on the placement of additional unified ideographs from JIS X0213, HK, and Korea
L2/01-026 (pdf, doc) | N2298 | N758 | CJK Unified Ideographs Extension B, PreDIS R1 For ISO/IEC DIS 10646-2:2000, 2000-11-21
L2/01-136 | N2334 (pdf, doc) | - | Sato, T. K. (2001-03-28), Notification of an error and request for a correction regarding mapping information for a particular JIS X 0213 character in CJK UNIFIED IDEOGRAPHS EXTENSION-B
L2/01-163 | N2347 | N785 | CJK Unified Ideographs Extension B PreIS For ISO/IEC 10646-2:2000, 2001-03-30
L2/01-162 | N2349 (pdf, doc) | N787 | Zhang, Zhoucai (2001-04-02), Clarification On Versions of CJK Unified Ideographs Extension B As Well As SuperCJK
L2/02-122 | N2427 | - | Ksar, Mike (2002-03-18), Proposal to add 1 Hanja code of D P R of Korea into 10646-2:2001
L2/02-201 | N2448 | N924 | Error Correction, 2002-05-08
L2/02-416 | N2518 | - | Proposal to add 2 hanja codes of D P R of Korea into 10646-2:2001, 2002-11-01
L2/03-017 | - | - | Late DPRK Comments on SC 2 N 3625, 10646-2: 2001/FPDAM 1, 2002-12-09
L2/03-287 | - | - | Cook, Richard (2003-08-24), 16 UniHan.txt errors
L2/03-301 | - | - | Cook, Richard (2003-08-27), 24 more UniHan.txt errors
L2/03-311 | - | - | West, Andrew (2003-09-17), Unicode 4.0.1 Beta Review, comments from Andrew C. West
L2/03-399 | - | - | Fok, Anthony (2003-10-13), Unihan reported errors / changes re kHKSCS entries
L2/03-398 | - | - | Nguyen, D. (2003-10-29), Unihan reported errors / changes re kCowles
L2/03-453 | - | - | Minutes of the Editorial Group Ad Hoc Discussion, 2003-12-17
L2/04-008 | N2695 | N1026 | China's confirmation on fonts for CJK_B 21E2D and 21E45, 2004-01-05
L2/04-208 | N2774R | N1064 | Proposal to add 6 KP source references to existing CJK Unified Ideographs, 2004-05-25
L2/04-281 | N2830 | - | Suignard, Michel (2004-06-23), CJK Ideograph source visual references information
L2/04-417 | - | - | Cook, Richard (2004-11-18), Extension B font versioning: preliminary work
L2/05-022 | - | - | Cook, Richard (2005-01-25), Extension B font versioning: follow-up report, part 1 [text]
L2/05-023 | - | - | Cook, Richard (2005-01-25), Extension B font versioning: follow-up report, part 2 [tables]
- | N3353 (pdf, doc) | - | Umamaheswaran, V. S. (2007-10-10), "M51.9", Unconfirmed minutes of WG 2 meeting 51 Hanzhou, China; 2007-04-24/27
L2/07-208 | N3285 | - | Proposal to replace 11 KP source references to existing ISO/IEC 10646:2003, 2007-07-18
L2/08-234 | - | N1406 | Cook, Richard; Bishop, Thomas; Lunde, Ken (2008-06-06), Han Unification Issues
L2/08-310 | - | - | Cook, Richard (2008-08-12), Fonts for Extension B and C and IRG
L2/10-215 | - | - | Lunde, Ken (2010-06-22), "Hanyo-Denshi" IVD Collection (PRI 167) to Adobe-Japan1-6 Mapping Table
- | N3903 (pdf, doc) | - | "M57.07 (CJK Ext. B glyphs from 2nd edition)", Unconfirmed minutes of WG2 meeting 57, 2011-03-31
L2/11-243 | N4111 | - | Sources for Orphaned CJK Ideographs, 2011-06-14
L2/11-254 | - | - | Constable, Peter (2011-06-20), "Update to UTR #45 U-Source Ideographs requested", UTC Liaison Report from WG2
- | N4103 | - | "Resolution 58.05", Unconfirmed minutes of WG 2 meeting 58, 2012-01-03
L2/14-260 | N4621 | - | Suignard, Michel (2014-10-23), CJK chart and source references update
L2/16-052 | N4603 (pdf, doc) | - | Umamaheswaran, V. S. (2015-09-01), "M63.05", Unconfirmed minutes of WG 2 meeting 63
L2/17-180 | - | N2202 | Chan, Eiso (2017-06-02), Request for consideration to add kIRG_GSource values to thirteen ideographs and change two G-source glyphs for the Table of General Standard Chinese Characters [Affects 20164]
L2/17-362 | - | - | Moore, Lisa (2018-02-02), "Consensus 153-C16", UTC #153 Minutes
- | N4974 | N2301 | Request of TCA’s Horizontal Extension for Chemical Terminology [Affects U+20BBF, U+20C02, U+20CED, U+26B4C, U+26CBE, U+26E3D, U+28834, U+289A1, U+289C0, U+28A0F, and U+28B46], 2018-06-12
- | N4987 | - | Proposal on China’s Horizontal Extension for 14 CJK Ideographs [Affects U+37C3, 3FE0, 9FD4, 20164, 24A7D, 25ED7, 2677C, 26C21, 2A917, 2AA30, 2BD77, 2C494, 2C72F, and 2CB38], 2018-06-13
- | N4988 | - | Proposal on Updating 11 G glyphs of CJK Unified Ideographs to ISO/IEC 10646 [Affects U+3B9D, 3CFD, 4A76, 6FF9, 809E, 891D, 21D4C, 2278B, 23AB8, 2459B, and 2A8FB], 2018-06-13
- | - | N2336 | Modify the G glyph for U+23517, 2018-09-10
- | N5016 | N2349 | Shin, Sanghyun; Cho, Sungduk; Pyo, Seungju; Kim, Kyongsok (2018-12-13), Request to move character K6-1022 in Horizontal Extension of KS X 1027-5 from U+3EAC to U+248F2
- | N5020 (pdf, doc) | - | Umamaheswaran, V. S. (2019-01-11), "10.4.6, 10.4.8, and 10.4.9", Unconfirmed minutes of WG 2 meeting 67
- | - | N2369 | Chan, Eiso (2019-05-06), Feedback on IRGN2369 [Affects U+20219 U+21249, U+21827, U+22C3A, U+2327B, U+2363B, U+23839, U+23FD5, U+24261, U+2548E, and U+26C9E]
- | N5086 | N2379 | Proposal of China’s horizontal extension for technical used characters [Affects U+23496, U+2355E, U+236ED, U+24726, U+26FE1, U+27334, and U+2A38C], 2019-05-10
L2/19-237 | N5068 | - | Editorial Report on Miscellaneous Issues (meeting IRG#52) [Affects U+23517, U+248F2, and U+26657], 2019-05-17
L2/19-244 | N5107 | - | TCA's UNC Proposal for WG2 submission [Affects U+27C0E], 2019-05-24
L2/19-241 | N5083 | N2391 | Errata report for WG2 submission_TCA [Affects U+26657], 2019-05-31
- | N5082 | N2391 | Updated G Font of U+23517, 2019-05-31

Version 13.0 | Final code points[note 1]: U+2A6D7..2A6DD | Count: 7
L2 ID | WG2 ID | IRG ID | Document
L2/17-087 | - | - | Chan, Eiso; Wang, Xiaolei; Le, Hou; You, Jerry (2017-04-03), Proposal to encode characters for Gongche Notation
L2/17-103 | - | - | Moore, Lisa (2017-05-18), "E.5", UTC #151 Minutes
- | - | N2299 | Chan, Eiso (2018-04-22), Request to discuss how to handle seven unencoded Gongche characters for Kunqu Opera
L2/18-245 | N4967 | - | Chan, Eiso; You, Jerry; Wang, Xiaolei; Le, Hou (2018-06-01), Updated proposal on Gongche characters for Kunqu Opera
L2/18-241 | - | - | Anderson, Deborah; et al. (2018-07-25), "17", Recommendations to UTC # 156 July 2018 on Script Proposals
L2/18-183 | - | - | Moore, Lisa (2018-11-20), "B.4.1", UTC #156 Minutes
- | N5020 (pdf, doc) | - | Umamaheswaran, V. S. (2019-01-11), "10.2.3", Unconfirmed minutes of WG 2 meeting 67
- | N5122 | - | "M68.01", Unconfirmed minutes of WG 2 meeting 68, 2019-12-31
L2/19-243 | N5106 | - | Suignard, Michel (2019-06-20), "Gongche", Disposition of comments on ISO/IEC CD.2 10646 6th edition
L2/19-270 | - | - | Moore, Lisa (2019-08-02), "Consensus 160-C9", UTC #160 Minutes
  1. Proposed code points and character names may differ from the final code points and names.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. "Unicode Character Database: Standardized Variation Sequences". The Unicode Consortium.
  4. "Ideographic Variation Database". Unicode Consortium.
  5. "UTS #37, Unicode Ideographic Variation Database". Unicode Consortium.
  6. Eiso Chan (陈永聪), Comments on four error glyphs on CJK Unified Ideographs Ext B & E.
  7. "unifiable glyph variants" (PDF). Archived from the original (PDF) on 2006-05-15. Retrieved 2017-12-01.
  8. Cook, Richard (6 October 2003). "Defect Report on Duplicate Encoded CJK Forms" (PDF). ISO/IEC JTC1/SC2/WG2. Retrieved 2012-03-28.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.