Unicode subscripts and superscripts

Unicode has subscripted and superscripted versions of a number of characters including a full set of Arabic numerals.[1] These characters allow any polynomial, chemical and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.

The difference between superscript/subscript and numerator/denominator glyphs. In many popular fonts the Unicode "superscript" and "subscript" characters are actually numerator and denominator glyphs.

The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters:

"When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts.... However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription."[2]

Uses

The intended use[2] when these characters were added to Unicode was to allow chemical and algebraic formulas and phonetics to be written without markup while still producing true superscripts and subscripts. Thus "H₂O" (using a subscript character) is supposed to be identical to "H2O" (with subscript markup).
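As a rough illustration of the plain-text approach, the following Python sketch replaces ASCII digits with the corresponding subscript or superscript characters. The helper names `to_subscript` and `to_superscript` are hypothetical and not part of any standard library; the sketch converts every digit in the string, so it only suits simple formulas.

```python
import unicodedata

# Translation tables from ASCII digits to Unicode subscript/superscript digits.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def to_subscript(text: str) -> str:
    """Replace every ASCII digit in text with its subscript character."""
    return text.translate(SUBSCRIPT_DIGITS)


def to_superscript(text: str) -> str:
    """Replace every ASCII digit in text with its superscript character."""
    return text.translate(SUPERSCRIPT_DIGITS)


water = to_subscript("H2O")          # chemical formula with a subscript digit
square = "x" + to_superscript("2")   # polynomial term with a superscript digit

# Compatibility normalization (NFKC) folds the subscript back to a plain digit.
plain = unicodedata.normalize("NFKC", water)
```

Because NFKC normalization maps these characters back to ordinary digits, text that passes through such normalization loses the superscript or subscript distinction.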

In reality, most fonts that include these characters ignore the Unicode definition and instead design the digits as mathematical numerator and denominator glyphs, which are smaller than normal characters but are aligned with the cap line and the baseline, respectively. When used with the solidus, these glyphs are useful for making arbitrary diagonal fractions (similar to the ½ glyph). Trying to make fractions using the superscripts and subscripts of existing software looks messier (example: 1/2), so font designers provided this alternative. This design also makes the superscript letters useful for ordinal indicators, more closely matching the ª and º characters. However, it makes them incorrect for normal superscripts and subscripts, and formulas generally look better using markup than these characters.

Unicode intended diagonal fractions to be produced through a different mechanism, but it is very poorly supported. The fraction slash U+2044 is visually similar to the solidus, but when used with the ordinary digits (not the superscripts and subscripts) it is intended to tell a layout system that a fraction such as ¾ should be rendered[3] using automatic glyph substitution[lower-alpha 1] for the digits. Some browsers support this,[lower-alpha 2] but not in all fonts; a selection of fonts is shown in the table below.

Characters Font Result
U+00BD ½ VULGAR FRACTION ONE HALF Default ½
U+00B9 ¹ SUPERSCRIPT ONE, U+002F / SOLIDUS, U+2082 SUBSCRIPT TWO ¹/₂
U+00B9 ¹ SUPERSCRIPT ONE, U+2044 FRACTION SLASH, U+2082 SUBSCRIPT TWO ¹⁄₂
U+0031 1 DIGIT ONE, U+2044 FRACTION SLASH, U+0032 2 DIGIT TWO 1⁄2
Arial 1⁄2
Cambria 1⁄2
Consolas 1⁄2
Times New Roman 1⁄2
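The character sequences in the table can be assembled programmatically. The Python sketch below is illustrative only: `fraction_with_slash` and `fraction_with_scripts` are hypothetical helper names, and whether the result is drawn as a diagonal fraction still depends on the font and layout system, as described above.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def fraction_with_slash(numerator: int, denominator: int) -> str:
    """Ordinary digits around U+2044; glyph substitution is left to the font."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


def fraction_with_scripts(numerator: int, denominator: int) -> str:
    """Superscript numerator and subscript denominator around U+2044."""
    top = str(numerator).translate(SUPERSCRIPT_DIGITS)
    bottom = str(denominator).translate(SUBSCRIPT_DIGITS)
    return top + FRACTION_SLASH + bottom


half = fraction_with_slash(1, 2)
three_quarters = fraction_with_scripts(3, 4)

# The precomposed U+00BD has a compatibility decomposition to the same
# digit, fraction slash, digit sequence that fraction_with_slash builds.
decomposed_half = unicodedata.normalize("NFKC", "\u00bd")
```

Here `decomposed_half` equals `half`, which reflects that the precomposed vulgar fractions are defined in terms of the fraction slash rather than the solidus.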

Superscripts and subscripts block

The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at U+2070 to U+209F. The two tables below show these characters. Each superscript or subscript character is preceded by a normal x to show the subscripting/superscripting. The first table contains the actual Unicode characters; the second contains the equivalents using HTML markup for the subscript or superscript.

Unicode characters
0123456789ABCDEF
U+00Bx x²x³ x¹
U+207x x⁰xⁱ x⁴x⁵x⁶x⁷x⁸x⁹x⁺x⁻x⁼x⁽x⁾xⁿ
U+208x x₀x₁x₂x₃x₄x₅x₆x₇x₈x₉x₊x₋x₌x₍x₎
U+209x xₐxₑxₒxₓxₔ xₕxₖxₗxₘ xₙxₚxₛxₜ
Simulated using <sup> or <sub> tags
0123456789ABCDEF
U+00Bx x2x3 x1
U+207x x0xi x4x5x6x7 x8x9x+x x=x(x)xn
U+208x x0x1x2x3 x4x5x6x7 x8x9x+x x=x(x)
U+209x xaxexoxx xəxhxkxl xmxnxpxs xt
  Reserved for future use.
  Other characters from Latin-1 not related to super- or sub-scripts.
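The split between the Latin-1 range and the dedicated block can be checked with Python's standard `unicodedata` module. This is a small sketch rather than a complete survey of the block; the variable names are illustrative, and the exact set of unassigned positions depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"

# Print the code point and official name of each superscript digit.
for ch in SUPERSCRIPT_DIGITS:
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")

# Digits inherited from ISO-8859-1 sit below U+0100 (superscripts 1, 2, 3).
latin1_digits = [ch for ch in SUPERSCRIPT_DIGITS if ord(ch) < 0x100]

# Code points in U+2070..U+209F that are unassigned (general category "Cn")
# in the Unicode version shipped with this Python build.
reserved = [
    cp for cp in range(0x2070, 0x20A0)
    if unicodedata.category(chr(cp)) == "Cn"
]
```

The positions U+2072 and U+2073 appear in `reserved` because superscript two and three already existed in the Latin-1 range.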

Other superscript and subscript characters

Unicode version 13.0 also includes subscript and superscript characters that are intended for semantic usage, in the following blocks:[1][4]

  • The Latin-1 Supplement block contains the feminine and masculine ordinal indicators ª and º.
  • The Latin Extended-C block contains one additional superscript, ⱽ, and one additional subscript ⱼ.
  • The Latin Extended-D block contains three superscripts: ꝰ ꟸ ꟹ.
  • The Latin Extended-E block contains five superscripts: ꭜ ꭝ ꭞ ꭟ ꭩ.
  • The Combining Diacritical Marks block contains medieval superscript letter diacritics. These letters are written directly above other letters appearing in medieval Germanic manuscripts, and so these glyphs do not include spacing, for example uͤ. They are shown here over the dotted circle placeholder ◌: ◌ͣ ◌ͤ ◌ͥ ◌ͦ ◌ͧ ◌ͨ ◌ͩ ◌ͪ ◌ͫ ◌ͬ ◌ͭ ◌ͮ ◌ͯ.
  • The Combining Diacritical Marks Extended block contains two combining letters for linguistic transcriptions of Scots. They are shown here over the dotted circle placeholder ◌: ◌ᪿ ◌ᫀ.
  • The Combining Diacritical Marks Supplement block contains additional medieval superscript letter diacritics, enough to complete the basic lowercase Latin alphabet except for j, q and y, as well as a few small capitals, ligatures (ae, ao, av), and additional letters: ◌ᷓ ◌ᷔ ◌ᷕ ◌ᷖ ◌ᷗ ◌ᷘ ◌ᷙ ◌ᷚ ◌ᷛ ◌ᷜ ◌ᷝ ◌ᷞ ◌ᷟ ◌ᷠ ◌ᷡ ◌ᷢ ◌ᷣ ◌ᷤ ◌ᷥ ◌ᷦ ◌ᷧ ◌ᷨ ◌ᷩ ◌ᷪ ◌ᷫ ◌ᷬ ◌ᷭ ◌ᷮ ◌ᷯ ◌ᷰ ◌ᷱ ◌ᷲ ◌ᷳ ◌ᷴ. There is also a combining subscript: ◌᷊.
  • The Spacing Modifier Letters block has superscripted letters and symbols used for phonetic transcription: ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ˀ ˁ ˠ ˡ ˢ ˣ ˤ.
  • The Phonetic Extensions block has several sub- and super-scripted letters and symbols: Latin/IPA ᴬ ᴭ ᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ ᴻ ᴼ ᴽ ᴾ ᴿ ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵏ ᵐ ᵑ ᵒ ᵓ ᵖ ᵗ ᵘ ᵚ ᵛ ᵢ ᵣ ᵤ ᵥ, Greek ᵝ ᵞ ᵟ ᵠ ᵡ ᵦ ᵧ ᵨ ᵩ ᵪ, Cyrillic ᵸ, other ᵎ ᵔ ᵕ ᵙ ᵜ. These are intended to indicate secondary articulation.
  • The Phonetic Extensions Supplement block has several more: Latin/IPA ᶛ ᶜ ᶝ ᶞ ᶟ ᶠ ᶡ ᶢ ᶣ ᶤ ᶥ ᶦ ᶧ ᶨ ᶩ ᶪ ᶫ ᶬ ᶭ ᶮ ᶯ ᶰ ᶱ ᶲ ᶳ ᶴ ᶵ ᶶ ᶷ ᶸ ᶹ ᶺ ᶻ ᶼ ᶽ ᶾ, Greek ᶿ.
  • The Cyrillic Extended-B block contains two Cyrillic superscripts: ꚜ ꚝ.
  • The Cyrillic Extended-A and -B blocks contain multiple medieval superscript letter diacritics, enough to complete the basic lowercase Cyrillic alphabet used in Church Slavonic texts, and also include an additional ligature (ст): ◌ⷠ ◌ⷡ ◌ⷢ ◌ⷣ ◌ⷤ ◌ⷥ ◌ⷦ ◌ⷧ ◌ⷨ ◌ⷩ ◌ⷪ ◌ⷫ ◌ⷬ ◌ⷭ ◌ⷮ ◌ⷯ ◌ⷰ ◌ⷱ ◌ⷲ ◌ⷳ ◌ⷴ ◌ⷵ ◌ⷶ ◌ⷷ ◌ⷸ ◌ⷹ ◌ⷺ ◌ⷻ ◌ⷼ ◌ⷽ ◌ⷾ ◌ⷿ ◌ꙴ ◌ꙵ ◌ꙶ ◌ꙷ ◌ꙸ ◌ꙹ ◌ꙺ ◌ꙻ ◌ꚞ ◌ꚟ.
  • The Georgian block contains one superscripted Mkhedruli letter: ჼ.
  • The Kanbun block has superscripted annotation characters used in Japanese copies of Classical Chinese texts: ㆒ ㆓ ㆔ ㆕ ㆖ ㆗ ㆘ ㆙ ㆚ ㆛ ㆜ ㆝ ㆞ ㆟.
  • The Tifinagh block has one superscript letter: ⵯ.
  • The Unified Canadian Aboriginal Syllabics block and its Extended block contain several mostly consonant-only letters, called Finals, that indicate a syllable coda, along with some characters, known as Medials, that indicate a syllable medial: Main block ᐜ ᐝ ᐞ ᐟ ᐠ ᐡ ᐢ ᐣ ᐤ ᐥ ᐦ ᐨ ᐩ ᐪ ᑉ ᑊ ᑋ ᒃ ᒄ ᒡ ᒢ ᒻ ᒼ ᒽ ᒾ ᓐ ᓑ ᓒ ᓪ ᓫ ᔅ ᔆ ᔇ ᔈ ᔉ ᔊ ᔋ ᔥ ᔾ ᔿ ᕐ ᕑ ᕝ ᕪ ᕻ ᕽ ᖅ ᖕ ᖖ ᖟ ᖦ ᖮ ᗮ ᘁ ᙆ ᙇ ᙚ ᙾ ᙿ, Extended block ᣔ ᣕ ᣖ ᣗ ᣘ ᣙ ᣚ ᣛ ᣜ ᣝ ᣞ ᣟ ᣳ ᣴ ᣵ. Additionally, there are two Finals, a Medial, and two punctuation marks written as raised characters in the main block: ᐀ ᐧ ᕀ ᕁ ᕯ.
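
Many, though not all, of the characters listed above carry a `<super>` or `<sub>` compatibility decomposition in the Unicode Character Database, which offers one way to recognise them in code. The Python sketch below assumes that tag is a sufficient test; `script_kind` is a hypothetical helper name, and characters without such a decomposition, such as the combining letter diacritics, are reported as neither.

```python
import unicodedata
from typing import Optional


def script_kind(ch: str) -> Optional[str]:
    """Return "super", "sub", or None based on the decomposition tag of ch."""
    decomposition = unicodedata.decomposition(ch)
    if decomposition.startswith("<super>"):
        return "super"
    if decomposition.startswith("<sub>"):
        return "sub"
    return None


# A few characters taken from the blocks listed above.
samples = ["ª", "ʰ", "ⱼ", "ⱽ", "ᵢ", "\u0364", "x"]
kinds = {ch: script_kind(ch) for ch in samples}
```

In this sketch the combining letter U+0364 and the ordinary letter x both map to `None`, since neither has a tagged decomposition.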

Latin and Greek tables

Consolidated, the Unicode standard contains superscript and subscript versions of a subset of Latin and Greek letters. Here they are arranged in order for comparison (or for copy and paste convenience). Since these characters come from different ranges, they may not be of the same size and position due to font substitution. Asterisks mark small capitals that are not distinct from minuscules and so would not be expected to be supported by Unicode.

Latin superscript and subscript letters
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Superscript capital ᴿ
Superscript small cap *[5]******
Superscript minuscule ʰʲˡʳˢʷˣʸ
Overscript small cap *******
Overscript minuscule ◌ͣ◌ͨ◌ͩ◌ͤ◌ͪ◌ͥ◌ͫ◌ͦ◌ͬ◌ͭ◌ͧ◌ͮ◌ͯ
Subscript minuscule
Underscript minuscule ◌᷊◌ᪿ
Greek superscript and subscript letters
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ
Superscript minuscule ⁽ᵋ⁾ᶿ⁽ᶥ⁾⁽ᶹ⁾
Overscript minuscule
Subscript minuscule
Other IPA superscript letters
ɐɑɒæçɔɕðəɜɛɟɡɦɥɨɩʝɭɱɯɰŋɲɳɵœɹɻʁʂʃƫʉʊʋʌɣʍʐʑʒɸʔʕ
Superscript ⁽ᶜ̧⁾ʱʴʵʶˠˀˁ,ˤ
Overscript ̉
Underscript ◌ᫀ

See also small caps in Unicode, superscript IPA letters.

Composite characters

Primarily for compatibility with earlier character sets, Unicode contains a number of characters that compose super- and subscripts with other symbols.[1] In most fonts these render much better than attempts to construct these symbols from the above characters or by using markup.

  • The Unified Canadian Aboriginal Syllabics and its Extended blocks contain several letters composed with superscripted letters to indicate extended sound values: Main block ᐂ ᐫ ᐬ ᐭ ᐮ ᐰ ᑍ ᑧ ᑨ ᑩ ᑪ ᑬ ᒅ ᒆ ᒇ ᒈ ᒊ ᒤ ᓁ ᓔ ᓮ ᔌ ᔍ ᔎ ᔏ ᔧ ᕅ ᕔ ᕿ ᖀ ᖁ ᖂ ᖃ ᖄ ᖎ ᖏ ᖐ ᖑ ᖒ ᖓ ᖔ ᙯ ᙰ ᙱ ᙲ ᙳ ᙴ ᙵ ᙶ, Extended block ᢰ ᢱ ᢲ ᢳ ᢴ ᢵ ᢶ ᢷ ᢸ ᢹ ᢺ ᢻ ᢼ ᢽ ᢾ ᢿ ᣀ ᣁ ᣂ ᣃ ᣄ ᣅ.

Notes

  1. For a general overview and technical information on glyph substitution (though not specifically for fractions): GSUB — Glyph Substitution Table in the OpenType specification on the Microsoft Typography site.
  2. Such as Chrome on Windows, Firefox

References

  1. "UCD: UnicodeData.txt". The Unicode Standard. Retrieved 2016-05-14.
  2. Martin Dürst, Asmus Freytag (16 May 2007). "Unicode in XML and other Markup Languages". W3C. Retrieved 13 September 2010.
  3. Martin Dürst, Asmus Freytag (16 May 2007). "Fraction Slash". W3C. Retrieved 13 September 2010.
  4. "UCD: Scripts.txt". The Unicode Standard. Retrieved 2020-03-17.
  5. ᵸ was originally defined as a Cyrillic en (н). However, since there is no graphic difference between that and superscript ʜ, it is being redefined to serve both purposes.
  6. Silva, Eduardo Marín (2017-03-01). "L2/17-066R: Proposal to encode the Marca Registrada sign" (PDF).
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.