Sort of numbers

21

1

Within the recesses of Unicode characters, there exists a Unicode block of (currently) 63 characters named "Number Forms", which consists of characters that have numerical values such as the Roman numeral Ⅻ, vulgar fractions like ⅑ or ↉, or weird ones like ↊ (10) or ↈ (100000).

Your task is to write a program or function that, when given a list of assigned Unicode characters within this block, sorts the list by the numerical values of each character.

A (sortable) list of characters and values can be found on the Wikipedia Page.

To be self-contained though, here's a list of the codepoints and their values:

Hex     Char   Value
0x00BC: ¼   = 1/4 or 0.25
0x00BD: ½   = 1/2 or 0.5
0x00BE: ¾   = 3/4 or 0.75
0x2150: ⅐   = 1/7 or 0.142857
0x2151: ⅑   = 1/9 or 0.111111
0x2152: ⅒   = 1/10 or 0.1
0x2153: ⅓   = 1/3 or 0.333333
0x2154: ⅔   = 2/3 or 0.666667
0x2155: ⅕   = 1/5 or 0.2
0x2156: ⅖   = 2/5 or 0.4
0x2157: ⅗   = 3/5 or 0.6
0x2158: ⅘   = 4/5 or 0.8
0x2159: ⅙   = 1/6 or 0.166667
0x215A: ⅚   = 5/6 or 0.833333
0x215B: ⅛   = 1/8 or 0.125
0x215C: ⅜   = 3/8 or 0.375
0x215D: ⅝   = 5/8 or 0.625
0x215E: ⅞   = 7/8 or 0.875
0x215F: ⅟   = 1
0x2160: Ⅰ   = 1
0x2161: Ⅱ   = 2
0x2162: Ⅲ   = 3
0x2163: Ⅳ   = 4
0x2164: Ⅴ   = 5
0x2165: Ⅵ   = 6
0x2166: Ⅶ   = 7
0x2167: Ⅷ   = 8
0x2168: Ⅸ   = 9
0x2169: Ⅹ   = 10
0x216A: Ⅺ   = 11
0x216B: Ⅻ   = 12
0x216C: Ⅼ   = 50
0x216D: Ⅽ   = 100
0x216E: Ⅾ   = 500
0x216F: Ⅿ   = 1000
0x2170: ⅰ   = 1
0x2171: ⅱ   = 2
0x2172: ⅲ   = 3
0x2173: ⅳ   = 4
0x2174: ⅴ   = 5
0x2175: ⅵ   = 6
0x2176: ⅶ   = 7
0x2177: ⅷ   = 8
0x2178: ⅸ   = 9
0x2179: ⅹ   = 10
0x217A: ⅺ   = 11
0x217B: ⅻ   = 12
0x217C: ⅼ   = 50
0x217D: ⅽ   = 100
0x217E: ⅾ   = 500
0x217F: ⅿ   = 1000
0x2180: ↀ   = 1000
0x2181: ↁ   = 5000
0x2182: ↂ   = 10000
0x2183: Ↄ   = 100
0x2184: ↄ   = 100
0x2185: ↅ   = 6
0x2186: ↆ   = 50
0x2187: ↇ   = 50000
0x2188: ↈ   = 100000
0x2189: ↉   = 0
0x218A: ↊   = 10
0x218B: ↋   = 11

Test cases:

['½','ↆ','ↂ','⅒','Ⅽ','⅑','ⅷ'] -> ['⅒','⅑','½','ⅷ','ↆ','Ⅽ','ↂ']

['¼','↋','↉','ↅ','⅐','⅟','Ⅻ','ⅺ'] -> ['↉','⅐','¼','⅟','ↅ','↋','ⅺ','Ⅻ']

['¼','½','¾','⅐','⅑','⅒','⅓','⅔','⅕','⅖','⅗','⅘','⅙','⅚','⅛','⅜','⅝','⅞','⅟'] -> ['⅒','⅑','⅛','⅐','⅙','⅕','¼','⅓','⅜','⅖','½','⅗','⅝','⅔','¾','⅘','⅚','⅞','⅟']

'⅞ⅾ↊ↄⅨⅮⅺↁⅸⅰⅩⅱⅶ¾ⅧↅↃ↋ↆ⅔ⅼⅲ⅘⅒ⅽⅦ⅕ⅤⅭⅳↂⅪⅬⅯↇⅠⅷ⅛Ⅵ½ⅵ¼ⅻ⅐Ⅱ⅜⅗⅝⅚Ⅳ⅓ⅴ↉ⅿⅫⅹↀↈ⅙⅑Ⅲ⅖⅟' -> '↉⅒⅑⅛⅐⅙⅕¼⅓⅜⅖½⅗⅝⅔¾⅘⅚⅞⅟ⅠⅰⅡⅱⅢⅲⅣⅳⅤⅴⅥⅵↅⅦⅶⅧⅷⅨⅸⅩⅹ↊Ⅺⅺ↋ⅫⅻⅬⅼↆⅭⅽↄↃⅮⅾⅯⅿↀↁↂↇↈ'

['Ↄ','ↄ','↊','↋'] -> ['↊','↋','ↄ','Ↄ']

Note that four of the symbols (the ones used in the last case) are not Unicode numbers, though they still have a numerical value, so make sure to check before just posting a built-in.
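As an illustration of that check, here is a minimal Python sketch. It assumes the built-in in question is the standard library's `unicodedata.numeric`; the names `TABLE_CHARS` and `missing_numeric` are made up for this example.

```python
import unicodedata

# All 63 characters from the table: U+00BC..U+00BE plus U+2150..U+218B.
TABLE_CHARS = [chr(cp) for cp in (*range(0xBC, 0xBF), *range(0x2150, 0x218C))]

def missing_numeric(chars):
    """Return the characters for which unicodedata.numeric has no value."""
    # The second argument is a default that is returned instead of
    # raising ValueError when the character has no numeric property.
    return [c for c in chars if unicodedata.numeric(c, None) is None]

print(missing_numeric(TABLE_CHARS))  # ['Ↄ', 'ↄ', '↊', '↋']
```

Any answer relying on such a built-in therefore needs explicit values for these four characters.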

Rules:

  • If any more characters are assigned to this block in the future, you won't need to update your code to support them.
  • Order of characters with identical values doesn't matter.
  • IO is flexible.
    • Output must be as the characters though, not the numerical values
  • Standard Loopholes are forbidden.
  • I'm not banning built-ins that can fetch the numerical value of a character, but I encourage also adding a non-builtin answer if possible.
  • This is code-golf, so the shortest answer in bytes for each language wins! Good luck!

Jo King

Posted 2018-07-24T07:31:24.240

Reputation: 38 234

9R.I.P monospacing :( – Jo King – 2018-07-24T07:56:47.773

Answers

6

Python 3, 216 213 bytes

-3 bytes thanks to TFeld

lambda l:sorted(l,key='⅒⅑⅛⅐⅙⅕¼⅓⅜⅖½⅗⅝⅔¾⅘⅚⅞⅟ⅠⅰⅡⅱⅢⅲⅣⅳⅤⅴⅥⅵↅⅦⅶⅧⅷⅨⅸⅩⅹ↊Ⅺⅺ↋ⅫⅻⅬⅼↆⅭⅽↃↄⅮⅾⅯⅿↀↁↂↇↈ'.find)

Try it online!

With a built-in that fetches the numerical value, 111 bytes

lambda l:sorted(l,key=lambda c:[10,11,100,100,0]['↊↋Ↄↄ'.find(c)]or numeric(c))
from unicodedata import*

Try it online!

Rod

Posted 2018-07-24T07:31:24.240

Reputation: 17 588

4You can save 3 bytes by removing ↉ from the string (find returns -1 which is smallest) – TFeld – 2018-07-24T11:48:16.317
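The trick in that comment can be sketched in ungolfed Python as follows. This is only an illustration: `ORDER` and `sort_by_position` are names chosen here, and `ORDER` lists the characters in ascending numeric order with ↉ deliberately left out.

```python
# Characters in ascending numeric order, with '↉' (value 0) omitted.
ORDER = '⅒⅑⅛⅐⅙⅕¼⅓⅜⅖½⅗⅝⅔¾⅘⅚⅞⅟ⅠⅰⅡⅱⅢⅲⅣⅳⅤⅴⅥⅵↅⅦⅶⅧⅷⅨⅸⅩⅹ↊Ⅺⅺ↋ⅫⅻⅬⅼↆⅭⅽↃↄⅮⅾⅯⅿↀↁↂↇↈ'

def sort_by_position(chars):
    # str.find returns -1 for a character that is not in ORDER,
    # so '↉' gets the smallest key and sorts ahead of everything else.
    return sorted(chars, key=ORDER.find)

print(ORDER.find('↉'))                     # -1
print(sort_by_position(['Ⅰ', '↉', '½']))  # ['↉', '½', 'Ⅰ']
```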

4

05AB1E (legacy), 192 74 63 61 bytes

Σ•Bšā¿ÑáζΔÕæ₅"®GÙ₂®°ƶío"§óÏ4¸bćÔ!₃ùZFúÐìŸ
,λ₂ϦP(Ì•65в₂+sÇт%k

-118 bytes by using characters of 05AB1E's code page only, so we don't need to use UTF-8 encoding.
-11 bytes thanks to @Adnan.
-2 bytes thanks to @Grimy.

Try it online or verify all test cases.

Explanation:

Σ            # Sort the input by:
 •Bšā¿ÑáζΔÕæ₅"®GÙ₂®°ƶío"§óÏ4¸bćÔ!₃ùZFúÐìŸ
 ,λ₂ϦP(Ì•65в₂+
             #  List of codepoints modulo-100 of the characters we want to sort
 sÇ          #  Get the codepoint of the current input-character
   т%        #  Take modulo 100 of this codepoint
 k           #  And get the index in the list of codepoints, as sorting order

So what is •Bšā¿ÑáζΔÕæ₅"®GÙ₂®°ƶío"§óÏ4¸bćÔ!₃ùZFúÐìŸ\n,λ₂ϦP(Ì•65в₂+?

Based on the codepoints modulo-100 of the characters in sorted order, we get the following list:

[85,30,29,39,28,37,33,88,31,40,34,89,35,41,32,90,36,38,42,43,44,60,45,61,46,62,47,63,48,64,49,65,81,50,66,51,67,52,68,53,69,86,54,70,87,55,71,56,72,82,57,73,79,80,58,74,59,75,76,77,78,83,84]

These are generated by the following program:

"↉⅒⅑⅛⅐⅙⅕¼⅓⅜⅖½⅗⅝⅔¾⅘⅚⅞⅟ⅠⅰⅡⅱⅢⅲⅣⅳⅤⅴⅥⅵↅⅦⅶⅧⅷⅨⅸⅩⅹ↊Ⅺⅺ↋ⅫⅻⅬⅼↆⅭⅽↃↄⅮⅾⅯⅿↀↁↂↇↈ"Çт%

Try it online.
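For readers who do not know 05AB1E, the same idea can be written out in Python. This is a rough equivalent rather than a translation of the golfed program; `SORTED_CHARS`, `MOD_KEYS` and `sort_key` are names introduced for this sketch.

```python
# Characters in ascending numeric order, as in the generator program above.
SORTED_CHARS = "↉⅒⅑⅛⅐⅙⅕¼⅓⅜⅖½⅗⅝⅔¾⅘⅚⅞⅟ⅠⅰⅡⅱⅢⅲⅣⅳⅤⅴⅥⅵↅⅦⅶⅧⅷⅨⅸⅩⅹ↊Ⅺⅺ↋ⅫⅻⅬⅼↆⅭⅽↃↄⅮⅾⅯⅿↀↁↂↇↈ"

# Counterpart of `Çт%`: the codepoint of each character, modulo 100.
MOD_KEYS = [ord(c) % 100 for c in SORTED_CHARS]

def sort_key(char):
    # Counterpart of `sÇт%k`: position of this character's residue in the list.
    return MOD_KEYS.index(ord(char) % 100)

print(MOD_KEYS[:8])  # [85, 30, 29, 39, 28, 37, 33, 88]
print(sorted(['½', 'ↆ', 'ↂ', '⅒', 'Ⅽ', '⅑', 'ⅷ'], key=sort_key))
```

The lookup only works because all 63 residues modulo 100 are distinct, which is why the list can stand in for the characters themselves.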

•Bšā¿ÑáζΔÕæ₅"®GÙ₂®°ƶío"§óÏ4¸bćÔ!₃ùZFúÐìŸ\n,λ₂ϦP(Ì•65в₂+ is a shorter way to produce this list: it takes the compressed number 1485725021600091112740267145165274006958935956446028449609419704394607952161907963838640094709317691369972842282463, converts it to base 65, and then adds 26 to each digit.

Try it online and verify that the lists are the same.
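The packing step can be imitated in Python as a round trip: subtract 26 from every list entry, read the result as base-65 digits, and decode again. This is a sketch of the general technique only; the helper names `from_base` and `to_base` are invented here, and the code does not check that the packed integer equals the one quoted in the answer.

```python
def from_base(digits, base):
    """Interpret a list of digits (most significant first) as an integer."""
    n = 0
    for d in digits:
        n = n * base + d
    return n

def to_base(n, base):
    """Split a non-negative integer into digits, most significant first."""
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1] or [0]

KEYS = [85,30,29,39,28,37,33,88,31,40,34,89,35,41,32,90,36,38,42,43,44,
        60,45,61,46,62,47,63,48,64,49,65,81,50,66,51,67,52,68,53,69,86,
        54,70,87,55,71,56,72,82,57,73,79,80,58,74,59,75,76,77,78,83,84]

# Every key lies in 28..90, so key - 26 is a valid base-65 digit (0..64).
packed = from_base([k - 26 for k in KEYS], 65)
unpacked = [d + 26 for d in to_base(packed, 65)]
print(unpacked == KEYS)  # True
```

The offset of 26 matters because it shifts the keys into the 0..64 digit range; without it the largest key (90) would not fit in base 65.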

Kevin Cruijssen

Posted 2018-07-24T07:31:24.240

Reputation: 67 575

How does the encoding work? Surely not all of those characters are single bytes in 05AB1E's encoding – Jo King – 2018-07-24T08:32:24.280

1Yes, not all of those characters are in 05AB1E's encoding, so that would be 192 bytes. – Okx – 2018-07-24T08:36:04.963

@Okx Hmm, TIO mentioned 68 chars, 68 bytes (SBCS) though, so that's what I looked at for the byte-count. How do you know it's 192? – Kevin Cruijssen – 2018-07-24T09:00:10.343

I think SBCS implies that TIO just assumes that every character is a single byte, instead of checking it against the language's code page – Jo King – 2018-07-24T09:06:06.547

2

Yeah, it is not possible to represent this code as a 68-byte file which forces us to fall back to UTF-8, which is indeed 192 bytes.

– Adnan – 2018-07-24T09:11:22.343

1@JoKing So, now I'm only using characters from 05AB1E's code page. ;) Still a boring approach, but will see if I can find some kind of arithmetic pattern. – Kevin Cruijssen – 2018-07-24T09:40:08.433

1I think you can replace "]&%/$-)\'0*a+1(b,.234D5E6F7G8H9IY:J;K<L=M^>N_?O@PZAQWXBRCSTUV["Ç8- with •4Œ”dóŒfÝŸĀTUÕáOyÖOÀÁàu¼6¹₆Žr‡_›y³eß₂©ǝ²ƶ"SAÎAñ'¡û†Ø(•91в – Adnan – 2018-07-24T10:34:46.673

@Adnan Thanks! Btw, do you know why the verify all test cases doesn't work anymore?

– Kevin Cruijssen – 2018-07-24T11:08:04.390

1Hmm, that seems to be a parsing error not registering the closing bracket. I'll look into this. – Adnan – 2018-07-24T11:23:48.427

-2 bytes by using a better encoding. This is a fairly general trick (another example), so you may want to add it to your tips post. – Grimmy – 2019-05-16T13:12:03.463

4

Perl 6, 57 bytes

*.sort: {%(<Ↄ 100 ↄ 100 ↊ 10 ↋ 11>){$_}//.unival}

Try it online!

Just looks up the four exceptional characters in a hash, or falls back to the built-in unival method.

Sean

Posted 2018-07-24T07:31:24.240

Reputation: 4 136

You don't need the space after the colon. Also, your link is still in a code block rather than a Whatever lambda – Jo King – 2019-05-17T01:50:12.717

3

Japt, 72 bytes

ñ@`'%!x("y#) z$&*+,<-=.>/?0@1aq2b3c4d5ev6fw7g8hr9iop:j;klmn¡`u bXcuL

Try it or run all test cases


Explanation

ñ@                 :Sort by passing each X through a function
  `...`            :  A compressed string, which itself contains a bunch of unprintables (See below for codepoints of original string)
       u           :  Uppercase
         b         :  Index of
          Xc       :   Charcode of X
            uL     :   Mod 100 and get character at that codepoint

Codepoints

30,29,39,28,37,33,120,31,40,34,121,35,41,32,122,36,38,42,43,44,60,45,61,46,62,47,63,48,64,49,97,113,50,98,51,99,52,100,53,101,118,54,102,119,55,103,56,104,114,57,105,111,112,58,106,59,107,108,109,110,115,116

Original Solution, 90 89 88 bytes

ñ!b`(&" )#$*!%'+,-=.>/?0@1a2br3c4d5e6fw7gx8h9:jpq;k<lmÍ/`®iv u nLõd)dÃi6'¼ iA'½ iE'¾

Try it or run all test cases


Explanation

   `...`                                :A compressed string, which itself contains a bunch of unprintables (See below for codepoints of original string)
        ®                               :Map
         iv                             :  Prepend "v"
            u                           :  Convert to uppercase
               Lõ                       :  Range [1,100]
                 d                      :  Characters at those codepoints
              n   )                     :  Convert from that base to base-10
                   d                    :  Get the character at that codepoint
                    Ã                   :End map
                     i6'¼               :Insert "¼" at (0-based) index 6
                          iA'½          :Insert "½" at index 10
                               iE'¾     :Insert "¾" at index 14
ñ                                       :Sort the input array
 !b                                     :  By finding the index of the current element in the string above

Codepoints

31,30,40,29,38,34,32,41,35,36,42,33,37,39,43,44,45,61,46,62,47,63,48,64,49,97,50,98,114,51,99,52,100,53,101,54,102,119,55,103,120,56,104,57,105,115,58,106,112,113,59,107,60,108,109,110,111,116,117

Shaggy

Posted 2018-07-24T07:31:24.240

Reputation: 24 623

3

Retina, 193 bytes (UTF-8)

2{O`.
T`¼-¾⅐-↋\LI ^]Q@TU\\[ZYWSPNK\HFDB?=;975X\VR\OMJG\ECA><:86432\-)#1%0,*&.(!"$/+'`Ro

Try it online! Explanation: Sorts the characters in code point order, then maps between the numeric characters and ASCII characters so that the numeric characters with the lowest value map to the ASCII characters with the lowest code point and vice versa. Then repeats the exercise, so that the characters are now sorted in order of this ASCII mapping, which corresponds to the desired numeric order, before they are transformed back. Edit: Saved 100 (!) bytes by specifying the order of the ASCII characters rather than the numeric characters.

Neil

Posted 2018-07-24T07:31:24.240

Reputation: 95 035

3

Jelly, 55 bytes

O%70‘“$Żz*ṀḢD⁹VṢaʠƝ lẹkƝʋ9⁽ƭXmż4#⁺3ç%|ọṢLxƈ⁽}ÞƇ2’Œ?¤iµÞ

A monadic link accepting a list of characters which yields a list of characters.

Try it online!

How?

Much simpler than it looks since “$Żz*ṀḢD⁹VṢaʠƝ lẹkƝʋ9⁽ƭXmż4#⁺3ç%|ọṢLxƈ⁽}ÞƇ2’ is just a large number in base 250 using Jelly's code-page as the digits, I shall use “...’ in its place.

O%70‘“...’Œ?¤iµÞ - Link: list of characters
               Þ - sort by:
              µ  -   the monadic function (i.e. f(character)):
O                -     get the ordinal value of the character
 %70             -     modulo by 70 (get the remainder after dividing by 70)
                 -       - giving integers in [0,69] excluding [51,57]
    ‘            -     increment (below code pattern can't have anything but '↉' map to 0)
            ¤    -     nilad followed by link(s) as a nilad:
     “...’       -       literal 7826363328008670802853323905140295872014816612737076282224746687856347808481112431487214423845098801
          Œ?     -       get the permutation of natural numbers [1,N] with minimal N such
                 -         that this permutation would reside at the given index in a
                 -         sorted list of all permutations of those same numbers
                 -         -> [46,52,53,54,55,56,57,58,61,60,70,59,68,64,49,62,1,65,50,66,2,63,51,67,69,3,4,5,21,6,22,7,23,8,24,9,25,10,26,42,11,27,12,28,13,29,14,30,47,15,31,48,16,32,17,33,43,18,34,40,41,19,35,20,36,37,38,39,44,45]
             i   -     first index of (the ordinal mod 70 plus 1) in that list
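The permutation step described for Œ? can be sketched in Python. This follows the wording of the explanation above and assumes the index is 1-based; `permutation_at_index` is a name invented for this illustration, not part of Jelly.

```python
from math import factorial

def permutation_at_index(index):
    """Permutation of [1..N], with minimal N, found at the given 1-based
    position in the lexicographically sorted list of all permutations."""
    n = 1
    while factorial(n) < index:
        n += 1
    remaining = list(range(1, n + 1))
    rank = index - 1  # switch to a 0-based rank
    perm = []
    for i in range(n, 0, -1):
        # Each choice of leading element accounts for (i-1)! permutations.
        q, rank = divmod(rank, factorial(i - 1))
        perm.append(remaining.pop(q))
    return perm

print(permutation_at_index(4))  # [2, 3, 1]
```

A single integer below 70! is therefore enough to encode any ordering of 70 items, which is what the base-250 literal exploits.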

Aside

Somewhat ironically the nearest to a "use a built-in approach" I could muster was 85 bytes, this uses a compressed string:

from unicodedata import*; copy_to( atoms['
'], numeric( atoms['
'].call()))

which is split on newlines and joined with ⁸s to give the Python code:

from unicodedata import*; copy_to( atoms['⁸'], numeric( atoms['⁸'].call()))

which is executable within Jelly's interpreter - it will place the Unicode character's numeric value into the left argument nilad, for later use.

Jonathan Allan

Posted 2018-07-24T07:31:24.240

Reputation: 67 804

3

05AB1E, 56 53 51 50 49 48 bytes

ΣÇ©1ö•Ω‘~Èr–Õî5®Î¼ÓÂ∍_OûR•42в•мjāl†£•₂°*S>ÅΓ®Íè+

Try it online!

At the core of this solution is a compressed list mapping Unicode code points to a sorting key. Characters that correspond to the same number are mapped to the same key, so we only need 40 different keys.

70 is the smallest number by which we can modulo all input codepoints and get distinct results. Since indexing in 05AB1E wraps around, we don’t need to explicitly 70%, just make sure the list is length 70.
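The claim about 70 is easy to check by brute force. A small Python sketch, where `CODEPOINTS` and `smallest_injective_modulus` are names chosen for this example:

```python
# The 63 input codepoints: U+00BC..U+00BE plus U+2150..U+218B.
CODEPOINTS = [*range(0xBC, 0xBF), *range(0x2150, 0x218C)]

def smallest_injective_modulus(values):
    """Smallest m for which all values have distinct residues modulo m."""
    m = 1
    while len({v % m for v in values}) < len(values):
        m += 1
    return m

print(smallest_injective_modulus(CODEPOINTS))  # 70
```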

Notice that there are long stretches of consecutive code points with consecutive keys. Thus, encoding (key - codepoint) rather than simply (key) gives long stretches of identical numbers, which can be run-length encoded. However, the range of code points is very large (damn those 0xBC .. 0xBE), which would be an issue. So instead of (key - codepoint), we encode (key - sum_of_digits(codepoint)), which unfortunately limits the stretch length to 10, but does quite well at reducing the range of encoded values. (Other functions are of course possible, like codepoint % constant, but sum of digits gives the best results).

Additionally, it turns out rotating the list by 2 plays well with run-length encoding, so we subtract 2 from the codepoint before indexing.

•Ω‘~Èr–Õî5®Î¼ÓÂ∍_OûR•42в    # compressed list [25, 34, 27, 36, 30, 38, 29, 35, 41, 0, 28, 16, 19, 31, 7, 4, 11, 17, 22, 13, 16, 17, 20, 8, 19, 4, 18, 21]
•мjāl†£•                    # compressed integer 79980000101007
        ₂°*                 # times 10**26
           S                # split to a list of digits
            >               # add 1 to each
             ÅΓ             # run-length decode, using the first list as elements and the second list as lengths

Σ                           # sort by
 Ç©1ö                       # sum of digits of the codepoint
           +                # plus
     ...  è                 # the element of the run-length decoded list
        ®Í                  # with index (codepoint - 2) % 70
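The breakdown above can be re-expressed in Python to see the key function at work. This is a sketch based only on the decoded values listed in the explanation; it assumes that run lengths beyond the 28 list elements are simply ignored, and the names `VALUES`, `LENGTHS`, `OFFSETS` and `sort_key` are introduced here.

```python
# Decoded from the first compressed literal (28 elements).
VALUES = [25, 34, 27, 36, 30, 38, 29, 35, 41, 0, 28, 16, 19, 31,
          7, 4, 11, 17, 22, 13, 16, 17, 20, 8, 19, 4, 18, 21]

# Digits of 79980000101007 * 10**26, each incremented by one, are run lengths.
LENGTHS = [int(d) + 1 for d in str(79980000101007 * 10**26)]

# Run-length decode; zip stops after the 28 values (assumed behaviour).
OFFSETS = [v for v, n in zip(VALUES, LENGTHS) for _ in range(n)]

def sort_key(char):
    cp = ord(char)
    digit_sum = sum(int(d) for d in str(cp))
    # Key = digit sum of the codepoint + offset at index (codepoint - 2) % 70.
    return digit_sum + OFFSETS[(cp - 2) % 70]

print(len(OFFSETS))   # 70
print(sort_key('↉'))  # 26
print(sorted(['½', 'ↆ', 'ↂ', '⅒', 'Ⅽ', '⅑', 'ⅷ'], key=sort_key))
```

Only the relative order of the keys matters, so the smallest key being 26 rather than 0 has no effect on the result.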

Grimmy

Posted 2018-07-24T07:31:24.240

Reputation: 12 521

2

JavaScript (SpiderMonkey), 117 bytes

a=>a.sort((a,b)=>(g=c=>';<=>GHIJCDEF?@AB/37~15:;8-9+6.24,*)0N(DEH@GMJKLH'[(c.charCodeAt()&127^44)%92%76%60])(a)>g(b))

Try it online!

Arnauld

Posted 2018-07-24T07:31:24.240

Reputation: 111 334

1

T-SQL, 207 bytes

SELECT*FROM t ORDER BY
CHARINDEX(c,N'⅒⅑⅛⅐⅙⅕¼⅓⅜⅖½⅗⅝⅔¾⅘⅚⅞⅟ⅠⅰⅡⅱⅢⅲⅣⅳⅤⅴⅥ
              ⅵↅⅦⅶⅧⅷⅨⅸⅩⅹ↊Ⅺⅺ↋ⅫⅻⅬⅼↆⅭⅽↃↄⅮⅾⅯⅿↀↁↂↇↈ'COLLATE Thai_BIN)

Return in the middle of the string is for readability only. I think I got the byte count correct (3 of the numeric characters are 1-byte, the remainder are 2-bytes), character count is 148.

I pre-sorted the string in ascending order, leaving out ↉ (for which CHARINDEX returns 0) as suggested by other answers.

Any binary collation will work; I used Thai_BIN since it has the shortest name. (A collation in SQL prescribes how character sorting/comparison is done. I need a binary one so that each character matches only itself.)

Per our IO standards, input is taken via pre-existing table t with NCHAR(1) field c.

If you define the input table itself using a binary collation, you can leave the COLLATE clause out of the query to save 16 bytes:

CREATE TABLE t(c NCHAR(1) COLLATE Thai_BIN)

BradC

Posted 2018-07-24T07:31:24.240

Reputation: 6 099

Which characters would match each other if you didn't use binary collation? – Neil – 2018-07-25T18:54:53.203

1@Neil Well, depends on which other collation you use, actually! :). The most obvious one I noticed (using my server default of SQL_Latin1_General_SP1_CI_AS) was that the upper and lowercase Roman numerals match each other. Which.... hmm... might actually work for me here, since they resolve to the same number. But if the collation name is so much longer, that counteracts the savings. BRB, gotta test some more... – BradC – 2018-07-25T19:02:04.860

1@Neil Nope, no good. With non-binary collations, 10 of the less common characters (⅐⅑⅒Ↄↄↅↆↇↈ↉↊↋ if you are curious) all match up to each other. – BradC – 2018-07-25T19:17:47.787

Ah, that's a shame, but thanks for letting me know! – Neil – 2018-07-25T19:18:21.220

1

Ruby, 77 bytes

Changes all the characters to letters that represent the numerical values and sorts by that.

->a{a.sort_by{|e|e.tr'¼-¾⅐-↋','HLPECBIOGKMQFRDJNSaa-pa-ppqrnnfmstAjk'}}

Try it online!
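The same translate-then-sort idea can be written in Python with `str.maketrans`. This is a sketch, not a port of the golfed code: the character ranges from the `tr` call are expanded by hand, and `SOURCE`, `TARGET`, `TABLE` and `sort_by_letters` are names made up here.

```python
# Expansion of '¼-¾⅐-↋': U+00BC..U+00BE followed by U+2150..U+218B.
SOURCE = ''.join(chr(cp) for cp in (*range(0xBC, 0xBF), *range(0x2150, 0x218C)))

# Expansion of 'HLPECBIOGKMQFRDJNSaa-pa-ppqrnnfmstAjk'.
LETTERS = 'abcdefghijklmnop'
TARGET = 'HLPECBIOGKMQFRDJNS' + 'a' + LETTERS * 2 + 'pqrnnfmstAjk'

TABLE = str.maketrans(SOURCE, TARGET)

def sort_by_letters(chars):
    # Each character becomes one ASCII letter whose order mirrors its value.
    return sorted(chars, key=lambda c: c.translate(TABLE))

print(sort_by_letters(['½', 'ↆ', 'ↂ', '⅒', 'Ⅽ', '⅑', 'ⅷ']))
```

Uppercase letters cover the values below 1 and lowercase letters the values from 1 upward, which works because uppercase sorts before lowercase in ASCII.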

Value Ink

Posted 2018-07-24T07:31:24.240

Reputation: 10 608

1

Perl 6, 13 52 bytes

*.sort:{%(<Ↄ 99 ↄ 99 ↊ 10 ↋ 11>){$_}//.EVAL}

Try it online!

bb94

Posted 2018-07-24T07:31:24.240

Reputation: 1 831

2Using eval isn't cheating, but this simply doesn't solve the challenge. 52 that actually works: *.sort:{%(<Ↄ 99 ↄ 99 ↊ 10 ↋ 11>){$_}//.EVAL} – Grimmy – 2019-05-17T11:08:06.083