Crazy Librarian's Amazing Sorting System

21

3

It's back to school season! So for a part-time job, you're helping out at the school's library. The problem is, the head librarian has never even heard the words "Dewey Decimal," let alone implemented that system. Instead, the sorting system in use has grown "organically" as the library has expanded...

In an effort to keep your sanity, you've elected to write a program to help you sort books as they're returned, because woe unto you if you sort the books wrong. (The head librarian is VERY strict.)

Input / Output

  • The input will be a list of (hypothetical) book titles, one per line, from STDIN / language equivalent.
  • You can assume no more than 100 books input at a time (you can only carry so many around the library at once).
  • Books can have multiple words in their titles, and these words may be separated by spaces or other punctuation (e.g., a colon:, a dash-, etc.).
  • For ease of calculation, assume all titles are UTF-8.

Output is the same titles, sorted according to the below rules, again one per line, to STDOUT / language equivalent.

The Sorting Rules

Books are sorted numerically based on their average character value (i.e., cumulative character value divided the number of characters in the book title), counted by the following rules:

  • All characters count for determining the number of characters in a title.
  • Lowercase letters are counted by their position in the alphabet. (a=1,b=2,...z=26)
  • If the title contains capital letters, those count for 1.5 their lowercase value (A=1.5,B=3,...Z=39). ("Capital letters are important!" says the librarian.)
  • Each punctuation mark / symbol in this list !@#$%^&*()-=_+[]\{}|;':",./<>?~ counts -1 from the cumulative value before averaging. ("Grandiose titles are not!")
  • If the title contains a number, written in Arabic numerals, that number is subtracted from the average value before sorting. Multiple consecutive digits are treated as one number (e.g., 42 would subtract 42, not subtract 4 and then subtract 2). Individual digits don't count for the cumulative value (i.e., each digit contributes 0), but DO count for number of characters. Note that this may result in a negative value and should be treated appropriately. (Rumor has it, the librarian has had a crush on a math instructor for several years, now.)
  • If the title contains two separate words that start with an R, the book gets a score of "infinity" and is dumped into a pile in the corner (i.e., randomly arranged at the end of the list). (The librarian was once dumped by a person with those initials, or so you've heard.)
  • Spaces don't count for the cumulative character value (i.e., they contribute 0), but DO contribute to the number of characters in a title.
  • Characters that don't fit the above rules (e.g, a ÿ) don't count for the cumulative character value (i.e., they contribute 0), but DO contribute to the number of characters in a title.
  • For example, a hypothetical book ÿÿÿÿÿ would have a "score" of (0+0+0+0+0) / 5 = 0, but a hypothetical book ÿÿyÿÿ would have a "score" of (0+0+25+0+0) / 5 = 5.
  • Two books that happen to "score" the same can be Output in your choice of order. (They're on the same shelf, anyway)

Example Input 1

War and Peace
Reading Rainbow: The Best Unicorn Ever
Maus
Home for a Bunny

Example Output 1 (with "scores" in parentheses to show reasoning - you don't need to print them)

War and Peace (8.5)
Home for a Bunny (10.125)
Maus (15.125)
Reading Rainbow: The Best Unicorn Ever (infinity)

Example Input 2

Matthew
Mark
Luke
John
Revelations

Example Output 2 (with "scores" in parentheses to show reasoning - you don't need to print them)

Mark (12.375)
John (13)
Revelations (13.545454...)
Luke (13.75)
Matthew (~13.786)

Example Input 3

42
9 Kings
1:8
7th

Example Output 3 (with "scores" in parentheses to show reasoning - you don't need to print them)

42 (-42)
1:8 (-9.3333...)
9 Kings (~0.36)
7th (2.3333...)

Other restrictions

  • This is Code-Golf, because you need to keep the program secret from the ever-watching eyes of the librarian, and the smaller the program, the easier it is to hide.
  • Standard loophole restrictions apply
  • Don't let the librarian catch you slacking off by spending all your time on PPCG.

AdmBorkBork

Posted 2015-09-16T16:12:58.293

Reputation: 41 581

What if two books score exactly the same. i.e i Have Reading Rainbow and Ruby Rails – Kishan Kumar – 2015-09-16T16:22:49.087

@KishanKumar In that specific instance, "randomly arranged at the end of the list" since they're both double-R. In other words, take your pick. In the general case, if two words score the same, they can appear in any order relative to each other. I'll add a bullet to clarify that. – AdmBorkBork – 2015-09-16T16:25:56.500

7You need an A word so your system has an acronym name. I recommend Crazy Librarian's Amazing Sorting System :D – Geobits – 2015-09-16T16:44:06.087

3@Geobits Do you have CLASS? – AdmBorkBork – 2015-09-16T16:47:21.260

Numbers are just decimal numbers? What if there are several ones, are all of them subtracted separately? – Paŭlo Ebermann – 2015-09-16T19:48:36.147

@PaŭloEbermann Strictly speaking, they're Integers, as the . is considered punctuation and handled separately. Yes, if there are multiple numbers in the input they are each subtracted, as in the 1:8 in the 4th example -- (0+-1+0)/3 - (1+8) = -9.333.... – AdmBorkBork – 2015-09-16T19:54:40.157

But only integers in decimal notation using the digits 0 to 9, not octal, hexadecimal or strange Chinese characters, right? – Paŭlo Ebermann – 2015-09-16T19:57:06.853

@PaŭloEbermann Yes, Arabic numerals (what Unicode would call "European digits") only. Other characters should be handled as the ÿ example. I've edited the numbers portion for clarification.

– AdmBorkBork – 2015-09-16T20:01:19.967

Also, 9 Kings is (-9 + 0 + 1,5*11 + 9 + 14 + 7 + 19)/7 = ~8,07, 7th = (-7+20+8)/3 = 7 and 1:8 = (-1 + 0 - 8)/3 = -3 – Zereges – 2015-09-16T23:18:40.967

@Zereges No - integer values are subtracted after you average and before you sort, not as part of the averaging. The digits count as zero. For 7th that means (0+20+8)/3 - 7 = 2.333. for example. I expanded out the 1:8 example three comments up. Let me know if that's still not clear. – AdmBorkBork – 2015-09-17T13:01:02.783

Can you confirm that the symbols (scoring -1) are all the ASCII printable characters (ascii code 32 to 126) except alphanumeric, space (32), backslash (92) and backtick (92) ? – edc65 – 2015-10-05T17:31:49.897

@edc65 You're correct, excepting that backslash (92) is included in the symbols list on this challenge. The backtick (96) is not included in the symbols list on this challenge. However, do note that this challenge is anticipating Unicode input/output. – AdmBorkBork – 2015-10-05T17:56:47.020

Right ... the backslash was eaten when I ported the string in javascript code. Thanks – edc65 – 2015-10-05T18:07:51.150

Can we assume a maximum title lenght? – Fabian Schmengler – 2015-10-05T19:24:44.090

@fschmengler Sure. I'm not picky on that, so choose something reasonable (e.g., 50+) if it impacts your code length. None of the private tests that I have should get anywhere close to overflowing a buffer or suffering from wrapping an int or the like. – AdmBorkBork – 2015-10-05T19:27:24.413

Can I expect no title to have two capital R's in one word? – Padarom – 2015-10-06T07:10:42.707

@Padarom Like a CamelCase word? I don't have a test case that explicitly tests that one way or the other, but you should ensure that your algorithm doesn't match any capital R, just those that are at the beginning of the words.

– AdmBorkBork – 2015-10-06T12:48:46.580

Okay, that's what I was trying to find out, thanks! – Padarom – 2015-10-06T12:53:10.070

It would be so awesome to have an Ook! entry here... – Sanchises – 2015-10-07T10:04:28.223

Answers

5

APL (132)

{⎕ML←3⋄⍵[⍋{2='R'+.=↑¨⍵⊂⍨⍵≠' ':!99⋄↑(+/⍎¨'0',⍵⊂⍨⍵∊⎕D)-⍨((+/∊1.5 1×(⍳×∊⍨)∘⍵¨G)-+/⍵∊(⎕UCS 32+⍳94)~'`',⎕D,∊G←(⊂⎕A),⊂⎕UCS 96+⍳26)÷⍴⍵}¨⍵]}

Since everyone else is doing the same thing, this too is a function that takes an array of titles and returns it sorted, e.g.:

      titles
┌─────────────┬──────────────────────────────────────┬────┬────────────────┬───────┬────┬────┬────┬───────────┬──┬───────┬───┬───┐
│War and Peace│Reading Rainbow: The Best Unicorn Ever│Maus│Home for a Bunny│Matthew│Mark│Luke│John│Revelations│42│9 Kings│1:8│7th│
└─────────────┴──────────────────────────────────────┴────┴────────────────┴───────┴────┴────┴────┴───────────┴──┴───────┴───┴───┘

      {⎕ML←3⋄⍵[⍋{2='R'+.=↑¨⍵⊂⍨⍵≠' ':!99⋄↑(+/⍎¨'0',⍵⊂⍨⍵∊⎕D)-⍨((+/∊1.5 1×(⍳×∊⍨)∘⍵¨G)-+/⍵∊(⎕UCS 32+⍳94)~'`',⎕D,∊G←(⊂⎕A),⊂⎕UCS 96+⍳26)÷⍴⍵}¨⍵]}titles
┌──┬───┬───────┬───┬─────────────┬────────────────┬────┬────┬───────────┬────┬───────┬────┬──────────────────────────────────────┐
│42│1:8│9 Kings│7th│War and Peace│Home for a Bunny│Mark│John│Revelations│Luke│Matthew│Maus│Reading Rainbow: The Best Unicorn Ever│
└──┴───┴───────┴───┴─────────────┴────────────────┴────┴────┴───────────┴────┴───────┴────┴──────────────────────────────────────┘

Explanation:

  • ⎕ML←3: set ⎕ML to 3 (for )
  • ⍵[⍋{...}¨⍵]: sort the input by the values returned from the inner function
    • ↑¨⍵⊂⍨⍵≠' ': get the first character of each word
    • 2='R'+.=: see if two of these are 'R'.
    • :!99: if so, return 99! (≈ 9.3×10155). This is not quite infinity, but it'll do: a title can never have a score more than 38 times its length (ZZZZ...), so as long as no single title is bigger than about 2×10130 yottabytes, it is guaranteed that these will be at the end.
    • : otherwise:
    • (...)÷⍴⍵: divide the score by the length of after calculating it:
      • G←(⊂⎕A),(⎕UCS 96+⍳26): store in G the uppercase and lowercase letters
      • (⎕UCS 32+⍳94)~'`',⎕D,∊G: the printable ASCII characters, except letters, digits, spaces and '`', which are the characters for which a point is subtracted. (This is shorter than writing them all out, because G is used later on.)
      • +/⍵∊: count the amount of these characters in
      • -: subtract this from:
      • +/∊1.5 1×(⍳×∊⍨)∘⍵¨G: the sum of 1.5 × the scores for the capitals, and 1 × the scores for the lowercase letters.
    • -⍨: afterwards, subtract the total of the numbers in :
      • ⍵⊂⍨⍵∊⎕D: find the groups of digits in
      • '0',: add '0', to prevent the list being empty
      • ⍎¨: evaluate each string
      • +/: find the sum

marinus

Posted 2015-09-16T16:12:58.293

Reputation: 30 224

Instead of !99 you could use ⌊/⍬ – Adám – 2015-10-07T02:00:00.800

1I love looking at long code in APL. It makes me feel like the world is so much large than I. And I like symbols. – Conor O'Brien – 2015-10-08T11:17:03.340

2

Lua 5.3, 366 364 Bytes

r={}for i,s in ipairs(arg)do n=0 s:gsub("%l",function(a)n=n+(a:byte()-96)end):gsub("%u",function(a)n=n+(a:byte()-64)*1.5 end):gsub("%p",function(a)n=n-1 end):gsub("^R?.- R.- ?R?",function()n=math.huge end)m=n/utf8.len(s)s:gsub("%d+",function(a)m=m-a end)table.insert(r,{s=s,n=m})end table.sort(r,function(a,b)return a.n<b.n end)for i,v in ipairs(r)do print(v.s)end

This code only works in Lua 5.3 because it needs to deal with Unicode characters. If you don't care about Unicode, then replace "utf8" with "string" and it will work fine with Lua 5.2 or 5.1.

It takes its inputs from command-line arguments, so either run it from the command line or put this code above my answer:

arg = {"Title 1", "Title 2", "Title 3"}

Trebuchette

Posted 2015-09-16T16:12:58.293

Reputation: 1 692

I don't have Lua 5.3 on my machine, but I followed your suggestion to swap utf8 with string on Ideone and got no output.

– AdmBorkBork – 2015-09-16T17:56:24.020

@TimmyD see my edit – Trebuchette – 2015-09-16T18:25:09.443

Good. Gravy. And (arg) is sitting there staring me in the face. This question has apparently fried my brain. Have a +1. – AdmBorkBork – 2015-09-16T18:33:56.720

With MoonScript, this is 266 bytes: http://pastebin.com/wr4qVs5h.

– kirbyfan64sos – 2015-09-16T20:28:41.067

2

JavaScript (ES6), 210 218 251

As a function with an array argument, returned sorted.

f=L=>(S=s=>([...s].map(c=>t-=(a=s.charCodeAt(l++))>32&a<48|a>57&a<65|a>90&a<96|a>122&a<127?1:a>64&a<123?96-(a<96?a*1.5:a):0,l=t=0),s.split(/\D/).map(n=>t-=n,t/=l),t/!s.split(/\bR/)[2]),L.sort((a,b)=>S(a)-S(b)))

//TEST

test1=['War and Peace','Reading Rainbow: The Best Unicorn Ever','Maus','Home for a Bunny']
test2=['Matthew','Mark','Luke','John','Revelations']
test3=['42','9 Kings','1:8','7th']

;O.innerHTML=f(test1)+'\n\n'+f(test2)+'\n\n'+f(test3);

// The comparing function used to sort, more readable

Sort=s=>(
  t = 0, // running total
  l = 0, // to calc the string length avoiding the '.length' property
  [...s].map(c=>{
    a=s.charCodeAt(l++);
    t-=a>32&a<48|a>57&a<65|a>90&a<96|a>122&a<127
      ? 1 // symbols (ASCII char except space, alphanumeric and backtick)
      : a>64&a<123 
        ? 96-(a<96?a*1.5:a) // alphabetic both upcase and lowcase, and backtick
        // lowcase: 96-a, upcase (64-a)*1.5=>96-a*1.5, backtick is 96 and 96-96 == 0
        : 0 // else space, non ASCII, and numeric : 0
  }),
  t = t/l, // average
  s.split(/\D/).map(n=>t-=n), // sub number values
  f = s.split(/\bR/)[2], // split at words starting with R, if less then 2 f is undefined
  t/!f // dividing by not f I can get the infinity I need
)
<pre id=O></pre>

edc65

Posted 2015-09-16T16:12:58.293

Reputation: 31 086

Nicely done. For reference to anyone else reading this answer, I had to change O.innerHTML to this.InnerHTML in Firefox's console. – AdmBorkBork – 2015-10-05T19:23:17.317

2

Mathematica, 253 216 bytes (214 chars)

r=RegularExpression;c=ToCharacterCode;f=SortBy[Tr@Flatten@Reap[StringCases[#,
{r@"(\\bR.*)+"->∞,r@"\\d+":>0Sow@-FromDigits@"$0",r@"[a-z]":>c@"$0"-96,
r@"[A-Z]":>1.5c@"$0"-96,r@"[!-/:-@[-_{-~]"->-1}]/StringLength@#]&]

Call the function like f[{"42", "9 Kings", "1:8", "7th"}]; it will return a sorted list of the inputs.

Just barely made it! Mathematica's pattern matching is not as concise when strings are involved, and I just get killed by those long names. The extra two bytes are for the Infinity unicode character.

(Let me know if I've fallen afoul of any Standard Loopholes.)

Update

Looking a little closer at edc65's answer, it looks like the OP will accept a function that sorts a list of strings. With that in mind we can use the curried form of SortBy (which Mathematica calls the "operator form"); with one argument (the function applied to the list elements to determine their order) it behaves like a function that takes one argument, returning the sorted form of the input; that is, SortBy[list, f] is equivalent to (SortBy[f])[list].

Ungolfed

Function[{titles},
  SortBy[titles, Function[{str}, (* sort by function value *)
    Total[Flatten[Reap[ (* total up all the parts *)
      StringCases[str, {
        RegularExpression["(\\bR.*){2}"] -> Infinity
          (* matches R at the start of a word twice, adds infinity to the total *),
        RegularExpression["\\d+"] :> 0 * Sow[-FromDigits["$0"]]
          (* matches a number, Sows it for Reap to collect, then multiplies by zero
                                                          to not affect the average *),
        RegularExpression["[a-z]"] :> ToCharacterCode["$0"] - 96
          (* matches a lowercase letter and returns its value *),
        RegularExpression["[A-Z]"] :> 1.5 ToCharacterCode["$0"] - 96
          (* matches an uppercase letter and returns 1.5 its value *),
        RegularExpression["[!-/:-@[-_{-~]"] -> -1
          (* matches a 'grandiose' symbol and returns -1 *)
      }] / StringLength[#] (* averages character values *)
    ]]]
  ]]
]

2012rcampion

Posted 2015-09-16T16:12:58.293

Reputation: 1 319

1Nice answer, and you get an Internet Cookie for literally using "infinity" in your calculations ;-). – AdmBorkBork – 2015-10-06T12:45:48.620

@TimmyD The beauty of symbolic math processing =) – 2012rcampion – 2015-10-06T14:54:54.980

Probably you mean 214 chars, 216 bytes. Well done, I tried to compete but no way – edc65 – 2015-10-06T21:26:53.920

1

Perl 5, 190 bytes

sub p{$_=pop;chomp;$c=-y/A-Za-z0-9 \\`//c;map$c+=(32&ord$_?1:1.5)*(31&ord),/[a-z]/gi;$c/=length;map$c-=$_,/\d+/g;$c}say(sort{$y=$b=~/\bR.*\bR/;($x=$a=~/\bR.*\bR/)||$y?$x-$y:(p($a)<=>p$b)}<>)

Try it online!

Xcali

Posted 2015-09-16T16:12:58.293

Reputation: 7 671

1

C#, 352 349 Bytes

Due to the magic of linq:

class A{static void Main(string[]a){foreach(var x in a.OrderBy(b=>{var s="0";int j=0;return Regex.Split(b,@"[^\w]+").Count(l=>l[0]=='R')==2?(1/0d):b.Aggregate(0d,(d,e)=>{if(e>47&e<58){s+=e;return d;}d+=(e>64&e<91)?(e-64)*1.5:(e>96&e<123)?e-96:e>32&e<127&e!=96?-1:0;j+=int.Parse(s);s="0";return d;})/b.Length-j-int.Parse(s);}))Console.WriteLine(x);}}

Could have saved another 6 bytes if backtick would be included in the punctuation list!

class A
{
    static void Main(string[] a)
    {
        foreach (var x in a.OrderBy(b =>
            {
                var s = "0";
                int j = 0;
                return Regex.Split(b, @"[^\w]+").Count(l => l[0] == 'R') == 2
                    ? (1 / 0d)
                        : b.Aggregate(0d, (d, e) =>
                        {
                            if (e > 47 & e < 58) { s += e; return d; }
                            d += (e > 64 & e < 91) ? (e - 64) * 1.5 : (e > 96 & e < 123) ? e - 96 : e > 32 & e < 127 & e != 96 ? -1 : 0;
                            j += int.Parse(s);
                            s = "0";
                            return d;
                        }) / b.Length - j - int.Parse(s);
            }))
            Console.WriteLine(x);
    }

}

Yitz

Posted 2015-09-16T16:12:58.293

Reputation: 21

1

Go, 755 Bytes

package main
import("os"
"fmt"
"math"
"bufio"
"regexp"
"sort"
"strconv")
type F float64
type T []F
func(t T)Swap(i,j int){t[i],t[j],S[i],S[j]=t[j],t[i],S[j],S[i]}
func(t T)Len()int{return len(t)}
func(t T)Less(i,j int)bool{return t[i]<t[j]}
var S []string
func main(){var t T
for{b:=bufio.NewReader(os.Stdin)
w,_,_:=b.ReadLine()
if len(w)==0{break}
u:=string(w)
var v F
for _,c:=range u{if 96<c&&c<123{v+=F(c)-F(96)}else
if 64<c&&c<91{v+=(F(c)-64)*1.5}else
if (48>c&&c>32)||(c>57&&c<127){v-=1}}
a:=v/F(len(w))
r,_:=regexp.Compile("[0-9]+")
n:=r.FindAllString(string(w),-1)
for _,x:=range n{y,_:=strconv.Atoi(x);a-=F(y)}
if m,_:=regexp.Match("((^| )R.*){2}",w);m{a=F(math.Inf(1))}
S=append(S,u)
t=append(t,a)}
sort.Sort(t)
for _,o:=range S{fmt.Println(o)}}

The formatted version:

package main

import (
    "bufio"
    "fmt"
    "math"
    "os"
    "regexp"
    "sort"
    "strconv"
)

type F float64
type T []F

func (t T) Swap(i, j int)      { t[i], t[j], S[i], S[j] = t[j], t[i], S[j], S[i] }
func (t T) Len() int           { return len(t) }
func (t T) Less(i, j int) bool { return t[i] < t[j] }

var S []string

func main() {
    var t T
    for {
        b := bufio.NewReader(os.Stdin)
        w, _, _ := b.ReadLine()
        if len(w) == 0 {
            break
        }
        u := string(w)
        var v F
        for _, c := range u {
            if 96 < c && c < 123 {
                v += F(c) - F(96)
            } else if 64 < c && c < 91 {
                v += (F(c) - 64) * 1.5
            } else if (48 > c && c > 32) || (c > 57 && c < 127) {
                v -= 1
            }
        }
        a := v / F(len(w))
        r, _ := regexp.Compile("[0-9]+")
        n := r.FindAllString(string(w), -1)
        for _, x := range n {
            y, _ := strconv.Atoi(x)
            a -= F(y)
        }
        if m, _ := regexp.Match("((^| )R.*){2}", w); m {
            a = F(math.Inf(1))
        }
        S = append(S, u)
        t = append(t, a)
    }
    sort.Sort(t)
    for _, o := range S {
        fmt.Println(o)
    }
}

Implementing a custom sort interface made it longer than expected. The program reads from STDIN until end of input o a blank line is entered.

Fabian Schmengler

Posted 2015-09-16T16:12:58.293

Reputation: 1 972

1

PHP, 362 367 Bytes

<?for(;$w=fgets(STDIN);$S[]=$w){for($l=$i=mb_strlen($w);$i--;){$c=array_sum(unpack("C*",mb_substr($w,$i,1)));96<$c&&$c<123 and $v+=$c-96 or 64<$c&&$c<91 and $v+=1.5*$c-96 or 48<$c&&$c>32||$c>57&&$c<127 and $v-=1;}$v/=$l;preg_match_all("/\d+/",$w,$m);$v-=array_sum($m[0]);preg_match("/((^| )R.*){2}/",$w)&&$v=INF;$t[]=$v;}array_multisort($t,$S);echo join("
",$S);

Formatted version:

<?php
for (; $w = fgets(STDIN); $S[] = $w) {
    for ($l = $i = mb_strlen($w); $i--;) {
        $c = array_sum(unpack("C*", mb_substr($w, $i, 1)));
        96 < $c && $c < 123 and $v += $c - 96
        or 64 < $c && $c < 91 and $v += 1.5 * $c - 96
        or 48 < $c && $c > 32 || $c > 57 && $c < 127 and $v -= 1;
    }
    $v /= $l;
    preg_match_all("/\d+/", $w, $m);
    $v -= array_sum($m[0]);
    preg_match("/((^| )R.*){2}/", $w) && $v = INF;
    $t[] = $v;
}
array_multisort($t, $S);
echo join("
", $S); 

Interesting lines:

$c = array_sum(unpack("C*", mb_substr($w, $i, 1)));

Converts a single UTF-8 character to its byte values and sums them, so that we get the real value for ASCII characters and a value higher than 127 for multibyte characters.

96 < $c && $c < 123 and $v += $c - 96
or 64 < $c && $c < 91 and $v += 1.5 * $c - 96
or 48 < $c && $c > 32 || $c > 57 && $c < 127 and $v -= 1;

Makes use of low operator precedence of and and or to assign the character value in a single statement without if.

Fabian Schmengler

Posted 2015-09-16T16:12:58.293

Reputation: 1 972