Optimal short-hand roman numeral generator

21

2

Goal:
Write a function that takes a number as input and returns a short-hand roman numeral for that number as output.

Roman Numeral Symbols:

Symbol  Value
I       1
V       5
X       10
L       50
C       100
D       500
M       1,000


For an example of what I mean when I say "short-hand roman numerals", let's consider finding a roman numeral to represent 1983, because that is the year I was born. One option is to do this the normal way (10 letters):

1983 = MCMLXXXIII = (1000 - 100 + 1000 + 50 + 30 + 3)

The other option is to do it the short-hand way (6 characters):

1983 = MXVIIM = (1000 - (10 + 10) + 1000 + 3)

Do you know what this means?!?!!?? If I was roman I could have saved 4 characters every single time I wrote my birth date! Woot Woot!!

However, before I get ahead of myself in excitement, I have a question to write, so I should probably define the short-hand roman numeral rules so we are all on the same page:

Short-Hand Roman Numeral Rules:

1. Always consider symbols from left to right until there are no more characters to consider.
2. If there are no higher-valued symbols to the right of the current symbol:
• Add the value of the current symbol to the running total of this roman numeral.
3. If there are higher-valued symbols to the right of the symbol you are considering:
• Locate the rightmost highest-valued symbol to the right of the current symbol
• Consider all the characters up to that symbol as one roman numeral
• Calculate the value of that roman numeral using these steps
• Subtract the value of that roman numeral from the running total of this roman numeral.
• Move to the next symbol after the group you just considered
4. Each roman numeral must have at least 1 symbol in it.
5. That's it! Anything following these rules will be accepted!

Examples:

IIIIV = (-(1+1+1+1)+5) = 1  //Don't ask me why you'd want to do this!

VVX = (-(5+5) + 10) = 0  //Who said you couldn't represent 0 with roman numerals?!!?

VVXM = (-(-(5+5) + 10) + 1000) = 1000  //Again...don't ask me why you'd want to do this!

MXIIXMI = (1000-(10-(1+1)+10)+1000+1) = 1983  //Ahhh...such a great year :)


Question Rules:

1. Make a function that takes a single number as input and returns a roman numeral for that number as output using the above rules. Calculate the codeGolfScore of this function.

example input: 2011
example possible output: MMXI
another possible output: MMVVIVV     //(2000 + 10 - 4 + 5)

2. Using your function from rule 1, generate the roman numerals between -1000 (that's right, NEGATIVE one-thousand) and 3000. Then sum up the character length of these roman numerals to get your totalCharacterCount. Here's some pseudocode to clarify:

totalCharacterCount = 0;
for(currentNumber = -1000; currentNumber <= 3000; currentNumber++){
totalCharacterCount += getRomanNumeral(currentNumber).length;
}

3. finalScore = codeGolfScore + totalCharacterCount

4. Lowest finalScore wins!

Note: As the totalCharacter count will be in the ten-thousands+, the character-length algorithm should be top priority. Code-golf scores are just the tie-breaker in case multiple users find the optimal algorithm or algorithms that are close to each other.

Good luck, and have fun at your MMXII celebrations tomorrow night!!!

Is IDC as well as DIC allowed for 599 (499+100 vs. 500+99)? – user unknown – 2012-04-19T18:02:22.453

@userunknown: Yep, both are allowed. – Briguy37 – 2012-04-19T18:16:36.740

1Great task! However, could you give an example of how a negative roman shorthand looks like? Does DDDDM stand for -1000? – pimvdb – 2011-12-30T19:00:38.253

@pimvdb You got it! – Briguy37 – 2011-12-30T19:37:30.687

A question regarding the special case zero: Is "" allowed for zero or do we have to use VVX or something equivalent? – Howard – 2011-12-30T19:46:22.633

@Howard: Great question, I hadn't thought of that! I've added roman numeral rule 4 for which clarifies that case. – Briguy37 – 2011-12-30T20:00:59.650

1"Locate the rightmost highest-valued symbol to the right of the current symbol" -- which wins, rightmost or highest-valued? i.e., is IXV = -(-1 + 10) + 5 = -4 (rightmost wins), or IXV = -1 + 10 + 5 = 14 (highest-valued wins)? – Keith Randall – 2011-12-30T21:06:21.970

@KeithRandall I think it must be latter. Otherwise it would have to read "rightmost higher-valued". – Howard – 2011-12-30T21:20:33.663

@KeithRandall: Highest value wins, so IXV = 14, and IXVX =-14+10= -4. – Briguy37 – 2011-12-30T21:32:49.663

5

C++, 345 characters of code, 25021 roman numeral digits = 25366

int N[]={1,5,10,50,100,500,1000};int V(int s,int L){if(!L)return 0;int K=0,B,m=s%7+1;for(int k=1,b=7;k<L;k++,b*=7){if(s/b%7>=m){K=k;B=b;m=s/b%7;}}return K?V(s/B,L-K)-V(s%B,K):N[s%7]+V(s/7,L-1);}char r[99];char*f(int n){for(int L=1,B=7,i,j;1;L++,B*=7){for(i=0;i<B;i++){if(V(i,L)==n){for(j=0;j<L;j++){r[j]="IVXLCDM"[i%7];i/=7;}r[L]=0;return r;}}}}


deobfuscated a bit, with a driver:

int N[]={1,5,10,50,100,500,1000};
int V(int s,int L){
if(!L)return 0;
int K=0,B,m=s%7+1;
for (int k=1,b=7;k<L;k++,b*=7) {
if(s/b%7>=m){K=k;B=b;m=s/b%7;}
}
return K ? V(s/B,L-K)-V(s%B,K) : N[s%7]+V(s/7,L-1);
}
char r[99];
char *f(int n){
for(int L=1,B=7;1;L++,B*=7) {
for(int i=0;i<B;i++) {
if(V(i,L)==n){
for(int j=0;j<L;j++) {
r[j]="IVXLCDM"[i%7];i/=7;
}
r[L]=0;
return r;
}
}
}
}
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
printf("%s\n", f(atoi(argv[1])));
}


V computes the numerical value of a given roman numeral string s of length L. Strings are encoded base 7 (first digit is s%7, second digit is s/7%7, ...). Each digit is encoded with I=0, V=1, ..., M=6. f does a brute-force enumeration of possible roman numeral strings to find one that V evaluates to n.

The total number of roman numeral digits is optimal. The longest roman numeral needed for [-1000,3000] is 11 digits (e.g. -827=CMDDMLXXIII), which takes about 5 minutes on my machine.

Wait a moment, that doesn't behave the way specified. Your program gives e.g. LMCLXXIII as the answer to -777. I'd read that as -50+1000-100+50+10+10+3 = 923 ≠ -777, only with "rightmost higher -valued" instead of "highest" does it give -777. But that was just what you asked for in the comments! – ceased to turn counterclockwis – 2012-01-01T02:10:05.543

@leftaroundabout: of course you are right. I'll fix it, but no time right now... – Keith Randall – 2012-01-01T04:19:22.143

@leftaroundabout: ok, all fixed. – Keith Randall – 2012-01-01T18:32:02.360

All right. It's not optimal now, though (e.g. gives VVVXI for -4 when IXVX is actually shorter, as I just noticed) – but that's perfectly legal. – ceased to turn counterclockwis – 2012-01-03T13:24:12.373

@leftaroundabout: ok, fixed again. Hopefully it is correct this time... – Keith Randall – 2012-01-03T18:58:08.903

5

Haskell, 25637 (= 268 + 25369) 26045 (= 222+25823)

r 0="VVX"
r n=s(zip[1000,500,100,50,10,5]"MDCLXV")n ξ
ξ='ξ'
s[]q f
|q<0=s[](5-q)f++"V"
|q<1=""
|r<-q-1='I':s[]r f
s ω@((v,a):l)q f
|q>=v,f/=a=a:s ω(q-v)ξ
|f==a,γ<-'I':a:s l(q-v+1)ξ,η γ<η(s l q ξ)=γ
|f==ξ,γ<-s ω(v-q)a++[a],η γ<η(s l q ξ)=γ
|True=s l q ξ
η=length


to be used as e.g.

GHCi> r 7
"VII"
GHCi> r 39
"XIL"
GHCi> r (-39)
"ICXLC"        --  "LLXILC" in my original version
GHCi> r 1983
"MXVIIM"
GHCi> r 259876
"MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMCXXIVM"


You can evaluate the length sum with the straightforward

GHCi> sum . map(length.r) \$ [-1000..3000]
25369


Which takes something in the order of a minute.

2

Ruby, 25987 (= 164 + 25823)

h=->i,d,v,k{i==0?'':i<v ?(a=h[v-i,x=d[1..-1],v/k,k^7]+d[0];i<0?a:(b=h[i,x,v/k,k^7];a.size<b.size ? a :b)):d[0]+h[i-v,d,v,k]}
r=->i{i==0?'DDM':h[i,'MDCLXVI',1000,2]}


You can call r directly to get the results. The sum over the specified range yields

> (-1000..3000).map{|i|r[i].size}.reduce &:+
25823


which is the optimal sum as with the other solutions.

0

C# 23537 (639 characters of code + 22898 characters of output)

class M
{
public static string R(int n, int? s = new int?())
{
var r = "";
var D = new Dictionary<int, string> {{ 1000, "M"}, { 900, "CM"},{ 800, "CCM"},{ 500, "D"}, { 400, "CD"},{ 300, "CCD"},{100, "C"}, {90, "XC"},{80, "XXC"},{50, "L"}, {40, "XL"}, {10, "X"}, {9, "IX"}, {8, "IIX"}, {5, "V"}, {4, "IV"},{1, "I"}};
if (n == 0) return "VVX";
if (n == -1) return "IIIIIIV";
if (n < 0) return N(n * -1);

foreach(int k in D.Keys)
{
if (s.HasValue && k > s) continue;

while(k <= n)
{
n -= k;
r += D[k];
}
}

return r;
}

public static string N(int n)
{
var D = new Dictionary<int, string> {{1, "I"}, {5, "V"}, {10, "X"}, {50, "L"}, {100, "C"}, { 500, "D"}, {1000, "M"}};

int i = D.Keys.First(v => v >= n), m = D.Keys.Where(v => v < i).Max();

return R(n + i, m) + D[i];
}
}


To Calculate:

Enumerable.Range(-1000, 3000).Sum(i => M.R(i).Length);

And what's your score? – user unknown – 2012-04-19T12:38:08.467