Convert Complex Chinese Numbers into Arabic Numbers

Your task is to convert Chinese numerals into Arabic numerals.

This problem is similar to "Convert Chinese numbers", but more complex. Also, most of the answers given there don't satisfy all the conditions here.

Chinese digits/numbers are as follows:
Value   Character
0       零
1       一
2       二
2       两
3       三
4       四
5       五
6       六
7       七
8       八
9       九
10      十
100     百
1000    千
10000   万
10^8    亿

Multiple-digit numbers

Multiple-digit numbers are created by adding from highest to lowest and by multiplying from lowest to highest. In additions, each unit higher than 9 can be multiplied by 1 without changing its meaning: both 亿万千百十一 and 一亿一万一千一百一十一 are equal to 100011111.
We multiply in the following fashion: 五千 = 5000, 一百万 = 1000000, 三千万 = 30000000.

Chinese always takes the lowest possible multiplier (just like we don't say "hundred hundred" but "ten thousand"). So 百千 doesn't exist to represent 100000, since we have 十万; 十千 doesn't exist, since we have 万; 十千万 doesn't exist, since we have 亿; and 十百 doesn't exist, since we have 千.
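
The additive and multiplicative reading described in this section can be sketched in Python as follows. This is only an illustration, not a golfed answer: the names `compose`, `DIGITS`, `SMALL_UNITS` and `BIG_UNITS` are made up for this sketch, and it deliberately ignores 零 and the trailing-digit shorthand covered under "Special cases". It also relies on N < 10^9, so that at most one digit precedes 亿.

```python
DIGITS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}


def compose(text: str) -> int:
    """Evaluate a numeral that contains no 零 and no trailing shorthand."""
    total = 0    # value already closed off by 万 or 亿
    section = 0  # running sum below the next 万 / 亿
    digit = 0    # digit waiting for its unit (0 means "none yet")
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A unit without a digit in front counts as 1 * unit.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            # 万 and 亿 multiply everything accumulated in the section.
            section = (section + digit) or 1
            total += section * BIG_UNITS[ch]
            section = digit = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + section + digit


print(compose("亿万千百十一"), compose("一亿一万一千一百一十一"))
print(compose("五千"), compose("一百万"), compose("三千万"))
```

Keeping a separate `section` accumulator is what lets 一百万 be read as (1 × 100) × 10000 rather than 1 × 100 + 10000.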

Special cases

0 is very important, and it was actually the biggest problem in the other code golf question. Trailing zeroes are omitted in Chinese, so 零 indicates interior zeroes.

Let's look at some examples:

  • 三百零五 = 305
  • 三百五 = 350 - no interior zeroes. Notice that we don't need 零 here, since a trailing zero is omitted.
  • 一千万零一百 = 10000100
  • 三千零四万 = 30040000
  • 六亿零四百零二 = 600000402 - here we have 2 interior zeroes. As you can see, even if there's a gap of more than one order of magnitude (in the example, between 亿 and 百), two 零s can't stand next to each other; one is enough for each gap, no matter how big it is.
  • 一亿零一 = 100000001 - again, no need for more than one 零 if there's one gap, no matter how big.
  • 八千万九千 = 80009000 - no need for 零 since there are no interior zeroes. Why are there no interior zeroes? Because it follows the highest-to-lowest addition without omitting an order of magnitude. Right after 千万 we have 千 (万 is a multiplication component, not an addition one) and not, let's say, 百.
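
The examples above suggest one way to handle 零 and omitted trailing zeroes; a minimal, ungolfed Python sketch follows. The function name `chinese_to_int` and the state variables are made up for this illustration. It assumes valid input below 10^9 and treats a final digit with no unit as one order below the previous unit, unless a 零 came in between.

```python
DIGITS = {ch: i for i, ch in enumerate("零一二三四五六七八九")}
DIGITS["两"] = 2
UNITS = {"十": 10, "百": 100, "千": 1000, "万": 10**4, "亿": 10**8}


def chinese_to_int(text: str) -> int:
    total = 0           # value already closed off by 万 or 亿
    section = 0         # running sum below the next 万 / 亿
    pending = None      # digit still waiting for a unit
    last_unit = 10      # starting at 10 makes a bare digit count as ones
    after_zero = False  # True if 零 appeared since the last unit
    for ch in text:
        if ch == "零":
            after_zero = True
        elif ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in UNITS:
            unit = UNITS[ch]
            if unit < 10**4:
                # 十, 百, 千: a missing digit means 1 (十二 = 12).
                section += (pending or 1) * unit
            else:
                # 万, 亿: multiply the whole section collected so far.
                section = (section + (pending or 0)) or 1
                total += section * unit
                section = 0
            pending, last_unit, after_zero = None, unit, False
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    if pending is not None:
        # 三百五 = 350: the final digit sits one order below the last unit,
        # but 三百零五 = 305: after 零 the digit is a plain ones digit.
        section += pending * (1 if after_zero else last_unit // 10)
    return total + section


for s in ["三百零五", "三百五", "一千万零一百", "三千零四万",
          "六亿零四百零二", "一亿零一", "八千万九千"]:
    print(s, chinese_to_int(s))
```

In this sketch 零 never contributes a value of its own; it only switches off the trailing shorthand, which is why a single 零 is enough for a gap of any size.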

More examples: Check out the two "示例" paragraphs

2 is also special in Chinese, as it can be represented with the character 两 if it's a multiplier of 100 and higher numerals. Both 两千两百二十二 and 二千二百二十二 are 2222.

Rules

Constraints: 0 <= N < 10^9
Edit: I don't care what happens from 10^9 onwards. The input doesn't have any examples equal to or higher than 10^9 for that reason.

Test cases

Input:

一亿两千三百零二万四千二百零三
四千万零一十
三十四万五千五
四亿二十万零九百零一
两千万九千零一十
二十四万零二百二十二
两千零十二
零

Output:

123024203
40000010
345500
400200901
20009010
240222
2012
0

Good luck!

Rysicin

Posted 2019-11-17T15:53:58.003

Reputation: 93

Chinese speaker here - I don't think I ever use 零 for interior zeroes unless the place value isn't specified - ex. 350 (三百五) vs 305 (三百零五). Otherwise, the position is clearly indicated by the base characters... (ex. 2012 would be 两千十二, which literally reads "two thousand ten two", no need for interior zeroes) – Quintec – 2019-11-17T16:39:14.767

Thank you for your response! You are right that if a zero is before anything other than a unit digit, it's optional (but not prohibited). Let's take the 2012 case as an example. It can be both. If you look at chapter numbers for internet novels, they indeed always add 零. – Rysicin – 2019-11-17T16:50:45.330

So just to clarify - is it ok to output with or without? You should specify this in the question – Quintec – 2019-11-17T16:52:42.477

The output is Arabic numerals; in the input there are many cases with 零. As long as the input matches the output, it's correct. Your code should therefore be able to correctly interpret an example with 零 inside, while also outputting 0 for 零 as input. – Rysicin – 2019-11-17T17:01:01.620

Is there any special logic by which one can say 一亿两千三百零二万四千二百零三 is 1*100000000+(2*1000+3*100+0*10+2)*10000+4*1000+2*100+0*10+3 and not (1*100000000+2*1000+3*100+0*10+2)*10000+4*1000+2*100+0*10+3? – Alexey Burdin – 2019-11-17T18:55:54.677

I'm not sure if I understand the question, as your second example returns a number higher than a trillion. The first one is correct. Maybe you can think of it as building blocks? (1)[10^8] (2302)[10000] (4)[1000] (2)[100] (0)[10] (3)[1]

Edit: Of course the (2302) block creator is also made of a smaller block - (2)[1000] (3)[100] (0)[10] (2)[1] – Rysicin – 2019-11-17T19:16:46.533
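
The building-block view in this comment can be checked numerically. The Python sketch below is only an illustration: the helper name `blocks` is made up here. It splits a number into (coefficient)[unit] pairs and then evaluates the two bracketings asked about in the thread.

```python
UNITS = (10**8, 10**4, 1000, 100, 10, 1)


def blocks(n: int) -> list[tuple[int, int]]:
    """Split n into (coefficient, unit) pairs, highest unit first."""
    result = []
    for unit in UNITS:
        coeff, n = divmod(n, unit)
        result.append((coeff, unit))
    return result


# Top-level blocks, then the blocks inside the 万 coefficient.
print(blocks(123024203))
print(blocks(2302)[2:])

# The two readings from the question above.
first = 1*10**8 + (2*1000 + 3*100 + 0*10 + 2)*10**4 + 4*1000 + 2*100 + 0*10 + 3
second = (1*10**8 + 2*1000 + 3*100 + 0*10 + 2)*10**4 + 4*1000 + 2*100 + 0*10 + 3
print(first, second)
```

Only the first reading stays below 10^9; the second multiplies the 亿 part by 万 as well and lands above 10^12.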

OK, I understand the question now. Yes, there is: "Multiple-digit numbers are created by adding from highest to lowest and by multiplying from lowest to highest." and "Chinese always takes the lowest possible multiplier." Just like in English you don't have "thousand hundred" but "hundred thousand". – Rysicin – 2019-11-17T19:27:07.003

As you can see in my blocks example, the value in the () bracket has to be smaller than the value in the [] bracket, unless it's [1]. Also, for [100] and [1000] the () bracket has to be lower than 10. For [10000] the () bracket has to be lower than 10000. For [10^8] we have a constraint, so () has to be lower than 10. Hope that helps. – Rysicin – 2019-11-17T22:25:16.480

"Remember that the output for 零 itself should be 0." Why not make it a test case, then? – Grimmy – 2019-11-18T14:02:33.047

Fair point. Added. – Rysicin – 2019-11-18T22:31:28.403

Answers

5

05AB1E, 52 51 bytes

0•9∊»£¬ƵçoiKβÐg•19вIÇƵª%èvX¨UOy¬iθ°UX‰¤_ªX*]DgiX*}O

0                              # literal 0
 •9∊»£¬ƵçoiKβÐg•19в            # compressed list [2, 14, 0, 3, 0, 8, 0, 6, 2, 9, 0, 18, 0, 1, 0, 13, 5, 0, 10, 0, 0, 7, 12, 4]
                   IÇ          # codepoints of the input
                     Ƶª%       # modulo 270
                        è      # index into the above list, with wraparound
v                 ]            # for each number y in that list:
 X                             #  push the variable X
  ¨                            #  drop the last digit
   U                           #  store that in X
    O                          #  sum all numbers on the stack
     y                         #  push y
      ¬i          ]            #  if the first digit of y is 1:
        θ°                     #   10**(the last digit of y)
          U                    #   store that in X
           X‰                  #   divmod X
             ¤_ª               #   if the modulo is 0, append 1
                X*             #   multiply by X
Dgi  }                         # if length of the top of stack is 1:
   X*                          #  multiply it by X
      O                        # sum of the stack
                               # implicit output
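
The lookup step at the start of this explanation (codepoint, modulo 270, index with wraparound) can be reproduced in Python. This sketch covers only that step, not the rest of the 05AB1E program; the names `TABLE` and `decode` are made up here, and the way a code is split into a digit or a power of ten follows the "first digit of y is 1" rule from the explanation.

```python
# The compressed list from the explanation above.
TABLE = [2, 14, 0, 3, 0, 8, 0, 6, 2, 9, 0, 18,
         0, 1, 0, 13, 5, 0, 10, 0, 0, 7, 12, 4]


def decode(ch: str) -> tuple[str, int]:
    """Map one Chinese numeral character to ('digit', d) or ('unit', 10**k)."""
    code = TABLE[ord(ch) % 270 % len(TABLE)]  # wraparound indexing
    if str(code)[0] == "1":
        # Codes starting with 1 are units: the last digit is the exponent.
        return ("unit", 10 ** (code % 10))
    return ("digit", code)


for ch in "零一二两三四五六七八九十百千万亿":
    print(ch, decode(ch))
```

Note that under this encoding 一 maps to the code 10, so it is treated as the unit 10^0 rather than as a digit.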

Grimmy

Posted 2019-11-17T15:53:58.003

Reputation: 12 521

The output is wrong for 一亿两千三百零二万四千二百零三 and will be wrong for anything that ends with 零2-9. – Rysicin – 2019-11-18T22:41:48.960

Also, it doesn't meet the upper constraint. I put it in place for 3 reasons: the numbers are already high enough as it is; it just gets repetitive; and we encounter a problem similar to the long and short scales for Arabic numbers (e.g. billion). The next numeral after 亿 is 兆. However, it can mean 10^6, 10^12 or 10^16 depending on context. I think due to this confusion you would more often see 万亿 instead of 兆 in the context of e.g. trillion yuan. There's also an aberration of 亿亿 after this point... So let's just stop at 10^9. – Rysicin – 2019-11-18T23:28:54.973

@Rysicin: I fixed the 零2-9 case. I don’t understand your second comment, could you give an example of an input I handle incorrectly, and what the correct output would be? – Grimmy – 2019-11-19T00:01:48.840

The second comment is about the constraints. If the number is 10^9, then there shouldn't be any output, as it goes over the limit. (If you wish, I can raise the constraint; however, we'd need to stop somewhere.) Your code edit created a bug: numbers ending with 0 now end with 1 instead. Check for example 一百 and 一千. – Rysicin – 2019-11-19T00:14:37.037

@Rysicin Things to avoid when writing challenges: Input validation. By default, it is assumed that all inputs are valid. If you really want to make input validation part of the question, you should explicitly state so in the question and add several test cases covering it. – Grimmy – 2019-11-19T00:49:23.550

@Rysicin I fixed the issue with numbers ending with 0. – Grimmy – 2019-11-19T00:54:03.963

Good job, looks like everything is correct! Thanks for the feedback, I'll remove constraints and just write that I don't care about what happens starting 10^9. – Rysicin – 2019-11-19T01:00:24.993