Japanese pitch accent numbers

6

Your challenge is to turn a Japanese word and a dictionary pitch accent number into a new string where the rises and falls in pitch are marked: e.g. (2, ウシロ) → ウ/シ\ロ.

To help you out with this, I'll explain a little about Japanese phonology.

Background: on moras

For the purpose of this challenge, we will write Japanese words in katakana, the simplest Japanese syllabary. A word consists of one or more moras, each of which consists of

  1. one of:

    アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワンッガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポ
    
  2. optionally followed by one of:

    ャュョ
    

For example, シャッキン consists of 4 moras: シャ, ッ, キ, ン.

※ The three optional characters are small versions of katakana ヤユヨ (ya yu yo).
  For example, キャ is ki + small ya, pronounced kya (1 mora), whereas キヤ is kiya (2 moras).
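The mora rule above fits in one regular expression. Here is a minimal Python sketch, assuming the input is already a valid katakana word as defined here; the names `MORA` and `split_moras` are made up for illustration.

```python
import re

# One kana, optionally followed by a small ャ, ュ or ョ.
MORA = re.compile(r".[ャュョ]?")

def split_moras(word):
    """Split a valid katakana word into a list of moras."""
    return MORA.findall(word)

print(split_moras("シャッキン"))  # ['シャ', 'ッ', 'キ', 'ン']
```

The `.` is deliberately loose: it trusts the caller to pass a valid word rather than checking each character against the list above.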

Background: pitch accents

Japanese words are described by a certain contour of high and low pitches. For example, ハナス (hanasu, speak) is pronounced

   ナ
  /  \
ハ    ス–···

meaning the pitch goes up after ハ ha, then falls after ナ na, and stays low after ス su.
(That is to say: Unaccented grammar particles after su will be low again).

We might describe this contour as “low-high-low-(low…)”, or LHL(L) for short.

In a dictionary, the pitch of this word would be marked as 2, because the pitch falls after the second mora. There are a few possible pitch patterns that occur in Tōkyō dialect Japanese, and they are all given a number:

  • 0, which represents LHHH…HH(H). This is heibangata, monotone form.

       イ–バ–ン–···                ···
      /                  or       /
    ヘ                          ナ
    
  • 1, which represents HLLL…LL(L). This is atamadakagata, “head-high” form.

    イ                          キ
      \                  or       \
       ノ–チ–···                   ···
    
  • n ≥ 2, which represents LHHH… (length n) followed by a drop to L…L(L).
    For example, センチメエトル has pitch accent number 4:

       ン–チ–メ
      /       \
    セ         エ–ト–ル–···
    

    Such a word must have ≥ n moras, of course.

※ Note the difference between ハシ [0] and ハシ [2]:

   シ–···                シ
  /            vs       /  \
ハ                    ハ     ···
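To make the three patterns concrete, here is a small Python sketch that spells out the H/L contour for a given accent number and mora count, with the pitch of a following particle in parentheses. The function name `contour` is hypothetical, and the particle pitch for pattern 1 is taken from the diagram, where the line stays low after the drop.

```python
def contour(n, mora_count):
    """Return the pitch of each mora as H/L, plus the following particle in parentheses."""
    if n == 0:
        # heibangata: low first mora, then high, and the particle stays high
        pitches = "L" + "H" * (mora_count - 1)
        particle = "H"
    elif n == 1:
        # atamadakagata: high first mora, low from then on
        pitches = "H" + "L" * (mora_count - 1)
        particle = "L"
    else:
        # low first mora, high up to mora n, low afterwards
        pitches = "L" + "H" * (n - 1) + "L" * (mora_count - n)
        particle = "L"
    return f"{pitches}({particle})"

print(contour(2, 3))  # LHL(L)   ハナス
print(contour(0, 2))  # LH(H)    ハシ [0]
print(contour(2, 2))  # LH(L)    ハシ [2]
```

The last two lines show why ハシ [0] and ハシ [2] sound alike in isolation: they differ only in the pitch of whatever follows.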

You can hear the different pitch accent numbers demonstrated here ♪.

The challenge

Given an integer n ≥ 0 (a pitch accent number as above) and a valid word of at least n moras, insert / and \ into it at points where those symbols would occur in the diagrams above; i.e. at points in the word where the pitch rises or falls.

The output for (2, ハナス) would be:

ハ/ナ\ス

And the output for (0, ヘイバン) would be:

ヘ/イバン

Remember: you must correctly handle sequences like キョ or チャ as a single mora.
The output for (1, キョウ) is キョ\ウ, not キ\ョウ.

Instead of / and \, you may pick any other pair of distinct, non-katakana Unicode symbols.
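For readers who want something ungolfed to compare against, here is one possible Python sketch of the task. It is not the reference solution linked from the post; `mark_pitch` and its `rise`/`fall` parameters are names chosen for illustration.

```python
import re

def mark_pitch(n, word, rise="/", fall="\\"):
    """Insert rise/fall markers into a katakana word for pitch accent number n."""
    pieces = re.findall(r".[ャュョ]?", word)  # split into moras
    # Insert the fall first so that the index of the rise is unaffected.
    if n >= 1:
        pieces.insert(n, fall)   # the pitch drops after the n-th mora
    if n != 1:
        pieces.insert(1, rise)   # the pitch rises after the first mora
    return "".join(pieces)

print(mark_pitch(2, "ハナス"))    # ハ/ナ\ス
print(mark_pitch(0, "ヘイバン"))  # ヘ/イバン
print(mark_pitch(1, "キョウ"))    # キョ\ウ
```

`list.insert` with an index equal to the list length simply appends, which covers words whose drop falls after the last mora.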

This is code-golf: the objective is to write the shortest solution, measured in bytes.

Test cases

One per line, in the format n word → expected_output:

0 ナ → ナ/
0 コドモ → コ/ドモ
0 ワタシ → ワ/タシ
0 ガッコウ → ガ/ッコウ
1 キ → キ\
1 キヤ → キ\ヤ
1 キャ → キャ\
1 ジショ → ジ\ショ
1 チュウゴク → チュ\ウゴク
1 ナニ → ナ\ニ
1 シュイ → シュ\イ
1 キョウ → キョ\ウ
1 キャンバス → キャ\ンバス
2 キヨカ → キ/ヨ\カ
2 キョカ → キョ/カ\
2 ココロ → コ/コ\ロ
2 ジテンシャ → ジ/テ\ンシャ
3 センセイ → セ/ンセ\イ
3 タクサン → タ/クサ\ン
4 アタラシイ → ア/タラシ\イ
4 オトウト → オ/トウト\
6 ジュウイチガツ → ジュ/ウイチガツ\

Generated by this reference solution.

Lynn

Posted 2018-10-16T19:13:08.623

Reputation: 55 648

So there's nothing like 8 ジュウイチガツ? – l4m2 – 2018-10-19T00:02:18.947

@l4m2 That's correct. ジュウイチガツ only has 6 moras, so there can't be a pitch drop "after the 8th mora of the word." – Lynn – 2018-10-19T12:26:29.743

Answers

2

Retina 0.8.2, 68 bytes (UTF-8)

\d+(.[ャュョ]?)
$1/$&$*1\
/1\b

/.
/
+`1(1*.)(.[ャュョ]?)
$2$1

Try it online! Link includes test cases. Explanation:

\d+(.[ャュョ]?)
$1/$&$*1\

Convert the number to unary, wrap it in /\, and insert it after the first mora.

/1\b

If the number was 1, then delete it and the /, leaving the \.

/.
/

Subtract 1 from the number, but in the case of 0, this deletes the \, leaving the /.

+`1(1*.)(.[ャュョ]?)
$2$1

Move the \ one mora to the right for each remaining 1.
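The four stages can be followed step by step in Python with `re.sub`. This is only a sketch of the same idea, not a translation checked against Retina itself; it assumes the number is written directly in front of the word (e.g. 2ハナス), and the function name `mark_pitch_stages` is made up.

```python
import re

MORA = r".[ャュョ]?"  # one kana plus an optional small ャ/ュ/ョ

def mark_pitch_stages(n, word):
    # Assumed input layout: the number immediately followed by the word.
    text = f"{n}{word}"
    # Stage 1: put the first mora in front, followed by "/", n ones and "\".
    text = re.sub(
        r"(\d+)(" + MORA + r")",
        lambda m: m.group(2) + "/" + "1" * int(m.group(1)) + "\\",
        text,
    )
    # Stage 2: a single 1 (n == 1) is deleted together with the "/".
    text = re.sub(r"/1\b", "", text)
    # Stage 3: drop the character after "/": a 1 for n >= 2, the "\" for n == 0.
    text = re.sub(r"/.", "/", text)
    # Stage 4: repeat until stable; each pass spends one 1 to hop "\" over a mora.
    while True:
        moved = re.sub(r"1(1*.)(" + MORA + r")", r"\2\1", text)
        if moved == text:
            return text
        text = moved

print(mark_pitch_stages(3, "センセイ"))  # セ/ンセ\イ
```

For (3, センセイ) the intermediate strings are セ/111\ンセイ, セ/11\ンセイ, セ/ン1\セイ and finally セ/ンセ\イ. The `\b` in stage 2 is a word boundary, which is what tells a lone 1 apart from a longer run of 1s.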

Neil

Posted 2018-10-16T19:13:08.623

Reputation: 95 035

1

Jelly, 32 bytes

O%62ḟ€“579‘œṗ⁸Ḋ⁹‘+Ị$,2¤œṖżCØ.ḟƊ}

A full program accepting a list of characters and a non-negative integer which prints using 0 for / and 1 for \.

Try it online! Or see the test-suite.

How?

O%62ḟ€“579‘œṗ⁸Ḋ⁹‘+Ị$,2¤œṖżCØ.ḟƊ} - Main Link: list of characters S, integer N
O                                - convert to ordinals
 %62                             - modulo by 62
      “579‘                      - code-page indices list = [53,55,57]
                                 -   (representing the three optional characters)
     €                           - for each ordinal:
    ḟ                            -   filter discard if in [53,55,57]
             ⁸                   - chain's left argument, S
           œṗ                    - partition before truthy indices
                                 -   (so before the non optional character positions)
              Ḋ                  - dequeue (remove the leading empty list)
                       œṖ        - partition at indices...
                      ¤          - nilad followed by (links) as a nilad:
               ⁹                 -   chain's right argument, N
                ‘                -   increment
                   $             -   last two links as a monad:
                 +               -     add
                  Ị              -     insignificant? (1 if 1; 0 if 2+)
                    ,2           -   pair with 2
                               } - use right argument, N, for...
                              Ɗ  - last three links as a monad:
                          C      -   complement = 1-N
                             ḟ   -   filter discard if in...
                           Ø.    -   bits = [0,1]
                                 -     (i.e. if N=0:[0], if N=1:[1], otherwise:[0,1])
                         ż       - zip together
                                 -   e.g. [[["ジュ"],0],[["ウ","イ","チ","ガ","ツ"],1]]
                                 - implicit (smashing) print
                                 -   e.g. ジュ0ウイチガツ1
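The same partition-and-zip idea, written out in Python for anyone who does not read Jelly. This is a loose sketch rather than a link-by-link translation: the cut positions are computed directly instead of via the `‘+Ị$,2` trick, and `mark_pitch_partition` is a hypothetical name. It keeps the `% 62` test for the small kana and the 0/1 output markers.

```python
from itertools import zip_longest

def mark_pitch_partition(word, n):
    # Group characters into moras: ャ, ュ, ョ have code points that are
    # 53, 55, 57 modulo 62 and attach to the preceding kana.
    moras = []
    for ch in word:
        if ord(ch) % 62 in (53, 55, 57):
            moras[-1] += ch
        else:
            moras.append(ch)
    # Cut after the first mora, and after the n-th mora when n >= 2.
    cuts = [1, n] if n >= 2 else [1]
    bounds = [0] + cuts + [len(moras)]
    parts = ["".join(moras[a:b]) for a, b in zip(bounds, bounds[1:]) if a < b]
    # Markers: discard 1-n from [0, 1], so n=0 keeps [0] and n=1 keeps [1].
    markers = [str(m) for m in (0, 1) if m != 1 - n]
    # Interleave parts and markers; a trailing part gets no marker.
    return "".join(p + m for p, m in zip_longest(parts, markers, fillvalue=""))

print(mark_pitch_partition("ジュウイチガツ", 6))  # ジュ0ウイチガツ1
```

Within the character set allowed by the challenge, no full-size kana lands on 53, 55 or 57 modulo 62, which is why the modulus check is enough.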

Jonathan Allan

Posted 2018-10-16T19:13:08.623

Reputation: 67 804

1

APL (Dyalog), 45 characters / 51 bytes* / 83 bytes

{S←⍵⊂⍨~⍵∊'ャュョ'⋄S[⍺~0],←'\'⋄S[0~⍨⍺≠1],←'/'⋄∊S}

Try it online!

The second score applies only if the three katakana characters in the code may be counted in a different encoding from the rest of the code.

Zacharý

Posted 2018-10-16T19:13:08.623

Reputation: 5 710

1

JavaScript (Node.js), 60 bytes

n=>s=>s.replace(/.[ャュョ]?/g,s=>i=--n-i?s+1:n?s:s+2,i=0)

Try it online!

tsh

Posted 2018-10-16T19:13:08.623

Reputation: 13 072

Whoa, that's a lot shorter than I expected! Nice job. – Lynn – 2018-10-17T09:36:47.173

Also, you can use other characters than \/ so you might be able to save bytes that way. – Lynn – 2018-10-17T11:02:18.890

1

Perl 6, 63 60 bytes

-3 bytes thanks to Lynn's suggestion to use other separators than / \

->\n{&{S:g[.<[ャュョ]>?]=$/~7 x!$++*?+^-n~1 x(++$==n>0)}}

Try it online!

Curried function. Uses 7 1 instead of / \.

Explanation

->\n{     # Block taking argument n
  &{      # Return a block
    S:g   # Replace globally
    [.<[ャュョ]>?]  # regex for mora
    =$/            # with match
     ~7            # followed by 7
      x!$++*?+^-n  # if i==1 and n!=1
                   # should be read as (! $++) * (? +^ - n)
                   # +^ is bitwise negation like ~ in C
                   # ? converts to Bool like !! in C
                   # so ?+^-n is like !!~-n or !!(n-1) or n!=1
     ~1            # followed by 1
      x(++$==n>0)  # if i==n and n>0
  }
}
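The per-mora conditions in the explanation above read naturally as a substitution callback with a running counter. Here is a Python sketch of that logic, assuming 1-based mora numbering; `mark_pitch_counter` is a made-up name, and 7 and 1 stand in for / and \ as in the answer.

```python
import re
from itertools import count

def mark_pitch_counter(n, word, rise="7", fall="1"):
    index = count(1)  # running mora number, starting at 1

    def replace(match):
        i = next(index)
        return (
            match.group(0)
            + rise * (i == 1 and n != 1)   # rise after the first mora unless n == 1
            + fall * (i == n and n > 0)    # fall after the n-th mora when n > 0
        )

    return re.sub(r".[ャュョ]?", replace, word)

print(mark_pitch_counter(2, "ハナス"))  # ハ7ナ1ス
```

Multiplying a string by a boolean yields either the string or the empty string, which plays the same role as the `x` repetition operator in the Perl 6 code.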

nwellnhof

Posted 2018-10-16T19:13:08.623

Reputation: 10 037

Cool solution! How should I be reading that !$++*?+^-n exactly? – Lynn – 2018-10-17T10:58:54.507

Also, you can use other characters than \/ so you might be able to save bytes that way. – Lynn – 2018-10-17T11:01:07.067