Japanese pitch accent numbers


Your challenge is to turn a Japanese word and a dictionary pitch accent number into a new string where the rises and falls in pitch are marked: e.g. (2, ウシロ) → ウ/シ\ロ.

To help you out with this, I'll explain a little about Japanese phonology.

Background: on moras

For the purpose of this challenge, we will write Japanese words in katakana, the simplest Japanese syllabary. A word consists of one or more moras, each of which consists of

  1. one of:

  2. optionally followed by one of: ャ ュ ョ


For example, シャッキン consists of 4 moras: シャ, ッ, キ, ン.

※ The three optional characters are small versions of katakana ヤユヨ (ya yu yo).
  For example, キャ is ki + small ya, pronounced kya (1 mora), whereas キヤ is kiya (2 moras).
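
The mora rule above can be sketched in Python as follows. This is only an illustration, not part of the challenge: the helper name `split_moras` is made up here, and the regex assumes the input is already a valid katakana word, so any character may start a mora and only ャ, ュ, ョ can extend one.

```python
import re

# Assumption: valid input, so a mora is any one character,
# optionally followed by a small ャ, ュ or ョ.
MORA = re.compile(r".[ャュョ]?")

def split_moras(word: str) -> list[str]:
    """Split a katakana word into moras (hypothetical helper)."""
    return MORA.findall(word)

print(split_moras("シャッキン"))  # ['シャ', 'ッ', 'キ', 'ン']
print(split_moras("キヤ"))        # ['キ', 'ヤ']
```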

Background: pitch accents

Japanese words are described by a certain contour of high and low pitches. For example, ハナス (hanasu, speak) is pronounced

   ナ
  /  \
ハ    ス–···

meaning the pitch goes up after ハ ha, then falls after ナ na, and stays low after ス su.
(That is to say: unaccented grammar particles following su will be low as well.)

We might describe this contour as “low-high-low-(low…)”, or LHL(L) for short.

In a dictionary, the pitch of this word would be marked as 2, because the pitch falls after the second mora. There are a few possible pitch patterns that occur in Tōkyō dialect Japanese, and they are all given a number:

  • 0, which represents LHHH…HH(H). This is heibangata, monotone form.

       イ–バ–ン–···                ···
      /                  or       /
    ヘ                          ナ
  • 1, which represents HLLL…LL(L). This is atamadakagata, “head-high” form.

    イ                          キ
      \                  or       \
       ノ–チ–···                   ···
  • n ≥ 2, which represents LHHH… (length n) followed by a drop to L…L(L).
    For example, センチメエトル has pitch accent number 4:

       ン–チ–メ
      /       \
    セ         エ–ト–ル–···

    Such a word must have ≥ n moras, of course.
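
As a rough illustration of the three patterns, the following Python sketch builds the H/L contour from a pitch accent number and a mora count. The function name `contour` is hypothetical, and the pitch of a following particle is given in parentheses; for pattern 1 it is taken to be low, as the diagram for that pattern shows.

```python
def contour(n: int, mora_count: int) -> str:
    """Return the H/L contour of a word, with the following particle in parentheses."""
    if mora_count < 1 or not 0 <= n <= mora_count:
        raise ValueError("need at least one mora and 0 <= n <= mora_count")
    if n == 0:
        # heibangata: low first mora, then high through the particle
        return "L" + "H" * (mora_count - 1) + "(H)"
    if n == 1:
        # atamadakagata: high first mora, then low
        return "H" + "L" * (mora_count - 1) + "(L)"
    # n >= 2: low, high up to mora n, then low
    return "L" + "H" * (n - 1) + "L" * (mora_count - n) + "(L)"

print(contour(2, 3))  # LHL(L), as for ハナス
print(contour(4, 7))  # LHHHLLL(L), as for センチメエトル
```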

※ Note the difference between ハシ [0] and ハシ [2]:

   シ–···                シ
  /            vs       /  \
ハ                    ハ     ···

You can hear the different pitch accent numbers demonstrated here ♪.

The challenge

Given an integer n ≥ 0 (a pitch accent number as above) and a valid word of at least n moras, insert / and \ into it at points where those symbols would occur in the diagrams above; i.e. at points in the word where the pitch rises or falls.

The output for (2, ハナス) would be:

    ハ/ナ\ス

And the output for (0, ヘイバン) would be:

    ヘ/イバン

Remember: you must correctly handle sequences like キョ or チャ as a single mora.
The output for (1, キョウ) is キョ\ウ, not キ\ョウ.

Instead of / and \, you may pick any other pair of distinct, non-katakana Unicode symbols.
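
For reference, here is an ungolfed Python sketch of the task. It is not the reference solution mentioned under the test cases; the name `mark_pitch` and its `rise`/`fall` parameters are made up for this example. A rise is placed after the first mora unless n is 1, and a fall is placed after mora n when n is at least 1.

```python
import re

def mark_pitch(n: int, word: str, rise: str = "/", fall: str = "\\") -> str:
    """Insert rise/fall markers into a katakana word (hypothetical helper)."""
    moras = re.findall(r".[ャュョ]?", word)  # assumes a valid word
    out = []
    for i, mora in enumerate(moras, start=1):
        out.append(mora)
        if i == 1 and n != 1:
            out.append(rise)   # pitch rises after the first mora
        if i == n:
            out.append(fall)   # pitch falls after mora n (never when n == 0)
    return "".join(out)

print(mark_pitch(2, "ハナス"))    # ハ/ナ\ス
print(mark_pitch(0, "ヘイバン"))  # ヘ/イバン
print(mark_pitch(1, "キョウ"))    # キョ\ウ
```

Passing other values for `rise` and `fall` gives the alternative symbol pairs the rule above allows.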

This is code-golf: the objective is to write the shortest solution, measured in bytes.

Test cases

One per line, in the format n word → expected_output:

0 ナ → ナ/
0 コドモ → コ/ドモ
0 ワタシ → ワ/タシ
0 ガッコウ → ガ/ッコウ
1 キ → キ\
1 キヤ → キ\ヤ
1 キャ → キャ\
1 ジショ → ジ\ショ
1 チュウゴク → チュ\ウゴク
1 ナニ → ナ\ニ
1 シュイ → シュ\イ
1 キョウ → キョ\ウ
1 キャンバス → キャ\ンバス
2 キヨカ → キ/ヨ\カ
2 キョカ → キョ/カ\
2 ココロ → コ/コ\ロ
2 ジテンシャ → ジ/テ\ンシャ
3 センセイ → セ/ンセ\イ
3 タクサン → タ/クサ\ン
4 アタラシイ → ア/タラシ\イ
4 オトウト → オ/トウト\
6 ジュウイチガツ → ジュ/ウイチガツ\

Generated by this reference solution.


So there's nothing like 8 ジュウイチガツ? – l4m2 – 2018-10-19T00:02:18.947

@l4m2 That's correct. ジュウイチガツ only has 6 moras, so there can't be a pitch drop "after the 8th mora of the word." – Lynn – 2018-10-19T12:26:29.743



Retina 0.8.2, 68 bytes (UTF-8)



Try it online! Link includes test cases. Explanation:


Convert the number to unary, wrap it in /\, and insert it after the first mora.


If the number was 1, then delete it and the /, leaving the \.


Subtract 1 from the number, but in the case of 0, this deletes the \, leaving the /.


Move the \ one mora to the right for each remaining 1.
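
The four steps above can be mimicked in Python as a series of string rewrites. This is one possible reading of the explanation, not a transcription of the Retina program: the regexes and the name `mark_by_rewriting` are assumptions made for this sketch, and the number is passed as a separate argument rather than parsed from the input.

```python
import re

def mark_by_rewriting(n: int, word: str) -> str:
    # Step 1: unary number wrapped in / and \, inserted after the first mora.
    s = re.sub(r"^.[ャュョ]?",
               lambda m: m.group(0) + "/" + "1" * n + "\\",
               word, count=1)
    # Step 2: "/1\" only appears when n == 1; keep just the "\".
    s = s.replace("/1\\", "\\", 1)
    # Step 3: drop the character after "/": a 1 when n >= 2, the "\" when n == 0.
    s = re.sub(r"/.", "/", s, count=1)
    # Step 4: for each remaining 1, remove it and move "\" one mora to the right.
    move = re.compile(r"1([^\\]*)\\(.[ャュョ]?)")
    while True:
        s, changed = move.subn(lambda m: m.group(1) + m.group(2) + "\\", s, count=1)
        if not changed:
            break
    return s

print(mark_by_rewriting(2, "ハナス"))    # ハ/ナ\ス
print(mark_by_rewriting(4, "オトウト"))  # オ/トウト\
```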


Jelly, 32 bytes


A full program accepting a list of characters and a non-negative integer, which prints the result using 0 for / and 1 for \.

Try it online! Or see the test-suite.


O%62ḟ€“579‘œṗ⁸Ḋ⁹‘+Ị$,2¤œṖżCØ.ḟƊ} - Main Link: list of characters S, integer N
O                                - convert to ordinals
 %62                             - modulo by 62
      “579‘                      - code-page indices list = [53,55,57]
                                 -   (representing the three optional characters)
     €                           - for each ordinal:
    ḟ                            -   filter discard if in [53,55,57]
             ⁸                   - chain's left argument, S
           œṗ                    - partition before truthy indices
                                 -   (so before the non optional character positions)
              Ḋ                  - dequeue (remove the leading empty list)
                       œṖ        - partition at indices...
                      ¤          - nilad followed by (links) as a nilad:
               ⁹                 -   chain's right argument, N
                ‘                -   increment
                   $             -   last two links as a monad:
                 +               -     add
                  Ị              -     insignificant? (1 if 1; 0 if 2+)
                    ,2           -   pair with 2
                               } - use right argument, N, for...
                              Ɗ  - last three links as a monad:
                          C      -   complement = 1-N
                             ḟ   -   filter discard if in...
                           Ø.    -   bits = [0,1]
                                 -     (i.e. if N=0:[0], if N=1:[1], otherwise:[0,1])
                         ż       - zip together
                                 -   e.g. [[["ジュ"],0],[["ウ","イ","チ","ガ","ツ"],1]]
                                 - implicit (smashing) print
                                 -   e.g. ジュ0ウイチガツ1
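
The same partition-and-zip idea can be written out in Python. This is a loose analogue rather than a translation of the Jelly code: the name `mark_by_partition` is hypothetical, and a regex is used to find the moras instead of the ordinal-modulo trick.

```python
import re

def mark_by_partition(n: int, word: str) -> str:
    """Split the mora list at the marker positions, then interleave 0 and 1."""
    moras = re.findall(r".[ャュョ]?", word)  # assumes a valid word
    # Cut after the first mora, and also after mora n when n >= 2.
    cuts = [1, n] if n >= 2 else [1]
    # Separators: 0 (rise) only for n == 0, 1 (fall) only for n == 1, else both.
    seps = {0: ["0"], 1: ["1"]}.get(n, ["0", "1"])
    parts = [moras[a:b] for a, b in zip([0] + cuts, cuts + [len(moras)])]
    # There is always one more part than separators; pad with an empty string.
    return "".join("".join(part) + sep for part, sep in zip(parts, seps + [""]))

print(mark_by_partition(6, "ジュウイチガツ"))  # ジュ0ウイチガツ1
```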

APL (Dyalog), 45 characters / 51 bytes* / 83 bytes


Try it online!

The second score only applies if the three katakana characters in the code may be counted in a different encoding from the rest of the code.


JavaScript (Node.js), 60 bytes


Try it online!


Whoa, that's a lot shorter than I expected! Nice job. – Lynn – 2018-10-17T09:36:47.173

Also, you can use other characters than \/ so you might be able to save bytes that way. – Lynn – 2018-10-17T11:02:18.890


Perl 6, 63 60 bytes

-3 bytes thanks to Lynn's suggestion to use other separators than / \

->\n{&{S:g[.<[ャュョ]>?]=$/~7 x!$++*?+^-n~1 x(++$==n>0)}}

Try it online!

Curried function. Uses 7 1 instead of / \.


->\n{     # Block taking argument n
  &{      # Return a block
    S:g   # Replace globally
    [.<[ャュョ]>?]  # regex for mora
    =$/            # with match
     ~7            # followed by 7
      x!$++*?+^-n  # if i==1 and n!=1
                   # should be read as (! $++) * (? +^ - n)
                   # +^ is bitwise negation like ~ in C
                   # ? converts to Bool like !! in C
                   # so ?+^-n is like !!~-n or !!(n-1) or n!=1
     ~1            # followed by 1
      x(++$==n>0)  # if i==n and n>0
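
The bit trick in the comments can be checked in Python, where `~` is bitwise negation just as `+^` is in Perl 6. This is only a small sanity check, and the function name is made up for the illustration.

```python
def rise_after_first_mora(n: int) -> bool:
    """Python analogue of Perl 6's ?+^-n: truthiness of the bitwise negation of -n."""
    return bool(~-n)  # parsed as ~(-n)

for n in range(6):
    # In two's complement, ~(-n) equals n - 1, which is zero only when n == 1.
    assert ~-n == n - 1
    assert rise_after_first_mora(n) == (n != 1)
```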


Cool solution! How should I be reading that !$++*?+^-n exactly? – Lynn – 2018-10-17T10:58:54.507

Also, you can use other characters than \/ so you might be able to save bytes that way. – Lynn – 2018-10-17T11:01:07.067