Write a program or function which receives as input a string representing a Welsh word (UTF-8 unless otherwise specified by you) and outputs the number of letters in that word, counted by the rules below.
The following are all single letters in Welsh:
a, b, c, ch, d, dd, e, f, ff, g, ng, h, i, j, l, ll, m, n, o, p, ph, r, rh, s, t, th, u, w, y
To quote Wikipedia,
While the digraphs ch, dd, ff, ng, ll, ph, rh, th are each written with two symbols, they are all considered to be single letters. This means, for example that Llanelli (a town in South Wales) is considered to have only six letters in Welsh, compared to eight letters in English.
These letters also exist in Welsh, though they are restricted to technical vocabulary borrowed from other languages:
k, q, v, x, z
Letters with diacritics are not regarded as separate letters, but your function must accept them and be able to count them. Possible such letters are:
â, ê, î, ô, û, ŷ, ŵ, á, é, í, ó, ú, ý, ẃ, ä, ë, ï, ö, ü, ÿ, ẅ, à, è, ì, ò, ù, ẁ
(This means that ASCII is not an acceptable input encoding, as it cannot encode these characters.)
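A minimal Python sketch of why the encoding matters when counting. It assumes the input may arrive either with precomposed accented letters or with separate combining marks; the challenge does not say which, so the NFC normalization step is an assumption, and `code_points` is a hypothetical helper name.

```python
import unicodedata

def code_points(word: str) -> int:
    """Count code points after composing base letters with their diacritics (NFC)."""
    return len(unicodedata.normalize("NFC", word))

word = "Glynd\u0175r"                            # "Glyndŵr" with precomposed ŵ (U+0175)
decomposed = unicodedata.normalize("NFD", word)  # w followed by U+0302 combining circumflex

print(len(word.encode("utf-8")))  # UTF-8 byte length, larger than the letter count
print(len(word))                  # code points in the precomposed form
print(len(decomposed))            # code points in the decomposed form
print(code_points(decomposed))    # code points after recomposition
```

Counting bytes or decomposed code points would overcount an accented letter, so composing first keeps each accented letter at one unit.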
Notes:
- This is code golf.
- You do not have to account for words like llongyfarch, in which the ng is not a digraph, but two separate letters. This word has nine letters, but you can miscount it as eight. (If you can account for such words, that's kind of awesome, but outside the scope of this challenge.)
- The input is guaranteed to have no whitespace (unless you prefer it with a single trailing newline (or something more esoteric), in which case that can be provided). There will certainly be no internal whitespace.
Test cases:
- Llandudno, 8
- Llanelli, 6
- Rhyl, 3
- Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch, 50 (really 51, but we'll count 50)
- Tŷr, 3
- Cymru, 5
- Glyndŵr, 7
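An ungolfed reference sketch in Python, not a competitive answer. It assumes a greedy left-to-right scan in which any of the eight digraphs counts as one letter and every other character counts as one letter. The name `welsh_length` and the NFC normalization step are illustrative choices, not part of the challenge.

```python
import re
import unicodedata

# Digraphs from the challenge's letter list; each counts as one letter.
DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")

# At each position try a digraph first, then fall back to any single character.
LETTER = re.compile("|".join(DIGRAPHS) + "|.", re.IGNORECASE | re.DOTALL)

def welsh_length(word: str) -> int:
    """Count letters, treating each digraph as a single letter."""
    # Drop an optional trailing newline and compose any combining diacritics.
    word = unicodedata.normalize("NFC", word.strip())
    return len(LETTER.findall(word))

if __name__ == "__main__":
    for w in ("Llandudno", "Llanelli", "Rhyl", "Tŷr", "Cymru", "Glyndŵr"):
        print(w, welsh_length(w))
```

Because the scan is purely textual, it treats every ng as a digraph, so llongyfarch comes out as eight letters, which the notes above explicitly allow.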
Can the input be given in all lowercase? – ETHproductions – 2016-09-12T17:11:11.467
My wife who is a native Welsh speaker would recommend that the J is added into the "Borrowed" letters section as it isn't actually part of the Welsh alphabet – Rich Starkie – 2016-09-12T20:29:06.707
@RichStarkie The Wikipedia article was a little vague on that front. My understanding is that j is used in borrowed words even when it's not present in the original word, so it's used phonologically, which implies that at this stage it's naturalized into the language. I've seen similar arguments about v in Irish. It's widely considered not to be part of the Irish alphabet, but it exists in some Irish names, such as Ó Cuiv. – TRiG – 2016-09-12T21:06:58.603
What is it with "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch" that makes it 51? – Erik the Outgolfer – 2016-09-13T13:02:22.723
@EriktheGolfer I think it's an ng which crosses a morpheme boundary, making it two separate letters, not a digraph. – TRiG – 2016-09-13T13:09:01.590
@TRiG yngy is four letters? – Erik the Outgolfer – 2016-09-13T13:10:27.887
@EriktheGolfer. Probably, but for the purpose of this question we'll call it three. – TRiG – 2016-09-13T13:25:16.457
I seem to recall from my Welsh lessons that nh and ngh are single letters, too. As in "fy nhadau" and "yng Nghymru". – megaflop – 2016-09-13T15:06:41.787
@daiscog. I'm just relying on Wikipedia's article on Welsh orthography. That said, https://en.wikipedia.org/wiki/Nh_(digraph)#Welsh does list nh as a Welsh digraph, even if the other article doesn't. Interesting. Too late to change the question at this stage, though. – TRiG – 2016-09-13T15:10:43.027
And a footnote in the Welsh orthography article lists mh, nh, and ngh as graphemes. Methinks I need to open a question on Linguistics SE. – TRiG – 2016-09-13T15:13:19.830
Shame it's too late; that triple-glyphed "ngh" might have made it a little more complicated. – megaflop – 2016-09-13T15:15:59.127
@Rich Starkie, why does your wife not leave her own comment? Also you don't happen to be called Ringo do you? – Octopus – 2016-09-13T18:09:01.620