Task
Given a UTF-8 string (by any means), return (by any means) an equivalent list in which every element is the number of bytes used to encode the corresponding input character.
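A minimal, ungolfed sketch of the task in Python, assuming "character" means a Unicode code point (which is how the combining-overlay example is counted). The function name `utf8_byte_counts` is made up for illustration and is not part of the challenge.

```python
def utf8_byte_counts(text: str) -> list[int]:
    """Return the UTF-8 encoded length of each code point in text."""
    # Encoding each code point on its own yields its individual byte count.
    return [len(ch.encode("utf-8")) for ch in text]


print(utf8_byte_counts("Ciao"))    # [1, 1, 1, 1]
print(utf8_byte_counts("チャオ"))   # [3, 3, 3]
print(utf8_byte_counts(""))        # []
```

Empty input gives an empty list, which matches the rule that empty input must produce empty output.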
Examples
!
→ 1
Ciao
→ 1 1 1 1
tʃaʊ
→ 1 2 1 2
Adám
→ 1 1 2 1
ĉaŭ
→ 2 1 2
(single characters)
ĉaŭ
→ 1 2 1 1 2
(uses combining overlays)
チャオ
→ 3 3 3
(empty input) →
(empty output)
!±≡𩸽
→ 1 2 3 4
(a null byte) → 1
Null bytes
If the only way to keep reading input beyond null bytes is by knowing the total byte count, you may get the byte count by any means (even user input).
If your language cannot handle null bytes at all, you may assume the input does not contain nulls.
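To illustrate why a known byte count is enough, here is a rough Python sketch that works on the raw bytes instead of decoded characters. It assumes well-formed UTF-8 input, and the name `counts_from_bytes` is hypothetical. The loop runs over exactly `len(data)` bytes, so a null byte is treated like any other one-byte character rather than as a terminator.

```python
def counts_from_bytes(data: bytes) -> list[int]:
    """Count bytes per code point, assuming data is well-formed UTF-8."""
    counts = []
    for byte in data:
        if (byte & 0b1100_0000) == 0b1000_0000:
            # Continuation byte (10xxxxxx): extends the current code point.
            counts[-1] += 1
        else:
            # Lead byte, including NUL (0x00): starts a new code point.
            counts.append(1)
    return counts


print(counts_from_bytes(b"a\x00b"))               # [1, 1, 1]
print(counts_from_bytes("!±≡".encode("utf-8")))   # [1, 2, 3]
```

In a full program the bytes could come from `sys.stdin.buffer.read()`, which returns a `bytes` object that already carries its own length.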
If the input is empty, can we output 0 or another falsey value? – Alex A. – 2016-06-23T16:30:42.657
@AlexA. No, that would prevent stringing together multiple results, and I already gave the spec for empty input. – Adám – 2016-06-23T16:49:37.483
That's fine but I don't get what you mean regarding stringing together results. – Alex A. – 2016-06-23T16:51:02.957
@AlexA. Let's say we are receiving and counting multiple inputs, and each input gets run through the byte counter. The byte counts are appended to a result file. A non-empty answer to empty input would cause input and result file to get out of sync length-wise. – Adám – 2016-06-23T16:55:08.760
Can I print the byte counts without separation? The highest possible value is 6, so it's unambiguous. – Dennis – 2016-06-23T18:28:27.523
@Dennis Yes, that's fine. – Adám – 2016-06-23T18:29:43.160
You know what's amazing? Copying the two ĉaŭ test cases out of this question works and preserves the combining characters on the second one, even though they produce identical glyphs. – cat – 2016-06-23T19:19:43.017
@Adám I wish that had been added to the question in the first place, that will quite shorten some implementations – cat – 2016-06-23T19:22:53.317
@cat What had been added? – Adám – 2016-06-23T19:29:56.853
Do we have to support null bytes? Those can be a real pain in some languages... – Dennis – 2016-06-23T20:10:03.040
@Dennis Yes, but feel free to include the shorter version that doesn't. – Adám – 2016-06-23T21:06:02.393
You should add that to the post. I don't know most of the languages well enough to tell if it makes a difference, but I think it invalidates at least two of the answers. – Dennis – 2016-06-23T21:31:15.527
@Dennis I tried, but feel free to edit if you can make it better. – Adám – 2016-06-23T21:56:59.827
My language doesn't see a difference between a NUL byte and the end of a string. Can I request that the length of the string be given as a parameter? – cat – 2016-06-24T11:32:13.773
@cat That won't help you know where the null bytes are. See edit. – Adám – 2016-06-24T14:49:31.150
@Adám yes it will. In C, for example, C strings end with a NUL byte, so you stop reading as soon as you find one. If you know the length of the string, you stop reading after that many bytes, NUL and all. – cat – 2016-06-24T14:56:22.553
@cat Ah, ok, I'll add that you can get the byte count if so. – Adám – 2016-06-24T14:57:45.110
How strict are you on the output? Can the byte values be separated by newlines or do they have to be spaces? – JAL – 2016-06-27T22:13:13.387
@JAL OP: "by any means". Dennis: "Can I print the byte counts without separation? The highest possible value is 6, so it's unambiguous." Adám: "Yes, that's fine." – Adám – 2016-06-28T05:32:02.757