Write a program or function that, given a string, will strip it of zalgo, if any exists.
Zalgo
For this post, zalgo is defined as any character from the following Unicode ranges:
- Combining Diacritical Marks (0300–036F)
- Combining Diacritical Marks Extended (1AB0–1AFF)
- Combining Diacritical Marks Supplement (1DC0–1DFF)
- Combining Diacritical Marks for Symbols (20D0–20FF)
- Combining Half Marks (FE20–FE2F)
https://en.wikipedia.org/wiki/Combining_character#Unicode_ranges
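For reference, the five ranges above can be written out as a lookup table. This is a minimal, ungolfed sketch in Python, not a competitive answer; the names `ZALGO_RANGES`, `is_zalgo` and `strip_zalgo` are made up for illustration.

```python
# Code point ranges copied from the definition of zalgo in this post.
ZALGO_RANGES = [
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x1AB0, 0x1AFF),  # Combining Diacritical Marks Extended
    (0x1DC0, 0x1DFF),  # Combining Diacritical Marks Supplement
    (0x20D0, 0x20FF),  # Combining Diacritical Marks for Symbols
    (0xFE20, 0xFE2F),  # Combining Half Marks
]


def is_zalgo(ch: str) -> bool:
    """Return True if the single character ch lies in one of the five ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ZALGO_RANGES)


def strip_zalgo(text: str) -> str:
    """Drop every zalgo character and leave everything else untouched."""
    return "".join(ch for ch in text if not is_zalgo(ch))
```

Both endpoints of each range are treated as inclusive, which is how the block names are read here.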
Input
- May be passed via command line arguments, STDIN, or any other standard method of input supported by your language
- Will be a string that may or may not contain zalgo or other non-ASCII characters
Output
Output should be a string that does not contain any zalgo.
Test Cases
Input -> Output
HE̸͚ͦ ̓C͉Õ̗͕M͙͌͆E̋̃ͥT̠͕͌H̤̯͛ -> HE COMETH
C͉̊od̓e͔͝ ̆G̀̑ͧo͜l͔̯͊f͉͍ -> Code Golf
aaaͧͩa͕̰ȃ̘͕aa̚͢͝aa͗̿͢ -> aaaaaaaaa
ññ -> ñn
⚡⃤ -> ⚡
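The `ñ` test case is the one most likely to trip up an answer. Assuming the first `ñ` in that input is the precomposed U+00F1 and the second is `n` followed by U+0303 (which is what the expected output suggests), only the second loses its tilde. The sketch below, with hypothetical names `ZALGO` and `strip_zalgo`, shows this with a regex character class, and also shows that decomposing the string first (NFD) would strip both.

```python
import re
import unicodedata

# Character class built from the five ranges in the challenge.
ZALGO = re.compile(
    "[\u0300-\u036f\u1ab0-\u1aff\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]"
)


def strip_zalgo(text: str) -> str:
    """Remove every character matched by the zalgo character class."""
    return ZALGO.sub("", text)


precomposed = "\u00f1"  # ñ as a single code point, outside every range
decomposed = "n\u0303"  # n followed by U+0303 COMBINING TILDE

# The precomposed ñ survives; the decomposed one is reduced to a plain n.
result = strip_zalgo(precomposed + decomposed)

# NFD splits the precomposed ñ into n + U+0303 before stripping,
# so both tildes are lost. That does not match the expected output.
over_stripped = strip_zalgo(unicodedata.normalize("NFD", precomposed + decomposed))
```

So an answer should operate on the code points as given and not normalize the input beforehand.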
Scoring
As this is code-golf, shortest answer in bytes wins.
Is the string guaranteed to only contain ASCII and/or Zalgo? Or may it contain other Unicode? – James – 2017-05-06T19:00:06.903
What about legitimate uses of those characters? Zalgo is pretty much only when those characters stack with each other in a way that was never intended. – Draco18s no longer trusts SE – 2017-05-06T19:16:08.777
@DJMcMayhem The input string may have other non-ASCII characters that must not be removed. – totallyhuman – 2017-05-06T19:16:16.293
@Draco18s Any character in those Unicode ranges must be removed. Besides, I don't think golfing code that recognizes valid words with combining characters would be fun. – totallyhuman – 2017-05-06T19:19:21.513
Is an encoding mandated or can any encoding be used? – Doorknob – 2017-05-06T19:21:35.433
@Doorknob Any encoding can be used but the definition of zalgo for this question still stands. – totallyhuman – 2017-05-06T19:24:37.360
@totallyhuman I was thinking a more generic approach: only stripping if more than one occurs after a "standard" character. That is, a͕ is fine but a͕̰ gets stripped to a. (Also now, thanks to the emoji detector, I want to put diacritics on emoji...̘͕̑ pfft, that looks silly) – Draco18s no longer trusts SE – 2017-05-06T19:25:12.440
@Draco18s That... might actually be a good idea but isn't it too late? Won't I be disrupting current progress? – totallyhuman – 2017-05-06T19:27:41.013
No idea, honestly. I don't have a good idea of how things work around here. If this is deemed a good challenge, then my idea might make a good second challenge. But it's why we have a sandbox. – Draco18s no longer trusts SE – 2017-05-06T19:35:35.097
I did put it in the sandbox but I got different questions there. – totallyhuman – 2017-05-06T19:36:29.667
Then this question is probably fine as is. :) – Draco18s no longer trusts SE – 2017-05-06T19:40:52.600
Related. – Martin Ender – 2017-05-06T19:53:40.153
You should add some test cases with non-ASCII output. – xnor – 2017-05-06T22:02:33.803
I would be grateful if somebody could do that as I am unable to do so for a while. (Preferably with the same length as the others because that just works. :P) – totallyhuman – 2017-05-06T22:07:58.313