F̲o̲r̲m̲a̲t̲t̲e̲r̲ (*Unicode* **Markdown** __Formatter__)

4

If you cannot see this post, you can use this image

Your task is to create a markdown parser that outputs Unicode. It should support italics, bold, monospace, and underline.

Every alphabetical character should be converted into Math Sans. This includes the characters before an underline character.

Underlines: Add a ̲ character (code point 818) after each character. Underlines can be combined with anything else.

There should be no way to escape a control character.

The following combinations should be supported:

(code point 120224 to 120275 Math Sans)

` ` (120432 to 120483 Math Monospace)

* * (120328 to 120379 Math Sans Italic)

** ** (120276 to 120327 Math Sans Bold)

*** *** (120380 to 120431 Math Sans Bold Italic)

__ __ (818 Math Sans with Combining Low Line after each character)

__` `__ (Math Monospace with Combining Low Line)

Underline should also support Math Sans Italic, Math Sans Bold, and Math Sans Bold Italic

The final output should not contain the characters used to format the text.
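The code-point ranges above can be read as five 52-letter blocks, each holding A-Z followed by a-z. A minimal sketch of the per-character mapping, assuming that layout; the names `STYLE_BASE` and `style_char` are made up for illustration and are not part of the challenge:

```python
# First code point of each 52-letter block, taken from the list above.
STYLE_BASE = {
    "sans": 120224,
    "bold": 120276,
    "italic": 120328,
    "bold_italic": 120380,
    "mono": 120432,
}
COMBINING_LOW_LINE = chr(818)


def style_char(ch, style="sans", underline=False):
    """Restyle one character; only the ASCII letters A-Z and a-z are remapped."""
    if "A" <= ch <= "Z":
        ch = chr(STYLE_BASE[style] + ord(ch) - ord("A"))
    elif "a" <= ch <= "z":
        # Lowercase letters start 26 positions into the block.
        ch = chr(STYLE_BASE[style] + 26 + ord(ch) - ord("a"))
    # The underline mark follows every character, letter or not.
    return ch + COMBINING_LOW_LINE if underline else ch
```

For example, `style_char("A")` gives the first Math Sans capital, and `style_char(" ", underline=True)` gives a space followed by the combining low line.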

Nesting

Some things can be nested. Underlines can nest with any other style and can be placed inside or outside another pair of control characters. Both of these will have an underline:

**Underlines __inside the bold__**
__Bold **inside the underline**__

Other things such as bold, monospace, bold italic, and italic can't nest because there are no characters for the combinations:

**This is `not valid` input and will***never be given*

Ambiguous Formatting Characters

Formatting characters that can be ambiguous will never be given. ** is valid because it is bold, but **** will never be given because it is undefined.

Delimiters that use the same character will never be directly adjacent: there will always be at least one other character between the end of one delimiter and the next delimiter made of the same character. *format***ing** will not be given, because a character is needed between the two asterisk delimiters (as in *format* **ing**). However, *format*__ing__ could be given, because the two delimiters use different characters.

**Which is bold ***and which is italic?* <-- Invalid, will never be given
**Oh I see** *now.* <-- Valid

Escaping

Nothing can be escaped. There is no way to output any of the control characters literally, except for a single underscore.

*There is no way to put a * in the text, not even \* backslashes* <-- All of the asterisks will be italic

Even __newlines
don't interrupt__ formatting <-- "newlines don't interrupt" will all be underlined

Because the singular _underscore is not a formatting character, it_is allowed_ and will not be removed
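Taken together, the rules so far suggest a simple toggle-based scanner. The following is an ungolfed sketch, not a competing answer. It assumes only valid input as described above, so a run of one to three asterisks is always a single delimiter. The names `BASES`, `TOKEN` and `format_markdown` are invented for this illustration.

```python
import re
import string

LETTERS = string.ascii_uppercase + string.ascii_lowercase
# Delimiter -> first code point of its block ("" means no style: plain Math Sans).
BASES = {"": 120224, "*": 120328, "**": 120276, "***": 120380, "`": 120432}
# Try "__" first, then a run of 1-3 asterisks, a backtick, then any single
# character (DOTALL makes "." match newlines too).
TOKEN = re.compile(r"__|\*{1,3}|`|.", re.DOTALL)


def format_markdown(text):
    style = ""         # currently open style delimiter
    underline = False  # toggled independently of the style
    out = []
    for tok in TOKEN.findall(text):
        if tok == "__":
            underline = not underline
        elif tok in BASES:
            # Valid input never nests styles, so a delimiter either
            # opens a style or closes the one that is already open.
            style = "" if tok == style else tok
        else:
            idx = LETTERS.find(tok)
            if idx >= 0:
                tok = chr(BASES[style] + idx)
            out.append(tok + "\u0332" if underline else tok)
    return "".join(out)
```

A single underscore and a backslash both fall through to the last branch, so they are copied to the output unchanged. Calling `print(format_markdown("*This* is_a **test**"))` is a quick way to try it.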

Example I/O

Example I

*This* is_a **test** of __the ***markdown*** `parser`__ Multi-line `__should
be__ supported`. Nothing `can\` escape.

Example O

 _   ̲̲̲ ̲̲̲̲̲̲̲̲̲ ̲̲̲̲̲̲̲ - ̲̲̲̲̲̲
̲̲̲ .  \ .

This input is not valid and it doesn't matter what your program outputs:

*This **is not valid** input* because it nests italic and bold

*Neither `is this`* because it nests italic and monospace

*This***Is not valid** Either because it can be interpreted different ways

As this is code-golf, the shortest solution in bytes wins.

pfg

Posted 2018-03-12T18:04:27.147

Reputation: 735

Can you change the title so my eyes stop bleeding? – Christopher – 2018-03-12T18:09:10.427

@Christopher what is wrong with it? https://i.imgur.com/04ErkFu.png

– pfg – 2018-03-12T18:10:13.210

@pfg it doesn't format well on some devices. (particularly the underlined characters) – Rɪᴋᴇʀ – 2018-03-12T18:12:54.320

@Riker Hmm, works on my phone https://i.imgur.com/hjBVwHp.png. I've put both in the title for people who can't read it

– pfg – 2018-03-12T18:15:45.807

Very related – Shaggy – 2018-03-12T19:13:49.270

@Shaggy but also very different – pfg – 2018-03-12T19:26:30.083

Should we underline whitespace (newlines, spaces)? – Οurous – 2018-03-12T21:43:18.657

@Οurous Yes, underline everything – pfg – 2018-03-12T22:15:30.183

I guess that the input can contain A-Z, a-z, spaces, newline? – user202729 – 2018-03-13T01:19:27.807

@user202729 and all other characters like underscores and stars. If a character isn't azAZ it shouldn't be transformed – pfg – 2018-03-13T01:29:44.757

Can we assume the input contains only printable ASCII? – Erik the Outgolfer – 2018-03-13T10:18:36.440

Also, I think there's an error in the example O, a combining underline should come after the newline too, no? – Erik the Outgolfer – 2018-03-13T11:09:45.430

Yeah I fixed that – pfg – 2018-03-13T18:04:04.030

Answers

4

Stax, 75 73 74 70 bytes

Ç╜ÿPÜV►♀b7╛i┴?τ\⌂:Té√╖■♠(µ`ómƒÜKx▬∙═τ½εxÅr∩!E#î⌡╕B┴zäÅéë┘S²↔óh₧≡6ÿ╖iô├

Run and debug it online

The basic approach is to store the 4 types of formatting in a bit field, and then get the correct character offset from a lookup using the integer value. Only 8 values are in the lookup. The high bit denoting underline is handled separately.

Here's a commented ungolfed version.

m                       Map mode: for each line of input, 
                            run the rest of the program, then output
"\*\*|__|."|F           Find all regex matches using pattern. This splits the string into 
                            individual characters except for ** and __, which stay together.
                            These will be called "atoms" of the input.
{                       begin a block
  VAVa+                 Construct "ABC..XYZabc..xyz"
  "_*``_****__"_I       Get the index of the current atom in "_*`_****__"
  0|MY                  Take the maximum with zero, then store in register y.
                            This gives {*: 1, `: 2, **: 4, __: 8} and 0 for all else. 
  x|^X                  Xor with the x register and write back. It stores bitflags for 4 modes.
  "&6mU*9*9 0E*9 !"!@   Index into codepoint lookup table using x.
  c52+|r                Build range of 52 consecutive values starting at codepoint.
  \$                    Build translation string by combination with the alphabetic string.
  |t                    Perform translation on the current atom.
  818+                  Add codepoint 818 to the current atom.
  x8<T                  Remove it again if x<8.  This is iff underline mode is off.
  y!*                   Multiply by the logical not of y.  If the atom is for formatting, 
                        this eliminates output for this atom
m                       map over atoms using the enclosed block
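For readers who don't know Stax, the bit-field idea can be sketched in Python. This is a loose illustration of the technique described above, not a translation of the Stax program. The flag values follow the comment in the ungolfed listing, but the `BASE` table is built directly from the challenge's code points, and the fallback for the never-supplied combinations (indices 3, 6 and 7) is an arbitrary assumption. Unlike the Stax version, it processes the whole input at once rather than line by line.

```python
import re
import string

LETTERS = string.ascii_uppercase + string.ascii_lowercase
FLAG = {"*": 1, "`": 2, "**": 4, "__": 8}
# Indexed by the low three bits of the state:
# 0 plain, 1 italic, 2 mono, 4 bold, 5 bold italic.
# 3, 6 and 7 would mix mono with italic/bold; they fall back to mono here.
BASE = [120224, 120328, 120432, 120432, 120276, 120380, 120432, 120432]


def format_bitfield(text):
    state = 0
    out = []
    # Atoms are "**", "__", or any single character, so "***" arrives
    # as "**" followed by "*" and sets both the bold and italic bits.
    for atom in re.findall(r"\*\*|__|.", text, re.DOTALL):
        flag = FLAG.get(atom, 0)
        if flag:
            state ^= flag  # toggle the mode and emit nothing
            continue
        idx = LETTERS.find(atom)
        if idx >= 0:
            atom = chr(BASE[state & 7] + idx)
        if state & 8:      # high bit: underline, handled separately
            atom += "\u0332"
        out.append(atom)
    return "".join(out)
```

Keeping underline out of the table is what holds the lookup to 8 entries instead of 16.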

Run this one

recursive

Posted 2018-03-12T18:04:27.147

Reputation: 8 616

It seems like monospace underline is using bold italic instead of monospace – pfg – 2018-03-12T22:19:21.707

My mistake. I thought that that was a disallowed combination. I'll see about addressing that. – recursive – 2018-03-12T23:03:44.587

I guess I should make it more clear in my post. Underlines combine with anything – pfg – 2018-03-12T23:04:40.603

@pfg Hm, I guess I don't understand the requirement. What styles are combine-able with mono? There's no font that combines bold or italic with mono, so what should the behavior be? – recursive – 2018-03-12T23:16:33.970

It's the opposite, mono is combineable with underline, underline is combineable with everything else – pfg – 2018-03-12T23:17:43.420

**`__mark__down`** This is the test case we're talking about right? It has bold and mono nested. What's the correct behavior for that? – recursive – 2018-03-12T23:19:23.530

Yeah. `__mark__down` should make ̲̲̲̲ – pfg – 2018-03-12T23:21:29.400

1

Let us continue this discussion in chat.

– recursive – 2018-03-12T23:22:35.687

@pfg That combination overflowed the lookup table. I've added a fix at a cost of 1 byte. – recursive – 2018-03-12T23:30:56.323

2

Jelly, 87 bytes

JḤ
ØṖḟØAḟØa⁷;e€@Ọ¬
=Œua6+“¡ẓƬ’a@Ç+©O3ḷr1”*ẋ“`”ṭðO®œṣ+Ça¥¥€Ñ¦Ẏ©µ"“ƈ4hẊ‘ṪỌµœṣ⁾__p€Ñ¦818Ọ¤

Try it online!

Erik the Outgolfer

Posted 2018-03-12T18:04:27.147

Reputation: 38 134

If I'm reading this right the difference is that the newline is underlined. – recursive – 2018-03-13T18:53:44.297

@recursive Yes it is, although I think it has been fixed now. – Erik the Outgolfer – 2018-03-13T18:54:11.973

Hm. It appears this "fix" has invalidated my solution. – recursive – 2018-03-13T19:27:53.760

@recursive Looks like it, but, well, that's what the challenge specification really says, not me. – Erik the Outgolfer – 2018-03-13T19:32:50.910

1

Python 3, 267 253 243 239 235 229 226 224 bytes

import re
def f(s):
 m=u=0;a='__','**','*','***','`'
 for c in re.findall(r'__|\*+|`|.|\n',s):
  if c in a:m,u=[a.index(c)-m,m,u,1-u][c=='__'::2]
  else:print([c,chr(120159+52*m+ord(c)-6*(c>'`'))][c.isalpha()]+'̲'*u,end='')

Try it online!
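The arithmetic in this answer is compact, so here is an ungolfed reading of it, offered as an interpretation rather than the author's own explanation. The constant 120159 is 120224 minus `ord('A')`, each style block is 52 letters wide, and the `-6` skips the six ASCII characters that sit between `Z` and `a`. The helper names below are hypothetical.

```python
import string

DELIMS = ("__", "**", "*", "***", "`")  # same order as the tuple `a` in the answer


def golfed_codepoint(c, m):
    # The expression used in the answer, for a letter c and style index m.
    return 120159 + 52 * m + ord(c) - 6 * (c > "`")


def explicit_codepoint(c, m):
    # The same value, spelled out: block start plus position in A-Z a-z.
    base = 120224 + 52 * m
    if c.isupper():
        return base + ord(c) - ord("A")
    return base + 26 + ord(c) - ord("a")


def toggle(c, m, u):
    # Equivalent of [a.index(c)-m, m, u, 1-u][c=='__'::2].
    if c == "__":
        return m, 1 - u          # flip underline, keep style
    # Only a toggle because valid input guarantees m is 0 or index(c).
    return DELIMS.index(c) - m, u


# Both formulas agree for every letter and every style index 0..4.
assert all(
    golfed_codepoint(c, m) == explicit_codepoint(c, m)
    for c in string.ascii_letters
    for m in range(5)
)
```

Style index 0 doubles as "no style", which works because `__` occupies slot 0 of the tuple and never changes `m`.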

TFeld

Posted 2018-03-12T18:04:27.147

Reputation: 19 246