Close your tags!

13

This is based on a previously deleted challenge of mine with the same name.

Introduction

You are tasked with writing a program that returns a truthy or falsey value depending on whether the input has all of its XML-like¹ tags opened and closed appropriately and in the right order. Consider the following input:

<Apple>

This returns a falsey value because the tag is never closed. This input:

<Apple></Apple>

on the other hand, returns a truthy value because the tag is closed correctly. The program should also check nested tags to make sure they are in the correct position. For example, take this as input:

<mango><Apple></mango></Apple>

All the tags are closed correctly, but not in the correct order. Your program must check for correct tag hierarchy and nesting.

Definitions

Let me define a few things before I get into the rules and assumptions.

Tag

A basic XML-style tag, for example <Apple>. A tag can have at most one leading and one trailing space (otherwise it is invalid and falsey), so < Apple > and <Apple> are the same. Tags can also contain attributes such as foo="bar" (the double quotes are required; otherwise the tag is invalid and falsey). An attribute name can contain only alphanumeric characters and the characters _, :, -, and . (underscore, colon, hyphen, and period). An attribute does not require a value, and a value can contain anything except " before the closing double quote. A closing tag must not contain attributes, and no tag may contain a newline.
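As a rough illustration of the Tag definition, here is a minimal Ruby sketch that parses a single opening tag into its name and attributes. The names NAME_CHARS, ATTRIBUTE, OPENING_TAG and parse_opening_tag are made up for this example and are not part of the challenge. The sketch assumes a space is required before each attribute and does not check for newlines inside the tag.

```ruby
# Hypothetical helper names; the patterns follow the Tag definition above.
NAME_CHARS  = /[-.:\w]+/                         # alphanumerics plus _ : - .
ATTRIBUTE   = / (#{NAME_CHARS})(?:="([^"]*)")?/  # value is optional and must be double-quoted
OPENING_TAG = /\A< ?(#{NAME_CHARS})((?: #{NAME_CHARS}(?:="[^"]*")?)*) ?>\z/

# Returns { name:, attributes: } for a valid opening tag, or nil otherwise.
# Attributes without a value map to nil.
def parse_opening_tag(tag)
  m = OPENING_TAG.match(tag)
  return nil unless m
  { name: m[1], attributes: m[2].scan(ATTRIBUTE).to_h }
end

p parse_opening_tag('<div class="bar">')
p parse_opening_tag('<example foo="x" noValue>')
p parse_opening_tag('<Ketchup flavor=spicy>')
```

Note that under this reading <Ap ple> is a valid tag on its own (name Ap, attribute ple); the corresponding test case is falsey only because </Apple> does not close it.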

Tag Name

Tag names are the tags' names. For example, <Apple>'s tag name is Apple. Tag names can contain the same characters as attribute names, and are case-sensitive. This means <Apple> is not <apple>.

Self-Closing Tag

A regular tag that closes itself, such as <Apple /> or <Apple/> (they are the same). A space between the tag name and the slash is allowed.

Plain Text

A string of characters that can contain anything and is not enclosed in < and >.

"Simple" Tag

Either an opening, closing, or self-closing tag.
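To make the three kinds of "simple" tag concrete, the following Ruby sketch classifies one tag as opening, closing, or self-closing. PATTERNS and classify are hypothetical names chosen for this example. The closing-tag pattern allows an optional space on either side of the slash, which is an assumption taken from the truthy test cases and the comments rather than from the definitions themselves.

```ruby
# Hypothetical names; one pattern per kind of "simple" tag.
NAME  = /[-.:\w]+/
ATTRS = /(?: #{NAME}(?:="[^"]*")?)*/   # zero or more attributes, each preceded by a space

PATTERNS = {
  open:         /\A< ?(#{NAME})#{ATTRS} ?>\z/,
  self_closing: /\A< ?(#{NAME})#{ATTRS} ?\/>\z/,
  close:        /\A< ?\/ ?(#{NAME}) ?>\z/   # closing tags take no attributes
}

# Returns [kind, tag_name], or nil if the string is not a valid "simple" tag.
def classify(tag)
  PATTERNS.each do |kind, pattern|
    m = pattern.match(tag)
    return [kind, m[1]] if m
  end
  nil
end

p classify('<Apple>')
p classify('<Apple />')
p classify('</ Apple>')
p classify('</pear attr=foo>')
```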

Rules

  • Output may be returned or printed, and input may be taken any way you like
  • Input is a string, consisting of either tags, plain text, or both
  • Your program can be a function or a whole working program

  • Plain text can be anywhere; if the input consists only of plain text, the program should return a truthy value.

  • Recognition of nested tags is required for the program. If a tag is nested in a tag, that nested tag must be closed before the parent is closed, just like regular XML, or else a falsey value should be returned
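The nesting rule can be sketched, ungolfed, with a stack of open tag names. This is only one possible reference approach, not a golfed answer. TAG and well_nested? are hypothetical names, and the sketch assumes that any < or > left over after removing valid tags makes the input falsey.

```ruby
NAME  = /[-.:\w]+/
ATTRS = /(?: #{NAME}(?:="[^"]*")?)*/
# Groups: 1 = closing slash, 2 = tag name, 3 = attribute text, 4 = self-closing slash.
TAG   = /< ?(\/ ?)?(#{NAME})(#{ATTRS}) ?(\/)?>/

def well_nested?(input)
  open_names = []  # stack of tag names that are still open
  leftover = input.gsub(TAG) do
    closing, name, attrs, self_closing = $1, $2, $3, $4
    if closing
      # A closing tag may carry neither attributes nor a trailing slash,
      # and it must match the most recently opened tag.
      return false unless attrs.empty? && self_closing.nil?
      return false unless open_names.pop == name
    elsif self_closing.nil?
      open_names.push(name)
    end
    ''  # drop the tag so that only plain text remains
  end
  # Every tag must be closed, and no stray angle brackets may remain.
  open_names.empty? && !leftover.match?(/[<>]/)
end

p well_nested?('<mango><Apple/></mango>')
p well_nested?('<a><b></a></b>')
```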

Assumptions

  • You can assume that input will always be one or more "simple" tag(s)
  • You can assume that input will always follow the format for tags defined above

Test Cases

Falsey

<apple>

<apple></Apple>

<apple></mango>

<apple><mango>

<a><b></a></b>

Text<ul><li></li><ul />

<pear attr=foo></pear attr=foo>

<Ketchup flavor=spicy></Ketchup>

<Ap ple></Apple>

Truthy

Text 

<Apple />

<Apple></Apple>

< Apple ></ Apple>

<mango><Apple/></mango>

<mango>Text<div class="bar">More text \o/</div></mango>

<food group="fruit">Fruits:<orange :fruit-variety="clementine" /><pear _fruit.type="asian" /></food>

<example foo="abcdefghijklmnopqrstuvwxyz1234567890-/:;()$&@.,?!'" noValue>Any characters allowed! (0.0)</example>

Scoring

This is code-golf, so the shortest code in bytes wins. Standard loopholes are prohibited as usual.


¹ Note: This is not real XML, but a pseudo-XML with different rules for the challenge. Tag and attribute names differ from the specification.

Andrew Li

Posted 2016-12-30T15:27:37.973

Reputation: 1 061

If a tag has more than one space before or after it, do we have to mark it false? – JayDepp – 2016-12-30T16:45:53.807

@JayDepp Yes - let me clarify that in my post – Andrew Li – 2016-12-30T16:47:26.640

Can we use builtins that parse strings to XML? – Oliver – 2016-12-30T17:05:37.250

@obarakon The problem is this isn't necessarily valid XML. See the footnote. – Andrew Li – 2016-12-30T17:10:06.603

Is it correct to say, that this is a truthy input: < : : :><:/><: :=":=:" ::></:>< /:>? – insertusernamehere – 2016-12-30T18:02:33.287

@insertusernamehere Yes, that is truthy. – Andrew Li – 2016-12-30T18:06:52.707

The attribute names also do not require an attribute value ? Example please? – edc65 – 2016-12-30T20:04:51.400

@edc65 <example attrWithoutVal></example> just like <input required>. – Andrew Li – 2016-12-30T20:05:22.250

Answers

2

Retina, 76 74 Bytes

+`< ?([-.:\w]+)( ?[-.:\w]+(="[^"]*")?)* ?(/>|>[^<>]*< ?/ ?\1 ?>)

^[^<>]*$

Since I've seen that Retina is really good for golfing regexes, I figured I'd try it out. It follows the same logic as my Ruby answer and prints 0 or 1.

JayDepp

Posted 2016-12-30T15:27:37.973

Reputation: 273

You don't need the M`. If the final stage only has a single part, match mode is implied. – Martin Ender – 2016-12-31T09:24:52.410

1

Ruby (2.3.1), 103 101 100 Bytes

->s{s.sub!(/< ?([-.:\w]+)( ?[-.:\w]+(="[^"]*")?)* ?(\/>|>[^<>]*< ?\/ ?\1 ?>)/,'')&&redo;!(s=~/<|>/)}

Anonymous function, called by appending .call("<Apple></Apple>"). It repeatedly removes matching tag pairs and self-closing tags until there aren't any left, and then returns whether the string has no angle brackets remaining.
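For readers who want the same idea without the golfing, here is an ungolfed sketch of the remove-until-stable approach. REMOVABLE and tags_closed? are names invented for this example. The pattern is the posted one with one deliberate change: a space is required before each attribute. With the optional space, the regex engine can, as far as I can tell, backtrack the tag name so that <Apple></A> matches with name A and attribute pple.

```ruby
# Innermost removable unit: a self-closing tag, or an opening tag followed by
# plain text (no angle brackets) and its matching closing tag.
# Assumption of this sketch: each attribute must be preceded by a space.
REMOVABLE = /< ?([-.:\w]+)( [-.:\w]+(="[^"]*")?)* ?(\/>|>[^<>]*< ?\/ ?\1 ?>)/

def tags_closed?(input)
  s = input.dup  # sub! mutates its receiver, so work on a copy
  loop do
    # sub! returns nil once nothing removable is left.
    break unless s.sub!(REMOVABLE, '')
  end
  # Anything that still contains an angle bracket was unmatched or malformed.
  !s.match?(/[<>]/)
end

p tags_closed?('<mango>Text<div class="bar">More text</div></mango>')
p tags_closed?('<a><b></a></b>')
```

Removing from the inside out works because a properly nested pair always ends up with only plain text between its tags once its children are gone.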

JayDepp

Posted 2016-12-30T15:27:37.973

Reputation: 273

This marks <p title="This is a \"test\"."></p> as Falsey, but it shouldn't be. – orlp – 2016-12-30T21:03:14.550

@orlp 'values can contain anything except " before the closing double quote.' – JayDepp – 2016-12-30T21:05:40.163

Oh it's not real XML... – orlp – 2016-12-30T21:07:40.643

Real XML should never be parsed with regex :) – JayDepp – 2016-12-30T21:12:10.247
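The exchange above about escaped quotes can be checked with a small sketch. Because an attribute value is matched as [^"]*, it ends at the first double quote, and a backslash does not escape anything. first_attribute_value is a hypothetical helper written only for this illustration.

```ruby
# Returns the text of the first double-quoted attribute value, or nil.
def first_attribute_value(tag)
  tag[/="([^"]*)"/, 1]
end

# Single quotes keep the backslashes literal in the Ruby source.
tag = '<p title="This is a \"test\"."></p>'
puts first_attribute_value(tag)  # the value stops right after the first backslash
```

What follows that first closing quote (test\"." and so on) is not a valid attribute, which is why the tag as a whole is falsey under the challenge rules.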