Check my program!

-6

0

Challenge

Write an error checker in the fewest bytes possible! Your program receives an HTML document as a string through standard input, finds errors in it, and reports them on standard output. If the document is over 2000 characters, report that as an error too.

Errors

Since there has been some confusion, here are the errors you can check for.

Regular:

  • File too long
  • Tag not closed
  • Non-empty element not closed
    • XHTML self-closing will not give you any bonus.
  • Entity not closed (e.g. &amp vs &amp;)
  • CSS rule with no selector attached (e.g. {color:white;} vs span {color:white;})
  • CSS unclosed property-value pair (e.g. color:white vs color:white;)
  • Errors with brackets (e.g. no closing bracket, rules outside brackets, etc.)
    • Each different rule about brackets counts separately.
  • Undefined tag (worth x10)
  • Undefined attribute (worth x4)
  • Undefined CSS property (worth x10)
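
As a rough illustration of what checking a few of these error types could look like, here is a minimal, ungolfed Python sketch. It covers only "File too long", "Entity not closed", and "Tag not closed". The function name `find_errors`, the message wording, and the simplifying assumption that a tag never spans more than one line are all made up for this example and are not part of the challenge.

```python
import re

MAX_CHARS = 2000  # limit stated in the challenge


def find_errors(html: str) -> list[str]:
    """Return one message per error found (hypothetical helper, not golfed)."""
    errors = []

    # File too long: the whole document is over the character limit.
    if len(html) > MAX_CHARS:
        errors.append(f"File too long: {len(html)} characters, limit is {MAX_CHARS}")

    for lineno, line in enumerate(html.split("\n"), start=1):
        # Entity not closed: '&' plus a name or number with no ';' after it.
        for m in re.finditer(r"&#?\w+(?![\w;])", line):
            errors.append(
                f"Line {lineno}: entity {m.group()!r} is missing its closing semicolon"
            )

        # Tag not closed: '<' with no '>' before the next '<' or the end of
        # the line. Assumes a tag never spans several lines.
        for m in re.finditer(r"<[^<>]*(?=<|$)", line):
            errors.append(
                f"Line {lineno}: tag starting with {m.group()!r} has no closing bracket"
            )

    return errors


sample = "<p>Fish &amp chips</p>\n<b class='x'\n<i>ok &amp; fine</i>"
for message in find_errors(sample):
    print(message)
print(len(find_errors(sample)), "errors")
```

The last line prints the error count, which the rules below require as the minimum output; every message is longer than 20 characters and carries a line number where one exists.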

Let me make this clear:

You are not expected to complete all or even most of these.

Completing 7 of them would be incredible, but not necessary. Please though, try as many as you can while keeping the byte count relatively low!

Scoring

  • -10 bytes for every type of SyntaxError checked (e.g. undefined tag, oversized file [see above], etc.)
  • -30 bytes for every type of SyntaxError checked that returns a customized error message and line number. (for a total of -40 bytes per error)
    • Your customized message may not be shorter than 20 characters.
  • -100 bytes if your program can check embedded CSS in the document for at least 3 types of errors. (This includes <element style=""> and <style></style>.)
  • +100 bytes if you check only 3 or fewer types of errors.
  • ~0 bytes if your final score is negative.
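
The scoring rules above can be read as a small formula. The sketch below is one possible reading, not an official scorer: the function and parameter names are hypothetical, and it assumes that an error type with a customized message earns -40 in total and that the +100 penalty applies to the combined count of checked types.

```python
def final_score(raw_bytes: int, plain_types: int, custom_types: int,
                checks_css: bool = False) -> int:
    """Apply the bonuses and the penalty to an unbonused byte count.

    plain_types  -- error types checked without a customized message (-10 each)
    custom_types -- error types with a customized message and line number
                    (-10 and -30, so -40 each)
    checks_css   -- True if embedded CSS is checked for at least 3 error types
    """
    score = raw_bytes - 10 * plain_types - 40 * custom_types
    if checks_css:
        score -= 100
    if plain_types + custom_types <= 3:
        score += 100  # penalty for checking only 3 or fewer types
    return max(score, 0)  # a negative final score counts as 0


# A 36-byte program checking one error type with a customized message:
print(final_score(36, plain_types=0, custom_types=1))  # 36 - 40 + 100 = 96
```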

As this is my first code-golf challenge, please comment if you think the scores are out of range.

Rules

  • You may not use a built-in function to check for errors.
  • You must check for at least 1 error type. No blank answers! :D
  • You must return at minimum the number of errors in the document.
  • When you post your score, you should use this Markdown format:
    Output:

    My Language, Unbonused bytes Final Score

    code here
    

    Solves:

    • First Error
    • Second Error
    • Et Cetera

    Why/How it works

       Markdown:

# My Language, <s>Unbonused bytes</s> Final Score #
    code here
**Solves:**
* First Error
* Second Error
* Et Cetera  
Why/How it works

* You may add additional struck-through numbers between the two values if you wish.

Winning

After one week, the answer with the smallest number of bytes will win. If there is a tie, the answer with the most upvotes will win. If there is still a tie, I hold the divine power to choose. The winning answer will be accepted.

After the deadline, others can still post their answers, but no reward will be given.

Good Luck!

This is my first challenge, so if anything is lacking in any way, please comment. Thank you!

OldBunny2800

Posted 2015-11-27T17:46:25.790

Reputation: 1 379

Question was closed 2015-11-27T23:27:27.807

10-1 too many bonuses. Scoring should be simple. – Mego – 2015-11-27T17:51:51.980

3You should add some example HTML programs for testing. – LegionMammal978 – 2015-11-27T17:56:50.473

2@Mego To me, this seems like a complicated challenge, so IMO there should be bonuses to bring the bytes back down. Also, there are literally 3 different bonuses: You dock 10 bytes if you check an error, and an additional 15 for customizing it. You dock 100 if you check CSS too. You add on 100 for cheaping out. If your score is negative, you go back to 0. Simple. – OldBunny2800 – 2015-11-27T17:57:01.173

@LegionMammal978 I will, but not yet. I want to wait a while, maybe at least a day. EDIT: My reasons for that? Not saying nuthin'. – OldBunny2800 – 2015-11-27T17:57:49.363

Also, are we allowed to check for errors that merely cause XHTML non-compliance (such as <br> instead of <br/>)? – LegionMammal978 – 2015-11-27T18:00:41.753

This is specifically HTML, so I would say no. You could do it, but I just wouldn't give you any bonus for it. – OldBunny2800 – 2015-11-27T18:05:11.967

3Considering that most challenges are scored by the byte count of the solution (without any modifications), this is anything but simple. – Mego – 2015-11-27T18:06:06.883

7Have you tried to solve the task by yourself? Writing a full DOM-parser is not an easy challenge. There are many tags, many attributes many levels and ways of nesting. This can very fast become huge and complex. – insertusernamehere – 2015-11-27T18:22:33.570

1I never said it was easy. >:) – OldBunny2800 – 2015-11-27T18:24:12.347

If we aren't testing for XHTML, what should we test for? Different browsers allow different errors. – LegionMammal978 – 2015-11-27T18:36:46.347

1Any errors that are errors in HTML (e.g. not closing tags that should be closed, unclosed strings, etc.) as well as the 2000-character limit I used. – OldBunny2800 – 2015-11-27T18:39:44.340

...you just restated the XHTML standard. – LegionMammal978 – 2015-11-27T18:49:18.750

What I'm saying is you don't need to do things that are only in XHTML, such as closing empty elements (i.e. <img src="" alt="" / >), uppercase tags (i.e. <IMG> not <img>) etc. – OldBunny2800 – 2015-11-27T19:27:01.790

Under what circumstance does the custom error message with line number bonus apply to size checking? – pppery – 2015-11-27T20:43:01.287

You don't need to provide a non-existent line number. Just give a custom error message that's intuitive (e.g. not just "l"). It should be at least 20 characters long. Adding that to OP now. – OldBunny2800 – 2015-11-27T20:54:09.463

7I'm voting to close as "Unclear what you're asking" because "every type of SyntaxError checked" is far too vague for a spec. If you gave 10 programmers a copy of e.g. the HTML 4.0 spec and asked them how many possible syntax errors there are, you would get at least 10 different answers. – Peter Taylor – 2015-11-27T21:35:20.800

@Peter I don't think you understood what I meant. I used "every" to mean "each", not "all". – OldBunny2800 – 2015-11-27T21:40:42.993

Yes, I understood that. My point is that the number of errors checked by a serious error-checking program is highly debatable. I might claim that X, Y, and Z are three different types of errors, whereas someone else might claim that they're really all the same error. – Peter Taylor – 2015-11-27T21:45:13.217

4So really, you want us to write a DOM parser that can handle malformed input by definition and unambiguously mark where it went wrong? I'm sorry, but I want to be paid by the hour before going there. – Sanchises – 2015-11-27T22:53:44.440

The "specific error message" bonus is never worth obtaining, because it is only worth 15 bytes and the error message needs to be at least 20 bytes. – pppery – 2015-11-27T22:57:38.213

Oh :$ sorry. Changing that now. – OldBunny2800 – 2015-11-27T23:20:53.263

@PeterTaylor, @sanchises You don't need to comprehensively solve every single error. Keep it simple! Strategy: do 4 different types so you avoid the penalty while still keeping the byte count relatively low. – OldBunny2800 – 2015-11-27T23:26:09.600

I have clarified the allowed errors, comment if I left any out. – OldBunny2800 – 2015-12-01T05:26:47.437

I'm surprised that in the time since this was posted nobody asked about Parsing the HTML with Regex. – NoOneIsHere – 2016-07-23T02:57:36.670

Answers

4

Unefunge 98, 15 9 - 10 (1 error type) 36 bytes - 40 (one error with detailed message) + 100 (only one error type) = 105 99 96

This code only checks the file size.

'Ϩ#@k~"elif rellams a edivorP<NAK>"k,1q

Note that the code above contains a Unicode character with code point 2000 (counted as two bytes) and an unprintable character with character code 21 (negative acknowledgement), represented by <NAK>. The error message produced is "Provide a smaller file".

Old 15-byte version:

2aaa***#@k~1q

How it works:

The programs 2aaa*** (in the old version) and the quoted Unicode character (in the new version) push the number 2000 onto the stack. Then the # character jumps over the @, and k executes ~, which gets a char from user input, 2000 times. If EOF is read, ~ reflects the instruction pointer back onto the @, terminating with a zero exit code. If no EOF is encountered, 1q exits with exit code 1, indicating that there is one error in the input (it's too big).
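
For readers who don't know Funge, the same control flow can be sketched in Python. This is an illustration only, not a translation of the golfed program: the name `count_errors` is hypothetical, and reading `limit + 1` characters is an assumption chosen so that exactly 2000 characters still pass, matching the challenge's "over 2000 characters" wording.

```python
import io

LIMIT = 2000  # character limit from the challenge


def count_errors(stream, limit: int = LIMIT) -> int:
    """Return 0 if the stream ends within `limit` characters, otherwise 1."""
    for _ in range(limit + 1):
        # read(1) returns an empty string at end of input, which plays the
        # role of ~ reflecting onto @ and exiting with code 0.
        if stream.read(1) == "":
            return 0
    # Every read succeeded, so the input is too big: one error, like 1q.
    return 1


print(count_errors(io.StringIO("a" * 2000)))  # 0
print(count_errors(io.StringIO("a" * 2001)))  # 1
```

To use it as a program, one could pass `sys.stdin` as the stream and hand the result to `sys.exit`, so the exit code carries the error count as in the answer.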

pppery

Posted 2015-11-27T17:46:25.790

Reputation: 3 987

Good! Impressed with the way you dealt with the simplicity penalty! – OldBunny2800 – 2015-11-27T20:39:14.407

@OldBunny2800 What do you mean by "Impressed with the way you dealt with the simplicity penalty"? That penalty really skyrockets my score. – pppery – 2015-11-27T20:39:47.703

I mean how you still got less than 100 even with a +100 penalty. – OldBunny2800 – 2015-11-27T20:51:17.573

@OldBunny2800 Although every answer gets at least -10 in bonuses (they have to handle at least one error), so the penalty is really -90. – pppery – 2015-11-27T21:27:30.367

Good point, but still. – OldBunny2800 – 2015-11-27T21:28:08.743