HTML2JSON Converter

5

Edit

I have posted a 250-line reference implementation for review at Code Review (CR), discussing the main class of the code. Download the complete Visual Studio 2013, .NET 4.5 source code (10 kB) from my server.

Story

This is based on a real world puzzle.

A colleague had a very old program. The company announced that it would be discontinued, and it was indeed discontinued in 2009. Now I was supposed to use that program as well, but it is full of bugs and I refused to use it. Instead, I wanted to help my colleague by converting the content into a reusable format so that we could import it into any other program.

The data is in a database, but the schema is horrible because the program creates new tables every now and then, so doing an SQL export was not feasible. The program does not have an export function either, but it does have a reporting capability. Bad luck: reporting to Excel fails because the output is larger than the roughly 65,000 rows allowed by the only (old) Excel format it supports. The only option left was reporting to HTML.

Task

Given

  • input is an HTML document
  • it seems to be created by Microsoft Active Report, which shouldn't really matter. But for the sake of fairness, your code must not use any Active Report specific library to parse the HTML (if such a thing exists)
  • The report is hierarchically structured in either 2 or 3 levels.
  • the HTML is horrible (IMHO): it consists of DIVs and SPANs only, each having a style attribute specified. The left position in the style of a SPAN can be used to detect the indentation, from which the number of levels can be inferred.
  • there is always one SPAN which defines the key. The key is formatted in bold and ends with a colon (use whatever you want to detect it)
  • The key may be followed by another SPAN which defines the value. If the value is not present, it shall be considered as an empty string.
  • if a key reappears on the same level, that's the start of a new item
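
To make the points above concrete, here is a minimal Python sketch of how the keys and values could be pulled out of such a report. It is not the reference implementation. The function name read_pairs and the three regular expressions are made up for illustration, and the sketch assumes the markup looks exactly like the sample further down (one inner DIV per page with position:relative, every SPAN wrapping a single NOBR, left given in inches).

```python
import html
import re

# Assumption: each page is an inner DIV whose style starts with position:relative.
DIV_RE = re.compile(r'<div style="position:relative[^"]*">(.*?)</div>', re.S)
# Assumption: every SPAN has a style attribute and wraps exactly one NOBR.
SPAN_RE = re.compile(r'<span style="([^"]*)"><nobr>(.*?)</nobr></span>', re.S)
# The left position in inches; the (?:^|;) guard skips names like margin-left.
LEFT_RE = re.compile(r'(?:^|;)left:([0-9.]+)in')


def read_pairs(document):
    """Return (left, key, value) triples in document order."""
    pairs = []
    for page in DIV_RE.findall(document):
        # The last SPAN of a page is the page number, so it is dropped here.
        spans = SPAN_RE.findall(page)[:-1]
        awaiting_value = False
        for style, raw in spans:
            # Decode entities such as &#228; and turn U+00A0 into a normal space.
            text = html.unescape(raw).replace("\u00a0", " ")
            if "font-weight:bold" in style and text.endswith(":"):
                left = float(LEFT_RE.search(style).group(1))
                pairs.append([left, text, ""])  # value defaults to ""
                awaiting_value = True
            elif awaiting_value:
                pairs[-1][2] = text  # the SPAN right after a key is its value
                awaiting_value = False
    return [tuple(p) for p in pairs]
```

The key still carries its trailing colon here; turning it into a property name and deciding where a new item starts is a separate step.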

you

  • create a valid JSON file as output
  • the JSON needn't be pretty-printed, just valid
  • non-breaking spaces can be converted to normal spaces or be Unicode 00A0
  • remove the NOBR tags from the output
  • ignore any unneeded information like the page number (last SPAN in a DIV), HEAD and STYLE before the first DIV
  • use the key as the property name (remove the trailing colon and the spaces)
  • use the value as the property value
  • create an array property called Children for the items on the next level
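
As an illustration of the output rules, the following Python sketch nests a flat list of (left, key, value) triples into the requested structure. The names build_tree and to_json are hypothetical, and the triples are assumed to be already extracted from the HTML. One deliberate choice: an item only gets a Children property once it actually has a child, whereas the handcrafted sample below also shows empty Children arrays on the first level.

```python
import json


def build_tree(pairs):
    """Nest (left, key, value) triples into dicts with Children arrays."""
    root = {"Children": []}
    # Stack of (left position, item currently being filled at that level).
    stack = [(-1.0, root)]
    for left, key, value in pairs:
        # Property name: drop the trailing colon and all spaces.
        name = key.rstrip(":").replace(" ", "")
        # A smaller left position means we are back on an outer level.
        while left < stack[-1][0]:
            stack.pop()
        level_left, item = stack[-1]
        if left > level_left or name in item:
            # Deeper indentation starts a first child; a key that reappears
            # on the same level starts a new sibling item.
            if left == level_left:
                stack.pop()
            parent = stack[-1][1]
            item = {}
            parent.setdefault("Children", []).append(item)
            stack.append((left, item))
        item[name] = value
    return root


def to_json(tree):
    """Serialize without escaping non-ASCII characters such as ä."""
    return json.dumps(tree, ensure_ascii=False)
```

For example, to_json(build_tree([(0.0, "Risk:", "Low")])) gives a root object whose Children array holds a single item with the property Risk.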

As far as I can tell, the HTML is technically OK: all tags are closed etc., but it's not XHTML. You needn't do any error checking on it.

Side note: I don't need your code to solve a task or homework. I have already implemented a converter but to a different output format.

Rules

Code Golf. The shortest code wins. Hopefully the task is complex enough to make it hard for special golfing languages so that other languages have a chance as well.

Libraries are allowed. Loophole libraries specially crafted for this task shall be downvoted.

The answer with the shortest code on 2015-03-14 gets accepted.

Sample

The code shall be able to process all kinds of reports the program produces, so do not hard-code any keys or values. Here's an anonymized two-page excerpt of one of the reports.


<html>
    <head>
        <title>
            ActiveReports Document
        </title><meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8" />
    </head><body leftmargin="0" topmargin="0" marginheight="0" marginwidth="0">
        <style>
            @page{size: 11.69in 8.27in;margin-top:0.1in;margin-left:1in;margin-right:1in;margin-bottom:0in;}
        </style><div style="page-break-inside:avoid;page-break-after:always;">
            <div style="position:relative;width:9.69in;height:8.17in;">
                <span style="position:absolute;top:1.171028in;left:0.01388884in;width:1.111274in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Requirement&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:1.171028in;left:1.139051in;width:7.860949in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Activate&nbsp;Golf</nobr></span>
                <span style="position:absolute;top:1.383181in;left:0.01388884in;width:0.6756048in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Description:</nobr></span>
                <span style="position:absolute;top:1.583181in;left:0.01388884in;width:0.2805853in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Risk:</nobr></span>
                <span style="position:absolute;top:1.583181in;left:0.3083631in;width:8.691637in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Average</nobr></span>
                <span style="position:absolute;top:1.795334in;left:0.01388884in;width:0.3974066in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Status:</nobr></span>
                <span style="position:absolute;top:1.795334in;left:0.4251844in;width:8.574816in;height:0.1664768in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>In&nbsp;Progress</nobr></span>
                <span style="position:absolute;top:2.007487in;left:0.01388884in;width:0.7040471in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Priorit&#228;t&nbsp;QA:</nobr></span>
                <span style="position:absolute;top:2.007487in;left:0.7318249in;width:8.268175in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Average</nobr></span>
                <span style="position:absolute;top:2.387853in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:2.387853in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Activate&nbsp;Golf</nobr></span>
                <span style="position:absolute;top:2.768218in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:2.768218in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Activate&nbsp;Golf&nbsp;Settings</nobr></span>
                <span style="position:absolute;top:3.148584in;left:0.01388884in;width:1.111274in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Requirement&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:3.148584in;left:1.139051in;width:7.860949in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Add&nbsp;Balls</nobr></span>
                <span style="position:absolute;top:3.360737in;left:0.01388884in;width:0.6756048in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Description:</nobr></span>
                <span style="position:absolute;top:3.560737in;left:0.01388884in;width:0.2805853in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Risk:</nobr></span>
                <span style="position:absolute;top:3.560737in;left:0.3083631in;width:8.691637in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Low</nobr></span>
                <span style="position:absolute;top:3.77289in;left:0.01388884in;width:0.3974066in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Status:</nobr></span>
                <span style="position:absolute;top:3.77289in;left:0.4251844in;width:8.574816in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>In&nbsp;Progress</nobr></span>
                <span style="position:absolute;top:3.985043in;left:0.01388884in;width:0.7040471in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Priorit&#228;t&nbsp;QA:</nobr></span>
                <span style="position:absolute;top:3.985043in;left:0.7318249in;width:8.268175in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>High</nobr></span>
                <span style="position:absolute;top:4.365408in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:4.365408in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Download&nbsp;Golf&nbsp;Puzzles from Internet</nobr></span>
                <span style="position:absolute;top:4.745774in;left:0.01388884in;width:1.111274in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Requirement&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:4.745774in;left:1.139051in;width:7.860949in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Add&nbsp;Player</nobr></span>
                <span style="position:absolute;top:4.957927in;left:0.01388884in;width:0.6756048in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Description:</nobr></span>
                <span style="position:absolute;top:5.157927in;left:0.01388884in;width:0.2805853in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Risk:</nobr></span>
                <span style="position:absolute;top:5.157927in;left:0.3083631in;width:8.691637in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Average</nobr></span>
                <span style="position:absolute;top:5.37008in;left:0.01388884in;width:0.3974066in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Status:</nobr></span>
                <span style="position:absolute;top:5.37008in;left:0.4251844in;width:8.574816in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>In&nbsp;Progress</nobr></span>
                <span style="position:absolute;top:5.582232in;left:0.01388884in;width:0.7040471in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Priorit&#228;t&nbsp;QA:</nobr></span>
                <span style="position:absolute;top:5.582232in;left:0.7318249in;width:8.268175in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Average</nobr></span>
                <span style="position:absolute;top:5.962598in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:5.962598in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Golf&nbsp;Upload&nbsp;Puzzles&nbsp;to Server</nobr></span>
                <span style="position:absolute;top:6.342964in;left:0.01388884in;width:1.111274in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Requirement&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:6.342964in;left:1.139051in;width:7.860949in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>GLFD</nobr></span>
                <span style="position:absolute;top:6.555117in;left:0.01388884in;width:0.6756048in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Description:</nobr></span>
                <span style="position:absolute;top:6.755116in;left:0.01388884in;width:0.2805853in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Risk:</nobr></span>
                <span style="position:absolute;top:6.755116in;left:0.3083631in;width:8.691637in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>High</nobr></span>
                <span style="position:absolute;top:6.967269in;left:0.01388884in;width:0.3974066in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Status:</nobr></span>
                <span style="position:absolute;top:6.967269in;left:0.4251844in;width:8.574816in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>In&nbsp;Progress</nobr></span>
                <span style="position:absolute;top:7.179422in;left:0.01388884in;width:0.7040471in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Priorit&#228;t&nbsp;QA:</nobr></span>
                <span style="position:absolute;top:7.179422in;left:0.7318249in;width:8.268175in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Average</nobr></span>
                <span style="position:absolute;top:7.48389in;left:0.07638884in;width:2.548611in;height:0.6861111in;font-family:Arial;font-size:10pt;color:#000000;vertical-align:top;"><nobr>Page&nbsp;1&nbsp;of&nbsp;115</nobr></span>

            </div>
        </div><div style="page-break-inside:avoid;page-break-after:always;">
            <div style="position:relative;width:9.69in;height:8.17in;">
                <span style="position:absolute;top:0.7972222in;left:0.01388884in;width:1.111274in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Requirement&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:0.7972222in;left:1.139051in;width:7.860949in;height:0.1664768in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Count dimples</nobr></span>
                <span style="position:absolute;top:1.009375in;left:0.01388884in;width:0.6756048in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Description:</nobr></span>
                <span style="position:absolute;top:1.209375in;left:0.01388884in;width:0.2805853in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Risk:</nobr></span>
                <span style="position:absolute;top:1.209375in;left:0.3083631in;width:8.691637in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Low</nobr></span>
                <span style="position:absolute;top:1.421528in;left:0.01388884in;width:0.3974066in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Status:</nobr></span>
                <span style="position:absolute;top:1.421528in;left:0.4251844in;width:8.574816in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Failed</nobr></span>
                <span style="position:absolute;top:1.633681in;left:0.01388884in;width:0.7040471in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Priorit&#228;t&nbsp;QA:</nobr></span>
                <span style="position:absolute;top:1.633681in;left:0.7318249in;width:8.268175in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Low</nobr></span>
                <span style="position:absolute;top:2.014046in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:2.014046in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Add&nbsp;dimples</nobr></span>
                <span style="position:absolute;top:2.394412in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:2.394412in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Create&nbsp;divot</nobr></span>
                <span style="position:absolute;top:2.774778in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:2.774778in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Repair&nbsp;divot</nobr></span>
                <span style="position:absolute;top:3.155143in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:3.155143in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Modify&nbsp;divot</nobr></span>
                <span style="position:absolute;top:3.535509in;left:0.01388884in;width:1.111274in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Requirement&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:3.535509in;left:1.139051in;width:7.860949in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Use iron</nobr></span>
                <span style="position:absolute;top:3.747662in;left:0.01388884in;width:0.6756048in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Description:</nobr></span>
                <span style="position:absolute;top:3.947661in;left:0.01388884in;width:0.2805853in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Risk:</nobr></span>
                <span style="position:absolute;top:3.947661in;left:0.3083631in;width:8.691637in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>High</nobr></span>
                <span style="position:absolute;top:4.159814in;left:0.01388884in;width:0.3974066in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Status:</nobr></span>
                <span style="position:absolute;top:4.159814in;left:0.4251844in;width:8.574816in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>In&nbsp;Progress</nobr></span>
                <span style="position:absolute;top:4.371967in;left:0.01388884in;width:0.7040471in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Priorit&#228;t&nbsp;QA:</nobr></span>
                <span style="position:absolute;top:4.371967in;left:0.7318249in;width:8.268175in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Average</nobr></span>
                <span style="position:absolute;top:4.752333in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:4.752333in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Go&nbsp;fairway</nobr></span>
                <span style="position:absolute;top:5.132699in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:5.132699in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Analyze&nbsp;trajectory</nobr></span>
                <span style="position:absolute;top:5.513064in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:5.513064in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Collect ball</nobr></span>
                <span style="position:absolute;top:5.89343in;left:0.2138889in;width:0.8061111in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Test&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:5.89343in;left:1.033889in;width:7.966111in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Exchange ball</nobr></span>
                <span style="position:absolute;top:6.273796in;left:0.01388884in;width:1.111274in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Requirement&nbsp;Name:</nobr></span>
                <span style="position:absolute;top:6.273796in;left:1.139051in;width:7.860949in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>你好</nobr></span>
                <span style="position:absolute;top:6.485949in;left:0.01388884in;width:0.6756048in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Description:</nobr></span>
                <span style="position:absolute;top:6.685949in;left:0.01388884in;width:0.2805853in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Risk:</nobr></span>
                <span style="position:absolute;top:6.685949in;left:0.3083631in;width:8.691637in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>High</nobr></span>
                <span style="position:absolute;top:6.898102in;left:0.01388884in;width:0.3974066in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Status:</nobr></span>
                <span style="position:absolute;top:6.898102in;left:0.4251844in;width:8.574816in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>In&nbsp;Progress</nobr></span>
                <span style="position:absolute;top:7.110255in;left:0.01388884in;width:0.7040471in;height:0.154324in;font-family:Calibri;font-size:9pt;color:#000000;font-weight:bold;vertical-align:top;"><nobr>Priorit&#228;t&nbsp;QA:</nobr></span>
                <span style="position:absolute;top:7.110255in;left:0.7318249in;width:8.268175in;height:0.1664767in;font-family:Calibri;font-size:9pt;color:#000000;vertical-align:top;"><nobr>Average</nobr></span>
                <span style="position:absolute;top:7.48389in;left:0.07638884in;width:2.548611in;height:0.6861111in;font-family:Arial;font-size:10pt;color:#000000;vertical-align:top;"><nobr>Page&nbsp;2&nbsp;of&nbsp;115</nobr></span>

            </div>
        </div>
    </body>
</html>

Expected JSON Output (handcrafted, please correct me if I added a typo):

{
    "Children": 
    [
        {
            "RequirementName": "Activate Golf",
            "Description": "",
            "Risk": "Average",
            "Status": "In Progress",
            "PrioritätQA": "Average",
            "Children": 
            [
                {
                    "TestName": "Activate Golf"
                },
                {
                    "TestName": "Activate Golf Settings"
                }
            ]
        },
        {
            "RequirementName": "Add Balls",
            "Description": "",
            "Risk": "Low",
            "Status": "In Progress",
            "PrioritätQA": "High",
            "Children": 
            [
                {
                    "TestName": "Download Golf Puzzles from Internet"
                }
            ]
        },
        {
            "RequirementName": "Add Player",
            "Description": "",
            "Risk": "Average",
            "Status": "In Progress",
            "PrioritätQA": "Average",
            "Children": 
            [
                {
                    "TestName": "Golf Upload Puzzles to Server"
                }
            ]
        },
        {
            "RequirementName": "GLFD",
            "Description": "",
            "Risk": "High",
            "Status": "In Progress",
            "PrioritätQA": "Average",
            "Children": 
            [
            ]
        },
        {
            "RequirementName": "Count dimples",
            "Description": "",
            "Risk": "Low",
            "Status": "Failed",
            "PrioritätQA": "Low",
            "Children": 
            [
                {
                    "TestName": "Add dimples"
                },
                {
                    "TestName": "Create divot"
                },
                {
                    "TestName": "Repair divot"
                },
                {
                    "TestName": "Modify divot"
                }
            ]
        },
        {
            "RequirementName": "Use iron",
            "Description": "",
            "Risk": "High",
            "Status": "In Progress",
            "PrioritätQA": "Average",
            "Children": 
            [
                {
                    "TestName": "Go fairway"
                },
                {
                    "TestName": "Analyze trajectory"
                },
                {
                    "TestName": "Collect ball"
                },
                {
                    "TestName": "Exchange ball"
                }
            ]
        },
        {
            "RequirementName": "你好",
            "Description": "",
            "Risk": "High",
            "Status": "In Progress",
            "PrioritätQA": "Average",
            "Children": 
            [
            ]
        }
    ]
}

Thomas Weller

Posted 2015-03-02T20:13:42.990

Reputation: 1 925

You say shortest code wins (and tagged it [tag:code-golf]), but you also say "Highest voted answer on 2015-03-14 gets accepted", as if this were a [tag:popularity-contest]. – Scimonster – 2015-03-02T20:40:46.433

@Scimonster: Indeed, that's a bug, because usually the shortest golf answers are also upvoted. Updated question. – Thomas Weller – 2015-03-02T21:04:41.457

Can you provide a reference implementation so we can make sure our code works correctly? And perhaps a larger sample of input data? – FUZxxl – 2015-03-02T21:09:10.640

@FUZxxl: in the sandbox I was asked to make the sample data shorter... In addition, the input here is limited to 35000 chars, so I'd need to provide it for download somewhere else. At this time I only have the original data, which I cannot publish. How large do you want it to be? At the moment I also don't have a reference implementation, but I'll try to provide one if that's helpful. I wanted to solve the puzzle by myself anyway (even if not very golfed). – Thomas Weller – 2015-03-02T21:12:43.963

@ThomasW. I just want some different data so I don't make wrong assumptions about what the input looks like. – FUZxxl – 2015-03-02T21:58:48.327

In the sample html what indication is there that the current top level child is ending and the next one (starting with "Requirement Name") is starting? – Jerry Jeremiah – 2015-03-03T04:05:18.483

What happens when the text in any field is long enough that it overflows to the next line? – n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳ – 2015-03-03T10:27:02.963

@JerryJeremiah: it starts over with the first key (here: "Requirement Name") at the same left position (here: 0.01388884in) again. – Thomas Weller – 2015-03-03T18:56:37.410

@n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳ : Thanks for asking. (Are you familiar with Active Reports?) In reality it will insert a BR tag, even in the middle of a word. But for the puzzle, you needn't consider this, because I export the HTML with very wide columns so that this does not happen. – Thomas Weller – 2015-03-03T18:58:56.867

@ThomasW.: No, the first time hearing the name, and I'm taken aback at the HTML mess it generates. – n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳ – 2015-03-03T19:01:54.297

@FUZxxl: the reference implementation is available. See the top of the question. – Thomas Weller – 2015-03-04T23:17:31.377

Answers

1

VBScript 819 (if line endings count as 1)

For those of you who have never seen it: VBScript is a Microsoft scripting language, separate from VBA, VB.NET and legacy VB6.

VBScript is NOT designed for golfing and I am not great at golfing anyway, so the best I can do is 819. If you see a way to make it shorter, please comment.

To run this code use:

cscript //nologo //u html2json.vbs < html2json.html > html2json.json

This code does not indent the JSON properly, but keeping track of the indent level would have made it even longer. The other difference from the example JSON is that this code doesn't produce empty Children arrays for items that have no children in the HTML. I hope that's OK. Here is the code:

Set W=WScript
Set S=W.Stdin
W.Echo"{"
Do
While InStr(a,":abs")<1
a=S.Readline
If a="</html>"Then W.Echo"}]}":W.Quit
Wend
Do
If InStr(a,">Page&")<1Then
a=Replace(a,"&nbsp;"," ")
Do
B=InStr(a,"&#")+2
If B<3Then Exit Do
C=InStr(B,a,";")
a=Left(a,B-3)&Chr(Mid(a,B,C-B))&Mid(a,C+1)
Loop
D=InStr(a,"<nobr>")+6
E=InStr(D,a,"</nobr>")
V=Mid(a,D,E-D)
F=InStr(a,"top:")+4
G=InStr(F,a,"in;")
T=Mid(a,F,G-F)
If T=U Then
W.Echo X&""""&K&""":"""&V&"""":X=","
K=""
Else
If K<>""Then W.Echo X&""""&K&""":""""":X=","
K=Left(V,Len(V)-1)
K=Replace(K," ","")
P=InStr(a,"left:")+5
Q=InStr(P,a,"in;")
L=Mid(a,P,Q-P)
If L>M Then W.Echo X&"""Children"":[":X=""Else If Abs(T-U)>.3And U>0Then W.Echo"}":X=","
If L<M Then W.Echo"]}":X=","
If Abs(T-U)>.3Then W.Echo X&"{":X=""
End If
End If
a=S.Readline
Loop While InStr(a,">Page ")<0
U=T
M=L
Loop

Jerry Jeremiah

Posted 2015-03-02T20:13:42.990

Reputation: 1 217

0

C# - 1716

Ok, this one was rather simple for me, but that might be the reason why it is not very short. I just golfed the reference implementation (including all parts, not only the part that is posted on Code Review):

namespace System{using Collections.Generic;using Globalization;using IO;using Linq;using ExCSS;using HtmlAgilityPack;using Newtonsoft.Json;class J:JsonConverter{Dictionary<string,string>P=new Dictionary<string,string>();List<J>C=new List<J>();decimal L;J V{get;set;}static void Main(string[]a){J j=new J();j.H(a[0]);File.WriteAllText(a[1],JsonConvert.SerializeObject(j,j));}public override void WriteJson(JsonWriter w,object o,JsonSerializer s){J j=(J)o;w.WriteStartObject();foreach(var p in j.P){w.WritePropertyName(p.Key);w.WriteValue(p.Value);}if(j.C.Count>0){w.WritePropertyName("Children");w.WriteStartArray();foreach(J c in j.C)s.Serialize(w,c);w.WriteEndArray();}w.WriteEndObject();}public override object ReadJson(JsonReader r,Type t,object o,JsonSerializer s){return o;}public override bool CanConvert(Type o){return o==typeof(J);}void H(string f){var h=new HtmlDocument();h.LoadHtml(File.ReadAllText(f));J v=this;foreach(var p in h.DocumentNode.Descendants().Where(x=>(x.Name=="div"&&x.Ancestors("div").Any()))){var k="";foreach(var s in p.Descendants().Where(x=>x.Name=="span")){var d=new Parser().Parse(".d{"+s.Attributes["style"].Value+"}").StyleRules[0].Declarations;var t=Net.WebUtility.HtmlDecode(s.Descendants("nobr").First().InnerText);if(!d.Any(a=>a.Name=="font-weight"&&a.Term+""=="bold")){A(ref v,k,t);k="";}else{A(ref v,k,"");k=t.Trim(':');foreach(var l in from a in d where a.Name=="left"select decimal.Parse(a.Term.ToString().Replace("in",""),new NumberFormatInfo{NumberDecimalSeparator="."})){while(v.L>l)v=v.V;if(l>v.L){J c=new J{L=l,V=v};v.C.Add(c);v=c;}}}}}}void A(ref J j,string k,string v){if(k=="")return;if(j.P.ContainsKey(k)){J s=new J{L=j.L,V=j.V};j.V.C.Add(s);j=s;}j.P.Add(k,v);}}}

Thomas Weller

Posted 2015-03-02T20:13:42.990

Reputation: 1 925