Convert this character-based table into an HTML one

7

Take the following character table:

-------------------------------------------------------
|       |1233456 | abc    | xyz    |        |         |
|-------|--------|--------|--------|--------|---------|
|       |abcdefgh| 1234567|        | 12345  |         |
|   abc |xyzabcqw|        |        |        |         |
|       |zzzz    |        |        |        |         |
|-------|--------|--------|--------|--------|---------|

It can have a variable number of columns or rows. Column width and row height can vary. Cells can be empty or contain text, and each row is just tall enough to fit its wrapped text. So the second row has three lines because the content of the second cell in that row is three lines high.

Input is ASCII text as shown in the example. Cells cannot contain | or - characters. Cell content should be escaped, but there is no need to worry about rowspan and colspan, since the table structure is always simple (every row has the same number of columns and every column has the same number of rows).

Output should be well-formed XHTML. The HTML table should reflect the table structure, content and line breaks. So the resulting HTML for the sample input above is the following:

<table>
<tr>
    <td></td>
    <td>1233456</td>
    <td>abc</td>
    <td>xyz</td>
    <td></td>
    <td></td>
</tr>
<tr>
    <td>abc</td>
    <td>abcdefgh<br/>xyzabcqw<br/>zzzz</td>
    <td>1234567</td>
    <td></td>
    <td>12345</td>
    <td></td>
</tr>
</table>

The shortest code that takes this table and turns it into an XHTML table wins.
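For readers who want an ungolfed reference, the rules above can be sketched in Python 3 roughly as follows. This is only one possible reading of the spec, not an official solution: the function name table_to_xhtml is made up for illustration, escaping is delegated to the standard library's html.escape, blank lines inside a cell are dropped, and a row that is not closed by a separator line is ignored.

```python
import html


def table_to_xhtml(text):
    """Convert an ASCII table (as in the sample) into an XHTML table string."""
    rows = []       # finished rows; each row is a list of cells
    current = None  # row being read; each cell is a list of text lines
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) <= set("-|"):
            # Separator line: close the row collected so far, if any.
            if current is not None:
                rows.append(current)
                current = None
            continue
        # Content line: drop the outer borders, then split into cells.
        parts = [p.strip() for p in line.split("|")[1:-1]]
        if current is None:
            current = [[] for _ in parts]
        for cell, part in zip(current, parts):
            if part:
                # Assumption: escaping &, <, >, " and ' is what "escaped" means.
                cell.append(html.escape(part))
    out = ["<table>"]
    for row in rows:
        out.append("<tr>")
        for cell in row:
            # Lines of a wrapped cell are joined with a self-closing <br/>.
            out.append("    <td>" + "<br/>".join(cell) + "</td>")
        out.append("</tr>")
    out.append("</table>")
    return "\n".join(out)
```

Calling print(table_to_xhtml(open("table.txt").read())) on a file holding the sample input (the file name is just an example) should print the table shown above.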

Dan

Posted 2013-06-11T01:59:31.287

Reputation: 171

1Please add some specification to your challenge. What are allowed input formats? Are all columns of even size (as your example indicates)? Why are some contents padded by spaces/empty lines? What features do we need to reflect also in the html table? Column widths? Padding? Text wrapping? Do we need to output well-formed html? – Howard – 2013-06-11T04:07:32.437

@Howard hope the last edit will help. – Dan – 2013-06-11T04:53:47.883

1Are colspan and rowspan support required? Are the cell contents expected to be escaped, or do we need to escape them? – Peter Taylor – 2013-06-11T06:59:45.540

Do you require the output to be XHTML-compliant (every tag is closed or self-closing), as the output example suggests, or will HTML 4.0 Transitional (most closing tags are not required) suffice? – John Dvorak – 2013-06-11T07:09:42.840

@PeterTaylor I edited the text to answer your question. In short: no need for rowspan and colspan, but content should be escaped. – Dan – 2013-06-11T14:39:13.517

@JanDvorak should be valid xhtml. – Dan – 2013-06-11T14:39:35.330

You say XHTML, but then your "HTML table" link points to the HTML 4.0 spec. Which one is it, then? – John Dvorak – 2013-06-11T16:27:59.203

@JanDvorak xhtml. Just wondering: what is the difference between the two except that in xhtml you have to close the tags properly? – Dan – 2013-06-11T16:42:28.930

@Dan XHTML is valid XML while HTML is valid SGML. Apart from the need to close every tag, the differences include a different doctype and, more importantly, the handling of character data: In XHTML, you need to enclose data like scripts in <![CDATA[...]]> or similar (I don't remember the exact syntax) or XML-escape them, whereas in HTML literal handling is assumed. This makes dual-XHTML/HTML scripts a little unwieldy (//<![CDATA[...//]]>). Another difference is that XML supports namespaces whereas HTML merely supports colons in tag names. XHTML is practically dead now (thankfully). – John Dvorak – 2013-06-11T16:54:19.583

Maybe some special characters in the sample input would help to exemplify the escaping part of the requirement. – manatwork – 2013-06-11T17:48:28.030

@manatwork I assume "valid XML that displays the input characters" – John Dvorak – 2013-06-11T18:03:19.390

@JanDvorak, I understand that. But I see tmartin used the sample input for testing his code, just as I did. So I wish we had a better official sample, including as many catchy items as possible. – manatwork – 2013-06-11T18:09:08.413

It's my first question, so I appreciate contributions to the question to make it "golfy"; I think the basic idea is clear. – Dan – 2013-06-11T19:17:51.037

Answers

2

Ruby: 233 characters

BEGIN{puts"<table>";f=[]};if/^[-|]+$/ then if f!=[] then puts"<tr>\n#{f[1..-2].map{|i|"    <td>#{i.join"<br/>"}</td>"}*$/}\n</tr>\n";f=[]end else$F.each_index{|i|f[i]||=[];f[i]<<CGI.escapeHTML($F[i])if$F[i]>""}end;END{puts"</table>"}

Sample run:

bash-4.2$ ruby -naF' *\| *' -r cgi -e 'BEGIN{puts"<table>";f=[]};if/^[-|]+$/ then if f!=[] then puts"<tr>\n#{f[1..-2].map{|i|"    <td>#{i.join"<br/>"}</td>"}*$/}\n</tr>\n";f=[]end else$F.each_index{|i|f[i]||=[];f[i]<<CGI.escapeHTML($F[i])if$F[i]>""}end;END{puts"</table>"}' table.txt
<table>
<tr>
    <td></td>
    <td>1233456</td>
    <td>abc</td>
    <td>xyz</td>
    <td></td>
    <td></td>
</tr>
<tr>
    <td>abc</td>
    <td>abcdefgh<br/>xyzabcqw<br/>zzzz</td>
    <td>1234567</td>
    <td></td>
    <td>12345</td>
    <td></td>
</tr>
</table>

(Out of contest CW, due to the shameless amount of used command line options.)

manatwork

Posted 2013-06-11T01:59:31.287

Reputation: 17 865

It's better to wait until the spec is fairly complete before answering, because otherwise there's a risk that either you'll have to rewrite to accommodate clarifications or that you'll bias the clarifications towards your implementation (which might mean biasing them away from the more interesting problem). – Peter Taylor – 2013-06-11T10:16:57.420

Use of command line options is fine, but they must be included in the code length. – John Dvorak – 2013-06-11T18:42:11.337

@JanDvorak, I know that -p counts as 1 in the case of perl and ruby. But I never found a concise set of rules for them. Do you know of such a rule collection? – manatwork – 2013-06-11T18:52:54.977

@manatwork Try ruby -h to get you started. – steenslag – 2013-06-12T21:49:58.383

@steenslag, you misunderstood the question. I know the interpreter's command line options. I do not know how their use should be counted in code-golf challenges' results. – manatwork – 2013-06-13T07:35:23.097

2

K, 186

Haven't put any effort into golfing this.

{-1',/(,"<",g;,/{,/(,"<tr>";x;,"</tr>")}'{("<td>",/:*:'x),\:"</td>"}'{{$[1<+/~""~/:x;,"<br/>"/:x;x@&~""~/:x]}'+x}'-1_'/:1_'/:f@&~()~/:f:$`$"|"\:''1_'(&"-"in/:x)_x;,"</",g:"table>");}

.

k)t
"-------------------------------------------------------"
"|       |1233456 | abc    | xyz    |        |         |"
"|-------|--------|--------|--------|--------|---------|"
"|       |abcdefgh| 1234567|        | 12345  |         |"
"|   abc |xyzabcqw|        |        |        |         |"
"|       |zzzz    |        |        |        |         |"
"|-------|--------|--------|--------|--------|---------|"

k){-1',/(,"<",g;,/{,/(,"<tr>";x;,"</tr>")}'{("<td>",/:*:'x),\:"</td>"}'{{$[1<+/~""~/:x;,"<br/>"/:x;x@&~""~/:x]}'+x}'-1_'/:1_'/:f@&~()~/:f:$`$"|"\:''1_'(&"-"in/:x)_x;,"</",g:"table>");} t
<table>
<tr>
<td></td>
<td>1233456</td>
<td>abc</td>
<td>xyz</td>
<td></td>
<td></td>
</tr>
<tr>
<td>abc</td>
<td>abcdefgh<br/>xyzabcqw<br/>zzzz</td>
<td>1234567</td>
<td></td>
<td>12345</td>
<td></td>
</tr>
</table>

tmartin

Posted 2013-06-11T01:59:31.287

Reputation: 3 917

0

GNU Awk: 277 characters

BEGIN{split("38&60<62>34\"39'",e,/\W/,c)
FS=" *\\| *"
print"<table>"}/^[-|]+$/&&n{print"<tr>"
for(i=2;i<n;i++)print"    <td>"f[i]"</td>"
print"</tr>"
delete f
next}{for(i=1;i<6;i++)gsub(c[i],"\\&#"e[i]";")
for(i=2;i<n=NF;i++)$i&&f[i]=f[i](f[i]?"<br>":"")$i}END{print"</table>"}

Note that the above code requires gawk version 4.0 or newer, because

  • delete for an entire array is a GNU extension
  • split()'s 4th parameter was implemented in version 4.0
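
As a rough illustration of what the split() and gsub() calls in the code above appear to do (build a table of the characters & < > " ' with their codes 38, 60, 62, 34, 39, then replace each character with a numeric character reference), here is an ungolfed Python 3 sketch. The names ESCAPES and escape_numeric are made up for this example and are not part of the answer.

```python
# Characters to escape, paired with their character codes.
# "&" must come first: every replacement introduces a new "&",
# which would otherwise be escaped a second time.
ESCAPES = [("&", 38), ("<", 60), (">", 62), ('"', 34), ("'", 39)]


def escape_numeric(text):
    """Replace each special character with a numeric reference such as &#60;."""
    for char, code in ESCAPES:
        text = text.replace(char, "&#%d;" % code)
    return text
```

Numeric references sidestep the question of which entity names are defined, which is one of the points the spec leaves open.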

Without HTML escaping: 201 characters

As it is not specified how HTML escaping should happen (whether the use of built-in or third party functions is allowed, the set of characters to escape, whether using entity names or character codes matters), I believe this challenge would be neater without the escaping:

BEGIN{FS=" *\\| *"
print"<table>"}/^[-|]+$/&&n{print"<tr>"
for(i=2;i<n;i++)print"    <td>"f[i]"</td>"
print"</tr>"
delete f
next}{for(i=2;i<n=NF;i++)$i&&f[i]=f[i](f[i]?"<br>":"")$i}END{print"</table>"}

Sample run:

bash-4.1$ awk -f table.awk table.txt
<table>
<tr>
    <td></td>
    <td>1233456</td>
    <td>abc</td>
    <td>xyz</td>
    <td></td>
    <td></td>
</tr>
<tr>
    <td>abc</td>
    <td>abcdefgh<br>xyzabcqw<br>zzzz</td>
    <td>1234567</td>
    <td></td>
    <td>12345</td>
    <td></td>
</tr>
</table>

manatwork

Posted 2013-06-11T01:59:31.287

Reputation: 17 865

-4

Python: 489

t = [] #table
l = raw_input() #line
r = [] #row
while l:
    if '-' in l:
        if r:
            t.append(r)
            r = []
    else:
        if not r:
            r = [c.strip() for c in l.split('|')[1:-1]]
        else:
            l = l.split('|')[1:-1]
            r = ["<br />".join([r[i],l[i].strip()]) if l[i].strip() and r[i] else r[i] if r[i] else l[i].strip() for i in range(len(l))]
    l = raw_input()

h = "<table>" #corresponding html code
for r in t:
    h += "\n<tr>\n<td>" + "</td>\n<td>".join(r) + "</td>\n</tr>"
h += "\n</table>"

print h

(Also on github)

Probably not the shortest possible solution in Python and I welcome feedback!

Output:

>python character_table_to_xhtml_v2.py
-------------------------------------------------------
|       |1233456 | abc    | xyz    |        |         |
|-------|--------|--------|--------|--------|---------|
|       |abcdefgh| 1234567|        | 12345  |         |
|   abc |xyzabcqw|        |        |        |         |
|       |zzzz    |        |        |        |         |
|-------|--------|--------|--------|--------|---------

<table>
<tr>
<td></td>
<td>1233456</td>
<td>abc</td>
<td>xyz</td>
<td></td>
<td></td>
</tr>
<tr>
<td>abc</td>
<td>abcdefgh<br />xyzabcqw<br />zzzz</td>
<td>1234567</td>
<td></td>
<td>12345</td>
<td></td>
</tr>
</table>

meenakshi

Posted 2013-06-11T01:59:31.287

Reputation: 1

2-1: A) Please don't make me click a link to view your answer. B) the link contains way too much information; You're quoting the whole question (!) and I need to scroll down to find your code. C) The only effort that I can see went into golfing your code was to make short variable names. (You have empty lines and comments in there!) D) When there are no special requirements to run your code you really don't need to provide a sample run or usage; Just pasting the code is (a lot) more useful. – daniero – 2013-06-26T22:27:20.223

1And that number you put up in the header of your post is nowhere near the character count – daniero – 2013-06-26T23:41:21.547

And please specify when your code skips some of the rules. For example your code does not escape HTML special characters. – manatwork – 2013-06-27T06:40:34.717