How to convert a CHM file into a single HTML file?

I have tried many different CHM-to-HTML utilities, but I am having a difficult time finding one that is able to produce a single HTML file. I can decompile a CHM file using hh.exe, but I don't know how to easily merge the resulting files into a single HTML file, all while preserving the correct order of pages.

Is there a free tool which can do this? If not, how can I merge the HTML files in order?

expert

Posted 2012-03-04T23:36:59.230

Reputation: 298

Have you tried any other utilities? Without knowing, answers to your question may just as well be taken directly from a quick search.

– iglvzx – 2012-03-05T00:03:43.893

I tried many utilities. Which one would you recommend? – expert – 2012-03-05T00:05:05.733

Just tried tool by Gridinsoft from your link. Doesn't work. – expert – 2012-03-05T00:06:01.753

In fact I already wasted 3 hours of installing and trying all possible crap. None of them works properly. That's why I came here :) – expert – 2012-03-05T00:07:03.110

Ok. So, we can conclude that producing a single HTML file is not a standard feature of such software (or at least free software). – iglvzx – 2012-03-05T00:10:45.740

No, unfortunately. I edited your post to reflect our discussion, so this question should not get closed as being just a product recommendation. – iglvzx – 2012-03-05T00:17:04.453

Answers

0

A CHM archive consists of a set of HTML pages with associated media (read: images and simple JavaScript).

A CHM has an indication of which page is the "main" page, which is usually some overview page. Besides that, it has a Table of Contents (TOC), which is a tree of nodes that point to HTML files. Walking the tree depth-first would give a more or less linear order.

But the default page might not be the first page of the TOC, or in the TOC at all, and not all pages might be in the TOC. For any page outside the TOC, there is no order that can be detected by automated means.
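One way to cope with pages that the TOC does not list is to append them after the TOC pages in some arbitrary but stable order. The sketch below assumes an alphabetical fallback and case-insensitive path matching; both are my assumptions, not something the CHM format prescribes, and the function names are hypothetical.

```python
def normalize(path):
    # Assumption: paths are compared case-insensitively and may use
    # backslashes or a leading slash, as is common on Windows.
    return path.replace("\\", "/").lstrip("/").lower()


def append_unlisted(toc_pages, extracted_pages):
    """Return the TOC pages followed by any extracted pages the TOC omits.

    The leftovers are sorted alphabetically, which is an arbitrary choice:
    nothing in the archive says where they really belong.
    """
    listed = {normalize(p) for p in toc_pages}
    leftovers = sorted(
        (p for p in extracted_pages if normalize(p) not in listed),
        key=normalize,
    )
    return list(toc_pages) + leftovers
```

The result still needs a manual check, since a leftover page may really belong in the middle of a chapter.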

Extracting a CHM with a general decompilation tool will yield a bunch of HTML files, a .hhk and a .hhc. The .hhc is the TOC, stored as an HTML-style sitemap (nested lists of `<OBJECT type="text/sitemap">` entries) rather than strict XML. The .hhk is the index, but you don't need it now. The default page is recorded in an internal file and is generally not visible after extraction (use the properties view of a CHM tool to find it).
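As a rough sketch of reading the page order out of the .hhc: files written by HTML Help Workshop are normally HTML-style sitemaps in which each TOC node is an `<OBJECT type="text/sitemap">` with `<param>` children named `Name` (the title) and `Local` (the target page). Assuming that layout, Python's standard `html.parser` can collect the targets in document order, which corresponds to a depth-first walk of the tree. The class and function names here are hypothetical.

```python
from html.parser import HTMLParser


class HhcParser(HTMLParser):
    """Collect (title, target) pairs from a .hhc sitemap in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None  # params of the sitemap object being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "object" and (attrs.get("type") or "").lower() == "text/sitemap":
            self._current = {}
        elif tag == "param" and self._current is not None:
            name = (attrs.get("name") or "").lower()
            self._current[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "object" and self._current is not None:
            local = self._current.get("local")
            if local:  # folder-only nodes have no target page
                self.entries.append((self._current.get("name", ""), local))
            self._current = None


def toc_order(hhc_text):
    """Return unique (title, page) pairs in TOC order, dropping #anchors."""
    parser = HhcParser()
    parser.feed(hhc_text)
    parser.close()
    seen = set()
    ordered = []
    for title, local in parser.entries:
        page = local.split("#", 1)[0]
        if page and page not in seen:
            seen.add(page)
            ordered.append((title, page))
    return ordered
```

Several TOC nodes often point into the same file at different anchors, so the sketch keeps only the first occurrence of each page.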

Besides determining the order, there is the actual merging itself. This can be hard, but a practical workaround might be importing the pages into Office by some scripted means.
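For the merging step, a minimal sketch (not a complete solution) is to take the `<body>` content of each page in the chosen order and wrap it in one new document. It assumes the pages are already loaded as strings. It does not rewrite cross-page links, deduplicate stylesheets, or fix image paths, and those are the parts that make a real merge hard. The function names are hypothetical.

```python
import re
from html import escape

# Non-greedy match of everything between <body ...> and </body>.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def body_of(html_text):
    """Return the inner body of a page, or the whole text if no body tag."""
    match = BODY_RE.search(html_text)
    return match.group(1) if match else html_text


def merge_pages(pages, title="Merged CHM"):
    """Concatenate pages, given as (title, html_text) pairs in reading order."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{escape(title)}</title></head><body>",
    ]
    for index, (page_title, html_text) in enumerate(pages):
        # One section per source page, so a page can still be targeted by id.
        parts.append(f'<section id="page-{index}" title="{escape(page_title)}">')
        parts.append(body_of(html_text))
        parts.append("</section><hr>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

When reading the extracted files from disk, note that CHM pages often use a legacy code page rather than UTF-8, so the right encoding has to be checked per archive.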

I think an able scripter might pull it off, but it is not trivial.

Marco van de Voort

Posted 2012-03-04T23:36:59.230

Reputation: 211