How to convert HTML tags to RTF or any rich format text from the Linux command line


How can I convert HTML tags to RTF or any other rich text format using sed or another Linux command-line tool?

I've managed to strip the tags with sed 's/<[^>]*>//g', but I need <b>hi</b> to be converted to **hi**.
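For simple, well-formed input like the example above, one possible sed sketch converts bold tags to ** first and only then strips every remaining tag. It assumes lowercase <b> or <strong> tags that open and close on the same line. html_bold_to_md is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# html_bold_to_md: hypothetical helper; reads HTML on stdin, writes text on stdout.
# Assumes lowercase <b>/<strong> tags that do not span lines.
html_bold_to_md() {
  sed -E \
    -e 's#<(b|strong)( [^>]*)?>#**#g' \
    -e 's#</(b|strong)>#**#g' \
    -e 's/<[^>]*>//g'
  # 1st: opening <b>/<strong> (optionally with attributes) becomes **
  # 2nd: closing </b>/</strong> becomes **
  # 3rd: any other tag is stripped, as in the question's original command
}

printf '%s\n' '<p>Say <b>hi</b> now</p>' | html_bold_to_md
# prints: Say **hi** now
```

Because the opening pattern requires either > or a space right after the tag name, <br> is not mistaken for <b> and is simply stripped by the last expression.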


Posted 2012-04-23T10:04:21.613

Reputation: 121



html2text is a command-line tool that converts HTML to Markdown.
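A minimal usage sketch, assuming the Python html2text package is installed (for example with pip install html2text). Other programs named html2text exist and produce plain text rather than Markdown, so running the package as a Python module makes clear which one is used.

```shell
#!/bin/sh
# Assumes the Python "html2text" package is installed.
# Running it as a module avoids picking up an unrelated program named html2text.
printf '%s\n' '<p>Say <b>hi</b> now</p>' | python3 -m html2text
# Bold text should come out wrapped in **, i.e. Say **hi** now

# The package also installs an html2text command; for a whole file:
#   html2text page.html > page.md
```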

You will most probably get very frustrated trying to do this with sed without errors. The reason (regular expressions cannot reliably parse HTML) is touched upon in a legendary Stack Overflow post. In very basic cases it might work, but it will come back to haunt you if you make it a habit, so learn to do it correctly from the start. A ready-made tool such as html2text is a much better choice than trying to regex it out by hand.
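Two small inputs show how the tag-stripping command from the question breaks. Both are illustrative and not taken from the original post: a > inside a quoted attribute value is legal HTML but ends the [^>]* match early, and sed works line by line, so a tag split across lines is never matched.

```shell
#!/bin/sh
# A ">" inside an attribute value ends the match early and leaves debris.
printf '%s\n' '<a title="1 > 0">link</a>' | sed 's/<[^>]*>//g'
# prints:  0">link

# A tag split across two lines is invisible to a line-oriented regex.
printf '%s\n' '<b' '>hi</b>' | sed 's/<[^>]*>//g'
# prints:
# <b
# >hi
```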

Daniel Andersson

Posted 2012-04-23T10:04:21.613

Reputation: 20 465