How to convert HTML to text?

12

0

How can I convert HTML to a text file in Linux? For example, I want to curl a query to Google, convert the HTML output to text, and read the converted text in my terminal. I am using RHEL6.

rivu

Posted 2013-11-09T23:05:59.990

Reputation: 261

Answers

10

I don't think curl has a built-in HTML processor. However:

lynx --dump <URL>

does the trick.

If you still want to use curl, you could use html2text (available in Ubuntu).
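To tie the two options together, here is a minimal sketch of a shell function that reads HTML on stdin and uses whichever converter happens to be installed. The function name html_to_text is made up for illustration, and the sed fallback is my own crude addition (it only strips tags and does no real rendering), so treat this as a starting point rather than a robust converter.

```shell
#!/bin/sh
# Hypothetical helper: convert HTML on stdin to plain text on stdout.
html_to_text() {
    if command -v lynx >/dev/null 2>&1; then
        # -stdin reads the page from standard input; -nolist omits the trailing link list
        lynx -dump -nolist -stdin
    elif command -v html2text >/dev/null 2>&1; then
        # html2text reads standard input when no file is given
        html2text
    else
        # Assumption: crude last resort that only deletes anything that looks like a tag
        sed -e 's/<[^>]*>//g'
    fi
}

# Usage (the URL is a placeholder):
#   curl -sL http://example.com/ | html_to_text | less
```

Piping into less at the end lets you page through the converted text in the terminal.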

Teun Vink

Posted 2013-11-09T23:05:59.990

Reputation: 2 107

FYI, lynx documents a single - as the option prefix, although it processes -- just fine. – ocodo – 2018-07-10T01:17:41.300

6

You can install html2text (an advanced HTML-to-text converter); its usage is straightforward:

$ html2text http://example.com/
$ cat file.html | html2text -o file.txt

Install by:

  • Linux (Debian/Ubuntu): apt-get install html2text
  • OS X: brew install html2text

Example with curl:

$ curl -sL google.com | html2text
Search Images Maps Play YouTube News Gmail Drive More »
Web History | Settings | Sign in
     A better way to browse the web
       Get Google Chrome

          Advanced search Language tools

        [Google Search][I'm Feeling Lucky]

     Advertising Programmes Business Solutions+GoogleAbout GoogleGoogle.com
                           © 2016 - Privacy - Terms

kenorb

Posted 2013-11-09T23:05:59.990

Reputation: 16 795