How to combine wget and grep

11

I have the URL of an HTML page and I want to grep it. How can I do this with wget someArgs | grep keyword?

My first idea was wget -q -O - url | grep keyword, but wget's output bypasses grep and appears on the terminal in its original form.

Jofsey

Posted 2012-06-01T19:34:49.033

Reputation: 917

grep selects lines delimited by (e.g.) carriage return and linefeed characters. An HTML response doesn't necessarily have lines; it has text with markup like <br> or <p>, so the whole web page could look like one line to grep – RedGrittyBrick – 2012-06-01T19:44:45.927

1@RedGrittyBrick The OP's command works flawlessly for me. – slhck – 2012-06-01T19:47:30.527
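
To illustrate the single-line concern raised in the comment above, here is a minimal sketch. The `page` string is made-up sample data standing in for a minified HTML response, and `-o` is assumed to be available (GNU and BSD grep both support it).

```shell
# Stand-in for a minified page: all markup on one line (made-up sample data).
page='<html><body><p>first keyword</p><p>second keyword</p></body></html>'

# Plain grep prints the whole matching line, which here is the entire page.
printf '%s\n' "$page" | grep 'keyword'

# -o prints each match on its own line instead of the full line.
printf '%s\n' "$page" | grep -o 'keyword'

# -c counts matching lines, not individual matches.
printf '%s\n' "$page" | grep -c 'keyword'
```

So if grep appears to "output everything", the page may simply contain the keyword on one very long line.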

Answers

11

The easiest way is to use curl with the option -s for silent:

curl -s http://somepage.com | grep whatever

Marco

Posted 2012-06-01T19:34:49.033

Reputation: 4 015

1Question asks for wget. Not curl. This won't work with multiple redirects and -L option. – Ligemer – 2016-10-07T23:07:24.800

@slhck: Both commands do exactly the same for me. – Dennis – 2012-06-01T21:39:36.603

@Dennis Try curling http://superuser.com/questions/431581. For whatever reason I tested it with this particular URL and got no output. Dunno what I'm missing. – slhck – 2012-06-01T21:45:44.173

@slhck: Curl doesn't follow redirects by default. It does with the -L switch. – Dennis – 2013-03-31T20:15:45.353

@Dennis Didn't know what you were talking about without seeing the deleted comments – but yeah, that makes sense. Thanks for clearing it up. – slhck – 2013-03-31T20:27:25.127

11

Keeping this around for the sake of completeness.

Your example should actually work. The syntax is correct, and here's a screencast I just took demonstrating it, with a good old GNU wget 1.13.4.

wget -q some-url -O - | grep something

So I assume your pattern is wrong, and grep is simply printing everything it receives.

slhck

Posted 2012-06-01T19:34:49.033

Reputation: 182 472

It could also be a typo in the URL. With -q, there is no error message. – Dennis – 2012-06-02T00:44:55.670
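
One way to notice a silent failure like the one described in the comment above is to check the exit status before grepping. This is only a sketch: `fake_wget_quiet` is a hypothetical stand-in for `wget -q -O - url` with a mistyped URL, so that the example runs without network access, and the non-zero value it returns is arbitrary.

```shell
# Hypothetical stand-in for "wget -q -O - <mistyped url>":
# prints nothing and exits with a non-zero status.
fake_wget_quiet() {
    return 8
}

# Capture the page first; only grep it if the download succeeded.
if page=$(fake_wget_quiet); then
    printf '%s\n' "$page" | grep 'keyword'
    result='fetched'
else
    status=$?
    echo "download failed with exit status $status" >&2
    result="failed:$status"
fi
```

Replacing the stand-in with the real wget call should behave the same way, since the `if` only depends on the command's exit status.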

3

If you want to grep or pipe the headers, note that they are sent to stderr by default, so you need to redirect them. The order matters: point stderr at the pipe first, then discard stdout. E.g.:

wget -O - http://example.com/page.php 2>&1 > /dev/null | grep HTTP
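
Redirection order decides what reaches grep when only stderr should go through the pipe. The sketch below shows both orders side by side; `fake_wget` is a hypothetical stand-in that writes a page body to stdout and a status line to stderr, so the example runs offline.

```shell
# Hypothetical stand-in for "wget -O - url": body on stdout, status text on stderr.
fake_wget() {
    echo '<html>page body</html>'
    echo 'HTTP request sent, awaiting response... 200 OK' >&2
}

# stderr is pointed at the pipe first, then stdout is discarded:
# grep sees only the status text.
status_only=$(fake_wget 2>&1 >/dev/null | grep HTTP)

# Reversed order: stdout goes to /dev/null, then stderr follows it there,
# so nothing reaches grep (|| true absorbs grep's "no match" status).
nothing=$(fake_wget >/dev/null 2>&1 | grep HTTP || true)

echo "status_only: $status_only"
echo "nothing: $nothing"
```

The shell connects the pipe before it processes the redirections, left to right, which is why `2>&1` has to come before `>/dev/null` here.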

ErichBSchulz

Posted 2012-06-01T19:34:49.033

Reputation: 281

2This is the correct way of doing it, thanks! – Udayraj Deshmukh – 2018-03-23T06:38:58.183

See also the answers here – Suzana – 2018-09-25T09:26:44.877

3

This was a bug in v1.12.1 that was fixed in a later version. I currently use v1.15, and it works as expected.

Leben Gleben

Posted 2012-06-01T19:34:49.033

Reputation: 55

0

wget writes its status messages to stderr, not stdout (with -O - only the page itself goes to stdout), so to grep those messages as well, redirect stderr to stdout:

wget -q -O - url 2>&1 | grep keyword

vstepaniuk

Posted 2012-06-01T19:34:49.033

Reputation: 15