How to extract multiple URLs from a string?

1

I have a string like the one below:

https://website.com/BTAE/2015/BTAE assignment jan 15.pdf²0ÔË'\„QI„"ÙP¾^ŒŸZ‡@Æ*]Ü.^‚vðƒ€Ù¾»Æš©Šñ‘€é• ªÂIR#œÉgÉÛ^gMdÉ%9¬e˜Hžôb¿'0<ô ?lþzk…éÃÄórÈ;EW¦K³1…²ì¶ZFžŠÒô*ÄÖ\ã]»’{ÂMçí¦DêiÁßÅÁ½ :n„q¹1ÙDRó=±Â{EDûEb@N5tÍ›,§ààká@¡;(º\0AÇSª¾Q¾ÒÉœí[‘rú€"?í®§ä‡ÕYÈ<¸^WÐPÁ’4îÖƒÔ'…÷f·qhttps://webservices.ignou.ac.in/assignments/BTAE/2015/BTAE assignment jan 15.pdf https://website.com/BTAE/2015/base-005.pdf

I need to get the URLs of all PDF and DOC files from the string.

I am new to shell scripting and have searched a lot without success.

Aabir Hussain

Posted 2018-09-03T13:23:24.700

Reputation: 113

Which OS and shell? – harrymc – 2018-09-03T19:44:27.623

I am using ubuntu 14.04.5 – Aabir Hussain – 2018-09-04T04:29:25.947

Answers

2

You can do something like this:

grep --only-matching -P "http.*?\.(pdf|doc)" myfile.pdf

The output for your sample is:

https://website.com/BTAE/2015/BTAE assignment jan 15.pdf
https://webservices.ignou.ac.in/assignments/BTAE/2015/BTAE assignment jan 15.pdf
https://website.com/BTAE/2015/base-005.pdf

fejyesynb

Posted 2018-09-03T13:23:24.700

Reputation: 443

There is one more problem with the script => https://website.com/bschindi11.htm https://website.com/bschindi11.htm#aoc https://website.com/bschindi11.htm#ec https://website.com/bschindi11.htm#fc https://website.com /CHEMISTRY/2007/CHE_01.doc . It is giving me this URL.

– Aabir Hussain – 2018-09-04T08:31:09.797

@AabirHussain That is because the URL is not prefixed with http – fejyesynb – 2018-09-05T03:41:17.780

Sorry for the wrong comment, but there is one more problem with the script => https://website.com/bschindi11.htm https://website.com/bschindi11.htm#aoc https://website.com/bschindi11.htm#ec https://website.com/bschindi11.htm#fc https://website.com /CHEMISTRY/2007/CHE_01.doc . It is giving me this URL.

– Aabir Hussain – 2018-09-05T06:02:24.027

@AabirHussain Enclose your whole block in backticks (`) – fejyesynb – 2018-09-06T02:00:51.850
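
One possible reading of the follow-up case: the lazy `.*?` starts at the first `http` and keeps going until the first `.pdf` or `.doc`, so a run of `.htm` links followed by a `.doc` link comes back as one long match. A minimal sketch of a stricter pattern follows, assuming GNU grep built with PCRE support (`-P`). The function name `extract_doc_urls` is hypothetical and not part of the answer above. The `-a` flag and `LC_ALL=C` are there on the assumption that the input contains binary bytes, as in the sample from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): reads text on stdin
# and prints every URL ending in .pdf or .doc, one per line.
extract_doc_urls() {
  # LC_ALL=C and -a let grep treat binary bytes as ordinary characters.
  # (?:(?!https?://).)*? consumes characters lazily but refuses to step
  # over the start of another URL, so one match cannot span several URLs.
  LC_ALL=C grep -aoP 'https?://(?:(?!https?://).)*?\.(?:pdf|doc)'
}
```

It could be used as `extract_doc_urls < myfile.pdf`. Spaces inside a URL are still accepted, because the sample URLs in the question contain them. That also means the pattern cannot tell where a URL really ends, other than at the file extension, so treat the output as a starting point.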