If you are running Linux or another Unix-like system (such as FreeBSD or macOS), you can open a terminal session and run this command:
wget -O - http://example.com/webpage.htm | \
sed 's/href=/\nhref=/g' | \
grep href=\"http://specify.com | \
sed 's/.*href="//g;s/".*//g' > out.txt
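Usually there are several <a href> tags on one line, so you have to split them first: the first sed inserts a newline before every href keyword so that no line contains more than one. Note that \n in the replacement part is a GNU sed feature and may not be interpreted as a newline by the sed shipped with FreeBSD or macOS.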
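As a sketch of the same idea that does not depend on how sed treats \n, grep -o can print each href attribute on its own line. The sample HTML string and the variable names html and links are made up for illustration, and no page is downloaded; -o is supported by both GNU and BSD grep, although it is not part of POSIX.

```shell
# Sample HTML with three links on a single line (made-up data for illustration).
html='<p><a href="http://specify.com/a">A</a> <a href="http://other.com/b">B</a> <a href="http://specify.com/c">C</a></p>'

# grep -o prints every match on its own line, so no newline insertion is needed.
# The second grep keeps only links with the wanted prefix,
# and sed strips the surrounding href="..." wrapper.
links=$(printf '%s\n' "$html" |
  grep -o 'href="[^"]*"' |
  grep 'href="http://specify.com' |
  sed 's/^href="//; s/"$//')

printf '%s\n' "$links"
```

In the real command, the printf line would be replaced by the wget -O - call, and the final output redirected to out.txt.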
To extract links from multiple similar pages, for example all questions on the first 10 pages of this site, use a for loop:
for i in $(seq 1 10); do
wget -O - "http://superuser.com/questions?page=$i" | \
sed 's/href=/\nhref=/g' | \
grep -E 'href="http://superuser.com/questions/[0-9]+' | \
sed 's/.*href="//g;s/".*//g' >> out.txt
done
Remember to replace http://example.com/webpage.htm with your actual page URL and http://specify.com with the prefix that the links you want to keep must start with.
You can match not only a fixed prefix for the URLs to export, but also a regular expression pattern if you use egrep or grep -E in the command given above.
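A minimal sketch of the regular expression variant, run on made-up input lines instead of a downloaded page (the variable names candidates and matches are only for illustration). The pattern requires a numeric ID after /questions/, so tag and user links are dropped.

```shell
# Made-up candidate lines, one href per line, as the splitting step would produce.
candidates='href="http://superuser.com/questions/12345/some-title" class="q"
href="http://superuser.com/questions/tagged/linux"
href="http://superuser.com/users/42/someone"'

# -E enables extended regular expressions; [0-9]+ demands at least one digit.
# The dot in superuser\.com is escaped so it matches a literal dot only.
matches=$(printf '%s\n' "$candidates" |
  grep -E 'href="http://superuser\.com/questions/[0-9]+' |
  sed 's/.*href="//; s/".*//')

printf '%s\n' "$matches"
```

Only the first line survives the filter, because the other two have no digits directly after /questions/.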
If you're running Windows, consider taking advantage of Cygwin. Don't forget to select the Wget, grep, and sed packages.
@JeffZeitlin: I have tried Invoke-WebRequest in PowerShell 5. I use both Windows and Linux; a native terminal/PowerShell method is preferred. – user598527 – 2017-02-01T16:57:45.177
Please note that https://superuser.com is not a free script/code writing service. If you tell us what you have tried so far (include the scripts/code you are already using) and where you are stuck then we can try to help with specific problems. You should also read How do I ask a good question? – DavidPostill – 2017-02-01T16:58:23.020
If Invoke-WebRequest is not returning the HTML for the page you are interested in, you will need to troubleshoot that first. Once your Invoke-WebRequest succeeds, you should be able to parse the resulting HTML to extract what you want. Do not expect us to write the script for you, as DavidPostill indicates; you will need to 'show your work'. – Jeff Zeitlin – 2017-02-01T16:59:56.610