It really depends on how complex the information represented in the web page is. If it's something that can simply be grepped out, then you could use the SO answer here (from the comment above). However, if it can't be easily grepped out, then you could write a Python script to do this for you. You would need urllib2 and cookielib to handle the login session, and then something like lxml or BeautifulSoup to parse the HTML. The SO answer here is an excellent guide on how you could potentially log in. For convenience, I'll copy the code here:
import cookielib
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup  # you could also use lxml if you wanted
# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]
# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)
# The action/target from the form
authentication_url = 'https://ssl.reddit.com/post/login'
# Input parameters we are going to send
payload = {
'op': 'login-main',
'user': '<username>',
'passwd': '<password>'
}
# Use urllib to encode the payload
data = urllib.urlencode(payload)
# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)
# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
# Parse the page using BeautifulSoup. You'll have to look at the DOM
# structure to do this correctly, but there are resources all over the
# place that make this really easy.
soup = BeautifulSoup(contents)
myTag = soup.find("sometag")  # tag name goes in without angle brackets
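The script above is Python 2. If you're on Python 3, a minimal sketch of the same idea uses urllib.request and http.cookiejar instead (the login URL and form field names below are the same placeholders as above, not something to copy verbatim):

```python
# Python 3 sketch of the same approach: build a cookie-aware opener,
# POST the login form, then read and parse the response.
import http.cookiejar
import urllib.parse
import urllib.request

# Store the cookies and create an opener that will hold them
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'RedditTesting')]

# Encode the form fields; in Python 3 the POST body must be bytes
payload = urllib.parse.urlencode({
    'op': 'login-main',
    'user': '<username>',
    'passwd': '<password>',
}).encode('utf-8')

# Uncomment to actually perform the request:
# resp = opener.open('https://ssl.reddit.com/post/login', payload)
# contents = resp.read().decode('utf-8')
```

The parsing step is the same afterwards, except you'd install BeautifulSoup 4 (`from bs4 import BeautifulSoup`) rather than the old BeautifulSoup 3 package.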
You can then run this every X minutes with something like cron, or you could use Python itself to time the execution of the above function every X minutes and post/email the results. Depending on what you're trying to do, it might be overkill, but when I've needed to do something similar in the past, this is the route I've taken.
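If you'd rather keep the scheduling inside Python than set up cron, a plain loop with `time.sleep` is enough. A rough sketch, where `check_page` is a hypothetical stand-in for the scraping function above:

```python
import time

def check_page():
    # Hypothetical stand-in for the login/scrape code above:
    # fetch the page, parse it, and post/email the result here.
    print("checked")

def run_every(minutes, func, iterations=None):
    """Call func, sleeping `minutes` between calls.

    `iterations` caps the number of runs; None means run forever.
    """
    count = 0
    while iterations is None or count < iterations:
        func()
        count += 1
        if iterations is None or count < iterations:
            time.sleep(minutes * 60)

# run_every(5, check_page)  # check every 5 minutes until interrupted
```

For anything long-running you'd probably wrap `func()` in a try/except so one failed request doesn't kill the loop.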
Does it use cookies or do you need to login every time? – Thomas Weller – 2013-12-25T23:03:21.167

@ThomasW. If I click a "Remember Me" button when logging in, yes, it does, since it automatically has me logged in. – hichris123 – 2013-12-25T23:08:35.183

There's a good answer for this question here: http://stackoverflow.com/questions/1324421/how-to-get-past-the-login-page-with-wget – sahmeepee – 2013-12-25T23:38:15.620