How to automatically and periodically scrape a website

1

I am leaving reliable phone/internet service for several months. I want certain website accounts to be logged in to and checked daily, and the text from them stored or emailed to me for later review. I cannot rely on my own phone, server, or anything else to do it, so I hope there is an online place where this job can be hosted.

In case understanding the reasons for my question helps provide an answer:

  1. On one of these sites I am required by court order to respond to messages periodically, and I don't like the site reporting exactly when I did and didn't check it, along with the time, date, IP address, and so on.

  2. One of these sites keeps only the last 30 days of transactions before they disappear, and I want a permanent record in case more than 30 days go by between logins.

J. Win.

Posted 2017-11-28T02:26:29.660

Reputation: 143

Question was closed 2017-11-29T01:09:08.787

Are you a programmer or not? – plonknimbuzz – 2017-11-28T06:15:57.190

@plonknimbuzz Yes, but I learned in the 90s. It seems like this should be simple stuff, i.e. make the site think that I am a user clicking and typing, and capture its output, but I don't know the right tools to use, especially for scheduling a task to run hands-off when I have no computer on which to schedule cron or whatever. – J. Win. – 2017-11-28T15:47:50.840

Why am I asking this? Because the answer has to be either "tools" or "script". For the scripting approach, I would suggest using PhantomJS with cron or a scheduler as the trigger. But it seems you are looking for tools, and I don't have an answer for that. Good luck. – plonknimbuzz – 2017-11-28T15:56:30.937

Answers

1

You would probably be best off making that application yourself. Here is a great starting point: screen-scraping-in-c-using-webclient

netfed

Posted 2017-11-28T02:26:29.660

Reputation: 146

1

I'm not sure if you're a programmer or not, but even if you aren't, you can ask someone to do what I suggest.

Linux has something called a cron job. These are pre-scheduled tasks that run a command automatically at set times.

Use these cron jobs to run a PHP script that does the following.

Code the PHP script to access the webpage you need to access. Next, ask the PHP script to get the HTML code from the webpage using some function like file_get_contents().

Now code it to sort through the data there and store the data relevant to you. You can do this by starting to store data after a particular keyword, like the title of the data you need, and stopping when it encounters another keyword, like the title of the next topic.
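The fetch-and-extract steps above can be sketched roughly as follows. This is shown in Python rather than PHP, purely as an illustration: `fetch_html` plays the role of `file_get_contents()`, and `extract_between` is a hypothetical helper name for the "start after one keyword, stop at the next" idea. The sample HTML and keywords are made up, and the sketch assumes a page that can be read without logging in; a site that requires a login would also need session or cookie handling, which is not shown here.

```python
from typing import Optional
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its body as text (rough analogue of file_get_contents)."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_between(text: str, start_keyword: str, end_keyword: str) -> Optional[str]:
    """Return the text after start_keyword and before end_keyword.

    Returns None if start_keyword is missing. If end_keyword is missing,
    everything after start_keyword is returned.
    """
    start = text.find(start_keyword)
    if start == -1:
        return None
    start += len(start_keyword)
    end = text.find(end_keyword, start)
    if end == -1:
        end = len(text)
    return text[start:end].strip()


if __name__ == "__main__":
    # Made-up sample page; a real run would call fetch_html(url) instead.
    sample = "<h2>Messages</h2><p>Two new messages</p><h2>Settings</h2><p>...</p>"
    print(extract_between(sample, "<h2>Messages</h2>", "<h2>Settings</h2>"))
```

A cron job would then run this script once a day and append the extracted text to a file or mail it out. Keyword matching like this is fragile if the site changes its layout, so it is worth saving the raw HTML as well.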

I hope this helps

Pradyoth Shandilya

Posted 2017-11-28T02:26:29.660

Reputation: 75

I can write a PHP script, but I don't have access to any system to schedule/run the job and store results. Is there a paid service that can do this? – J. Win. – 2017-11-28T15:52:50.317

You could try using some free hosting services like 000webhost.com. I'm not sure if they support cron jobs, but even if they don't, I'm sure a simple Google search can find you one – Pradyoth Shandilya – 2017-11-28T15:59:03.007

0

You can use Offline Explorer. It has a Pro version, but the free version can do a pretty good job.

Thunder

Posted 2017-11-28T02:26:29.660

Reputation: 9

That doesn't seem to work remotely and automatically. It requires me to turn on a computer where it is installed and manually generate the request every time that I want the site. I won't be able to do that. – J. Win. – 2017-11-28T15:53:25.167