How do I prove that robots.txt was not provided

Question

I want to scrape our university's learning platform website so that I get a notification whenever a new entry is added to any lesson.

But I'm scared that they'll add a robots.txt afterwards and sue me or something. I don't have any experience with this; I just know that I should look at robots.txt before scraping any website.

And I think they've just forgotten to put one up for now.

Anyway, how do I make sure beforehand, and keep proof, that it didn't exist when I was scraping? Anything that makes my proof valid would help.

robots.txt tells crawlers which parts of a site not to crawl. — yeah_well, Mar 01 '21 at 10:07
Also, just because they put something in robots.txt doesn't mean the law will side with them. — yeah_well, Mar 01 '21 at 10:07
Most LMSs actually provide an API, often one that student-level accounts can use; you might not need to "hack" anything. Also, look at the AJAX calls once the site has loaded; they probably ship the data you need in a nice clean JSON format. — dandavis, Mar 01 '21 at 17:59

score 2 · Accepted Answer · answered Mar 01 '21 at 10:09


`robots.txt` means nothing

The Simpsons explain it pretty well:

`robots.txt` is not an access restriction; it is merely a polite request to compliant web crawlers not to index something. A web crawler can simply disregard this file and index whatever it wants anyway.
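To make the "polite request" point concrete, here is a minimal sketch of what a complying crawler does, using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the user agent `my-notifier`, and the `lms.example.edu` URLs are made up for illustration. Nothing forces a crawler to run this check; it is purely voluntary.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, standing in for whatever the site might publish later.
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

print(is_allowed(sample_rules, "my-notifier", "https://lms.example.edu/private/grades"))  # False
print(is_allowed(sample_rules, "my-notifier", "https://lms.example.edu/course/1"))        # True

# An empty rule set allows everything.
print(is_allowed([], "my-notifier", "https://lms.example.edu/private/grades"))            # True
```

As far as I know, `RobotFileParser.read()` treats a 404 for the file the same way as the empty case: everything is allowed.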

If you want to be sure, simply send them an e-mail and ask for permission. Or, you know, just do it. A web crawler that runs once an hour and does a few hundred requests with one request per 500ms won't disturb any server.
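A rough sketch of how the "one request per 500ms" pacing could be enforced in Python. The names `Throttle` and `fetch_all` are made up for this example, and `fetch` is whatever function you use to download a page; only the 0.5 second interval comes from the text above. The clock and sleep functions are parameters so the pacing logic can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, None before the first

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                # Too soon since the last request: pause for the rest of the interval.
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, pausing via throttle before every request."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

Usage would look something like `fetch_all(urls, my_fetch, Throttle(0.5))`, with `my_fetch` being your own download function. If the server turns out to be slow, raising `min_interval` is the obvious knob.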


Re: "_one request per 500ms won't disturb any server._"; that depends on the server and the LMS platform. For example, Moodle running a class with 500 students and many grade entries can definitely take longer than 500ms to respond to some queries. – dandavis Mar 01 '21 at 18:01
@dandavis I'm assuming a usual web server with usual response times. – Mar 01 '21 at 19:59
