Simplest setup on Windows to send HTTP request, get and apply regex to response, and write match(es) to output file

I'd like to write a script that records the size of the close votes review queue on SO (currently ~95.5k), polling just a few times an hour, so I could plot a general trend. I know what I'm going to do regarding the parsing, i.e. given the following part of the HTTP response,

<div class="dashboard-num" title="95,508">95.5k</div>

I'd apply the regex

<div\s+class="dashboard-num"\s+title="([^"]+)

and split by \D and implode the array to leave only numbers, or something similar. (Yes, The Pony, He Comes, but this is a quick-and-dirty job during which I don't expect Stack Overflow's HTML to change.)
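
A minimal sketch of that extraction in shell, assuming a POSIX sh with sed, head, and tr on the PATH (for example via Cygwin or GnuWin32). The function name extract_queue_size is hypothetical, and the sample input is just the line quoted above, not a live response.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the title attribute of the
# first dashboard-num div, with every non-digit character removed.
extract_queue_size() {
    # -n with the p flag prints only lines where the substitution matched,
    # so unrelated lines never reach the output.
    sed -n 's/^.*<div[[:space:]]\{1,\}class="dashboard-num"[[:space:]]\{1,\}title="\([^"]*\)".*$/\1/p' \
        | head -n 1 \
        | tr -cd '0-9\n'
}

sample='<div class="dashboard-num" title="95,508">95.5k</div>'
result=$(printf '%s\n' "$sample" | extract_queue_size)
echo "$result"
```

Deleting non-digits with tr plays the role of the "split by \D and implode" step; input without a matching div produces no output at all.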

I don't currently have a UNIX / Linux setup, else I'd throw something together using cron, cURL, and Perl (or sed or awk if I'm feeling brave enough). What's the easiest way to do this on Windows? Is there some utility that's built to do this? I'm willing to install Cygwin and such software if it's indeed the easiest way (say, compared to writing batch scripts), but I'm hoping for some program into which I can supply a URL and regex and be on my way.

Andrew Cheong

Posted 2013-11-25T19:24:38.350

Reputation: 1 355

Note that you can send GET requests and receive the response via Telnet. You can probably do everything you need in PowerShell or even a .bat file, but personally I'd do it in something like .NET, Python, or Java. Installing Python on Windows can be a pain, or I'd just suggest it right off. – Frank Thomas – 2013-11-25T19:44:03.423

Windows does have VBScript, JScript, and PowerShell. I've done regexes in VBScript. Batch files are very primitive; no regexes there. But if you don't know VBScript, JScript, or PowerShell, or even if you do, it's still totally fine to install Cygwin or GnuWin32 so you can use *nix utilities: wget, grep, sed, etc. I'm a big Windows user but still make much use of those utilities from GnuWin32, and if I write a batch file I often use them too. Looking at your title, I'd immediately think wget and grep or sed, or wget with a Perl one-liner to do a sed task. – barlop – 2013-11-25T20:03:49.783

I don't know about cron, but there is the Windows Task Scheduler. – barlop – 2013-11-25T20:08:01.477

@FrankThomas What about installing Python on Windows is a pain? Download the .msi from python.org, run it, and you're set. If you want extras, go to http://www.lfd.uci.edu/~gohlke/pythonlibs/. What's so hard about that? – MattDMo – 2013-11-25T23:16:02.213

Answers

Actually, while waiting for someone to suggest a magical program to solve my every need, I decided to give Cygwin a shot, and found it was easier to do than I thought.

I simply

  1. downloaded Cygwin,
  2. made sure to check curl, cron, and cygrunsrv during installation,
  3. followed the steps described in this question (well, actually, I ran into some problems, but some Google searches suggested installing via cron-config with defaults, entering ntsec for the daemon, and inputting my Windows password, which worked),
  4. set up the following crontab:
    * * * * * /home/andrew/cron/get_cvrq_size.sh
  5. set up the following get_cvrq_size.sh:
    curl https://stackoverflow.com/review \
        | grep dashboard-num \
        | head -1 \
        | sed 's/^.*<div class="dashboard-num" title="\([^"]\+\)".*$/\1/' \
        | sed 's/,//g' \
        | sed 's/^/'`date -Iseconds -u`',/' \
        >> /home/andrew/cron/cvrq_size.txt

and it's been working like a charm :-)

2013-11-25T20:05:01+0000,95583
2013-11-25T20:06:01+0000,95583
2013-11-25T20:07:01+0000,95583
2013-11-25T20:08:01+0000,95583
2013-11-25T20:09:02+0000,95589
2013-11-25T20:10:01+0000,95589
2013-11-25T20:11:01+0000,95587
2013-11-25T20:12:01+0000,95587
2013-11-25T20:13:01+0000,95586
2013-11-25T20:14:01+0000,95589
2013-11-25T20:15:01+0000,95587
2013-11-25T20:16:01+0000,95586
2013-11-25T20:17:01+0000,95585
2013-11-25T20:18:01+0000,95584
2013-11-25T20:19:01+0000,95596
2013-11-25T20:20:01+0000,95596
2013-11-25T20:21:01+0000,95596
2013-11-25T20:22:01+0000,95595
2013-11-25T20:23:01+0000,95595

Andrew Cheong

Posted 2013-11-25T19:24:38.350

Reputation: 1 355

While running this every minute may be OK, don't be surprised if you get cut off at some point. Figure out your use case and send the minimum number of requests necessary. Server admins don't necessarily like scripts pinging their machines like this... – MattDMo – 2013-11-25T23:20:33.627
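
As a sketch of a gentler schedule, assuming the cron installed under Cygwin accepts step values in the minute field (Vixie-style crons do), a crontab entry like this would poll every 15 minutes rather than every minute. The script path is the one used in the answer above.

```
# minute  hour  day-of-month  month  day-of-week  command
*/15 * * * * /home/andrew/cron/get_cvrq_size.sh
```

That works out to four requests an hour, which is closer to the "few times an hour" the question asked for.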

Even just the layout of your piping (with the backslashes and newlines) is worthy of a +1. – barlop – 2013-11-26T11:45:30.367

Please do not use date -Iseconds -u (ISO-8601) as I did. I thought it was a decent standard but it turns out it's just a huge pain to convert—Perl, Python, and even Mathematica require external(ly compiled) libraries or custom methods to parse this format correctly (including timezones). Use date +%s instead, for seconds since epoch. Inspired by @Emracool. – Andrew Cheong – 2013-12-12T19:42:09.467
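
A minimal sketch of the epoch-seconds variant suggested in the comment above, assuming a date that supports the %s format (GNU date, as shipped with Cygwin, does). The helper name stamp_record is hypothetical; it would stand in for the sed step that prepends the ISO-8601 timestamp in the original script.

```shell
#!/bin/sh
# Hypothetical helper: print "<seconds since epoch>,<value>" for one reading.
# Epoch seconds carry no timezone, so no -u flag is needed.
stamp_record() {
    printf '%s,%s\n' "$(date +%s)" "$1"
}

line=$(stamp_record 95583)
echo "$line"
```

Each record is then two plain integers separated by a comma, which most plotting tools can read without a date parser.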