Pulling dollar amounts from DoD Contract Awards


I'm going through Department of Defense press releases (here) looking for dollar amounts greater than $500 million. Each document is between 1,000 and 15,000 words, with the dollar amounts embedded in ordinary running text. There are about 2,500 documents I would like to review. My manual hit rate so far is about 1 in 8: for every 8 documents I search, I find one with the kind of dollar amount I'm looking for. Once I find a contract worth, say, $546 million, I record the company, the date, and a brief description.

So my question is, how can I automate finding documents with numbers >500,000,000?

Here is an example paragraph:

General Dynamics Electric Boat Corp., Groton, Connecticut, is being awarded a $234,229,426 cost-plus-fixed-fee contract for design agent, planning yard, engineering and technical support for active nuclear submarines. The efforts [...] This contract includes options, which, if exercised, would bring the cumulative value of this contract to $1,537,500,654. Work will be performed in Groton, Connecticut (73 percent); Bangor, Washington (9 percent); Norfolk, Virginia (6 percent); Newport, Rhode Island (4 percent); Quonset, Rhode Island (3 percent); Kings Bay, Georgia (3 percent); and Pearl Harbor, Hawaii (2 percent), and is expected to be completed by Sept. 30, 2015. Fiscal 2014 other procurement (Navy); fiscal 2011, 2012, 2013 and 2014 shipbuilding conversion (Navy); fiscal 2014 research, development, test and evaluation; and fiscal 2014 operations and maintenance (Navy) funding in the amount of $20,333,452 will be obligated at the time of award, and contract funds in the amount of $1,520,650 will expire at the end of the fiscal year. This contract was not competitively procured in accordance with FAR 6.302-1(a)(2)(iii) - only one responsible source and no other supplies or services will satisfy agency requirements. The Naval Sea Systems Command, Washington, District of Columbia, is the contracting activity (N00024-14-C-2104).

I think it should be possible to turn that into something like this:

$234,229,426

$1,537,500,654

$20,333,452

$1,520,650

I could then glance at that list to decide whether to go back and read the document. It would be even better if I could see just the 1,537,500,654 number.
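The list above is what a short regular-expression pass produces. As a minimal sketch (not from the original thread), the JavaScript below assumes amounts are always written as `$` followed by comma-grouped digits, as the asker confirms in the comments, and then keeps only amounts of $500,000,000 or more by converting each match to a number. `extractDollarAmounts` and `amountsAtLeast` are hypothetical helper names, and the sample text is abbreviated from the example paragraph.

```javascript
// Matches "$" followed by 1-3 digits and one or more ",ddd" groups,
// e.g. "$234,229,426" or "$1,537,500,654". Amounts written without
// commas (or as "500 million") are deliberately not matched.
const DOLLAR_PATTERN = /\$\d{1,3}(?:,\d{3})+/g;

// Hypothetical helper: return every dollar amount in `text`, in order.
function extractDollarAmounts(text) {
  return text.match(DOLLAR_PATTERN) || [];
}

// Hypothetical helper: keep only amounts >= `threshold` dollars,
// comparing numerically after stripping "$" and commas.
function amountsAtLeast(text, threshold) {
  return extractDollarAmounts(text).filter(
    (amount) => Number(amount.replace(/[$,]/g, "")) >= threshold
  );
}

const paragraph =
  "is being awarded a $234,229,426 cost-plus-fixed-fee contract ... " +
  "would bring the cumulative value of this contract to $1,537,500,654. ... " +
  "funding in the amount of $20,333,452 will be obligated at the time of award, " +
  "and contract funds in the amount of $1,520,650 will expire";

// Prints the four amounts, one per line.
console.log(extractDollarAmounts(paragraph).join("\n"));
// Logs an array containing only "$1,537,500,654".
console.log(amountsAtLeast(paragraph, 500000000));
```

Comparing numerically rather than counting digits makes it easy to move the threshold (say, to $100,000,000) without touching the pattern.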

So I really have no coding skills of any type and was hoping that wouldn't be necessary. I don't need it to be perfect. I can copy and paste things, but I would like to make this easier somehow. I really have no idea where to start and what applications might be relevant to my plight. I've got access to Macs and PCs. Any advice you can give is appreciated.

jmabs

Posted 2015-05-25T02:40:42.890

Reputation: 287

Can you rely on the numbering being fully written out in a single format ($1,537,500,654), or do you need to be able to find other variants as well ($500 MM, 74 million dollars, an award in the amount of sixty million)? – Jason Aller – 2015-05-25T02:55:57.513

If you export all the contracts to text files (using the Print view on the site) or even to Word docs, regular expressions should be able to find the numbers for you. Direct scraping of the site would also be possible I suppose. – Karan – 2015-05-25T03:04:16.230

They are extremely consistent in their $xxx,xxx,xxx format, Jason. So I only need to find that specific format. (Specifically, I would be looking for $xxx,xxx,xxx and $x,xxx,xxx,xxx and $xx,xxx,xxx,xxx). There is always a space before and after. Karan, thanks for the reply, however I wouldn't know where to start with "regular expressions." – jmabs – 2015-05-25T03:08:53.193

This is doable, but probably not a SU question. You might want to provide more information about the resources you have. If it's a Linux-based system, it's possible to write a script that (a) uses wget or curl to spider all the documents, then (b) uses something (PHP, Bash, Python) to look for the appropriate token(s) in each file and print a summary list. Of course, this approach assumes you are using Linux. – davidgo – 2015-05-25T03:36:08.787

I can ssh into a Red Hat enterprise distro from my Mac, if that counts. But, as written, I couldn't write a hello world script. So is writing something for this straightforward enough to ask someone to help me? Where should I ask? Thanks! (I was even wondering if something could be written for Excel, but that might be naive) – jmabs – 2015-05-25T04:19:06.053

Answers


Create a new bookmark and paste the following code into the location field:

javascript:%20(function(){var%20s%20=%20'';%20$.ajax({url%20:%20document.URL,%20success%20:%20function(result){var%20d%20=%20/[$](\d[,]?)+/g;%20var%20m;%20while%20((m%20=%20d.exec(result))%20!==%20null)%20{s%20+=%20m[0]+'\n';}%20alert(s);}});})();

Save it under a name like "Show dollar values". This is a bookmarklet. Click it on a webpage and it will pop up an alert with a list of all dollar values that occur on the page.

The code above relies on jQuery, so if the web page you're using it on doesn't already load jQuery, you'll need to run this Append jQuery bookmarklet first.
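If loading jQuery is a hurdle, one alternative is to drop it entirely. The sketch below swaps in a different approach from the answer's: instead of re-fetching the page's HTML with `$.ajax`, it scans the page's rendered text via `document.body.innerText`. It keeps the answer's matching pattern; `findDollarValues` is a hypothetical name, and the browser-only part is guarded so the matching function can also be tried outside a browser.

```javascript
// Readable source for a jQuery-free variant of the bookmarklet.
// It scans the page's visible text rather than its raw HTML.
function findDollarValues(text) {
  // Same pattern as the answer: "$" followed by digits with optional commas.
  const pattern = /[$](\d[,]?)+/g;
  const found = [];
  let m;
  // exec() with the g flag walks through the text one match at a time.
  while ((m = pattern.exec(text)) !== null) {
    found.push(m[0]);
  }
  return found;
}

// Only runs in a browser; outside one, `document` is undefined.
if (typeof document !== "undefined") {
  alert(findDollarValues(document.body.innerText).join("\n"));
}
```

Collapsed to a single line for the bookmark's location field (not tested across browsers), the same idea would be:

javascript:(function(){var s='',d=/[$](\d[,]?)+/g,m;while((m=d.exec(document.body.innerText))!==null){s+=m[0]+'\n';}alert(s);})();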

Customizing the match

It's fairly easy to modify the values the bookmarklet displays. For instance, the code below is modified to only display 9-figure or higher amounts:

javascript:%20(function(){var%20s%20=%20'';%20$.ajax({url%20:%20document.URL,%20success%20:%20function(result){var%20d%20=%20/[$](\d[,]?){8}(\d[,]?)+/g;%20var%20m;%20while%20((m%20=%20d.exec(result))%20!==%20null)%20{s%20+=%20m[0]+'\n';}%20alert(s);}});})();

If you find the {8} in that code and replace it with another number, say N, you'll change the cut-off to N+1 digits.
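To see the effect of changing {8}, the hypothetical helper below builds the same pattern for any N and runs it on three amounts taken from the question's example paragraph. {N} requires N digits (each optionally followed by a comma), and the trailing `(\d[,]?)+` requires at least one more, so only amounts with N+1 or more digits match.

```javascript
// Hypothetical helper: the bookmarklet's pattern with a configurable {N}.
function minDigitsPattern(n) {
  return new RegExp("[$](\\d[,]?){" + n + "}(\\d[,]?)+", "g");
}

const text = "$234,229,426 and $1,537,500,654 and $20,333,452";

// N = 8: 9+ digits, so $234,229,426 and $1,537,500,654 match.
console.log(text.match(minDigitsPattern(8)));
// N = 9: 10+ digits, so only $1,537,500,654 matches.
console.log(text.match(minDigitsPattern(9)));
```

Because this counts digits, a 9-digit cut-off admits everything from $100,000,000 up; it cannot by itself separate $234 million from $546 million.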

If you want to customize this bookmarklet for more general use, you'll need to look up "JavaScript regular expressions" to learn the syntax. The part of the code that controls the matching is /[$](\d[,]?)+/g; if you change the part between / and /g, you'll change what the bookmarklet matches.

pyrocrasty

Posted 2015-05-25T02:40:42.890

Reputation: 1 332

That's great, thanks pyrocrasty! Not to ask too much, but is it possible to have it only return values greater than $500,000,000 (or maybe just $100,000,000+ by filtering out strings less than ...$234,678,012 ... 12 characters)? Thanks (also, if anyone else ever finds this, note that this worked in Firefox but not Chrome or Safari on a Mac). – jmabs – 2015-05-25T05:02:35.583

I added a version that finds 9-digits or more (forgot to do that before I posted). The bookmarklets work for me under FF and Chrome on Linux. Not sure what's happening on OS X. Are you sure you didn't lose a character when you pasted? – pyrocrasty – 2015-05-25T05:28:47.853
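On the follow-up question of returning only values of $500,000,000 or more: one regex-only option (a sketch, not from the thread) is to spell the threshold out in the pattern, relying on the consistent `$x,xxx,xxx,xxx` formatting described in the comments. The part between `/` and `/g` could replace the pattern in the bookmarklet, though that substitution has not been tested in every browser. `BIG_AMOUNT` is a hypothetical name.

```javascript
// Matches only comma-grouped amounts of $500,000,000 or more:
//   - a 1-3 digit group followed by three or more ",ddd" groups
//     ($1,000,000,000 and up), or
//   - a 500-999 group followed by exactly two ",ddd" groups.
// The longer alternative is listed first so billion-dollar amounts
// are not cut short.
const BIG_AMOUNT = /[$](?:\d{1,3}(?:,\d{3}){3,}|[5-9]\d{2}(?:,\d{3}){2})/g;

const sample =
  "awarded a $234,229,426 contract ... cumulative value of this " +
  "contract to $1,537,500,654. ... funding in the amount of $20,333,452";

// Logs an array containing only "$1,537,500,654".
console.log(sample.match(BIG_AMOUNT));
```

The trade-off versus converting each match to a number is that the threshold is baked into the pattern: moving it to, say, $250,000,000 means editing the `[5-9]` part by hand.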