I'm writing some quantitative marketing research software, and have a bit of a "blind leading blind" situation at work.
I'm not really used to writing web apps or configuring servers; most of the code I write is intended to be executed by me only, on my machine. The boss says the time has come for nice web interfaces and an enterprise-wide pilot of one of my research projects, and it's up to me to specify the architecture. The company's server guys are only really used to vanilla LAMP stacks and are eager to help, but they're relying on me for direction. (?!)
Here's what I'm trying to get done:
- Put some products and customer emails in a purpose-built database.
- Choose some subset of customer emails and some products from the database.
- Choose some products and create related survey webpages.
- Send emails to the customers asking them to do a survey.
- Record their survey answers.
- Analyse the answers.
- Present the answers in a web interface.
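To make the "purpose-built database" part concrete, here's a minimal schema sketch of how I imagine the tables hanging together. The table and column names are just my assumptions, and I've used sqlite3 only so the sketch is self-contained; the real thing would presumably be MySQL/Postgres on the intranet box:

```python
import sqlite3

# Minimal schema sketch -- table/column names are illustrative assumptions,
# not a spec. sqlite3 is used here only so the sketch runs standalone.
SCHEMA = """
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    description  TEXT
);
CREATE TABLE customers (
    customer_id  INTEGER PRIMARY KEY,
    email        TEXT NOT NULL UNIQUE
);
CREATE TABLE surveys (
    survey_id    INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL REFERENCES products(product_id),
    created_at   TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE responses (
    response_id  INTEGER PRIMARY KEY,
    survey_id    INTEGER NOT NULL REFERENCES surveys(survey_id),
    customer_id  INTEGER NOT NULL REFERENCES customers(customer_id),
    answers_json TEXT NOT NULL,        -- raw answers; analysed later in R
    submitted_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def create_db(path=":memory:"):
    """Create the survey database and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```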
The analysis and the interface are very industry-specific, and there are no third-party providers of this kind of market research software, hence the DIY.
Here's how I proposed to do it:
- Web interface for product and email insertion, hosted on the company intranet. Database to accept internal connections only. Probably a python script on the server to actually do the data insertion, because a lot of error checking and other product-related processing has to happen first, and I don't want to do that in php.
- Access email addresses in database and create emails requesting survey participation. Again, write this in python. Interface with Amazon SES for the bulk mail delivery. I'm the only one with execute access to this script. Do this on my PC.
- Create surveys (on my PC). Manually upload the survey content via sFTP to a rented Azure VM that I'll call VM1. Should I be using nginx on this VM? The server guy only has config experience with Apache. The VM has to cope with 10k connections per hour on the days the emails go out (of course, I'll start slow and monitor it).
- Serve the survey html pages, using php to serve the right content to the right people depending on the hashed survey id they pass. Write survey responses via php to a database on a second Azure VM ("VM2"), accessed over a VPN tunnel between the two VMs.
- After a few days, pull the survey responses onto another VM running R ("VM3", via a VPN tunnel from VM2). After loading the data in R, close the tunnel to isolate VM3. Still on VM3, process the data in R and serve the results, via nginx, php, and a javascript charting add-on, over an SSL connection to the other interested people at the company.
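For the insertion step above, the error checking I have in mind is roughly this shape (the particular product rules here, like a positive price, are made-up placeholders; only the pattern of validate-then-insert matters):

```python
import re

# Sketch of server-side checking before any insert goes through.
# The specific business rules below are placeholder assumptions.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_customer(email):
    """Return a list of problems; an empty list means OK to insert."""
    problems = []
    if not EMAIL_RE.match(email):
        problems.append("malformed email: %r" % email)
    return problems

def validate_product(product):
    """Check one product dict before it reaches the database."""
    problems = []
    if not product.get("name", "").strip():
        problems.append("product needs a non-empty name")
    if product.get("price", 0) <= 0:    # hypothetical business rule
        problems.append("price must be positive")
    return problems
```

Only rows that come back with an empty problem list would go on to the actual INSERT.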
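For the SES step, my plan is roughly: build each invitation as a normal MIME message, then hand the batch to SES. A sketch, with the addresses and URL as obvious placeholders (the actual send isn't exercised here, since it needs boto3 plus AWS credentials and a verified sender):

```python
from email.message import EmailMessage

def build_invitation(to_addr, survey_url):
    """Assemble one survey invitation email (addresses/URL are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = "We'd value your opinion on one of our products"
    msg["From"] = "research@example.com"   # would have to be an SES-verified sender
    msg["To"] = to_addr
    msg.set_content(
        "Hello,\n\nPlease take our short survey:\n%s\n\nThanks!\n" % survey_url
    )
    return msg

def send_batch(messages):
    """Send built messages through SES. Not run in this sketch: needs
    boto3 installed, AWS credentials, and (initially) sandbox-mode limits
    lifted on the account."""
    import boto3
    ses = boto3.client("ses", region_name="us-east-1")  # region is an assumption
    for msg in messages:
        ses.send_raw_email(RawMessage={"Data": msg.as_bytes()})
```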
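On the "hashed survey id" bit: what I mean is an HMAC over the email/survey pair, so links can't be guessed or swapped between recipients. Sketched in python here for clarity (the secret is a placeholder); the php page on VM1 would do the equivalent check with hash_hmac() and hash_equals():

```python
import hashlib
import hmac

SECRET = b"change-me"   # shared server-side key; a placeholder, obviously

def survey_token(email, survey_id):
    """Token embedded in each survey link, tying the link to one recipient."""
    payload = ("%s:%d" % (email, survey_id)).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def token_ok(email, survey_id, token):
    """Verify a submitted token in constant time."""
    return hmac.compare_digest(survey_token(email, survey_id), token)
```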
Am I proposing anything really stupid in that list? Is segregating the database on a separate VM adding any security, or am I just fooling myself?