We have a data warehouse that stores prices for our products. Once every business day, new data points are added representing the day's price for each product. On the website, a user sees this information on a page load. There are currently 7 products; analytics shows that the amount of page loads is about 100 per day.
The website server is an external server to our data center. The data warehouse server is internal to our data center and has other critical data (in different tables) not related and should not be accessible to the website.
The following approaches have been considered:
The data warehouse could daily push (SFTP) a CSV file containing the daily data to the web server. The web server would have a process running on a crontab every 15 minutes. It would check if the file had changed. If so, it would update its database with the data. Then, on page loads, the web server would query its database to get the data to display on the web page.
Usually, the push would only be once a day, but more than one push could be possible to communicate (infrequent) price corrections. Even in the price correction scenario, all data would be delivered in the file. The polling process would pick up the change and overwrite the data in the database.
The web server could request data from the data warehouse using JDBC or similar SQL connection technology.
However, a security concerns have been voiced. The concern is that by allowing the web server access to our data warehouse, SQL injection attack or some other external attack through the website could compromise the data warehouse. Measures could be put in place to reduce the risk, but the easiest and safest approach has been suggested to simply not allow any public facing system to directly access the data warehouse. In other words, the data warehouse can establish a communication with other servers (say to SFTP the file), but no server can initiate a connection with the data warehouse. Do these concerns seem reasonable and hard to mitigate?
A web service could be built, and the web server could call the web service that is hosted in our internal data center.
A process hosted in our internal data center could call the web server when it knows the data warehouse has data available. If so, how should this be done? HTTPS with some way to prevent other unauthorized clients from making the same call?
Which of the above approaches is best or is there a better approach than listed above? What are the pros and cons to an approach?