
I have streaming data on a Windows platform that I capture into MongoDB at a fairly high rate of about 800 data points per second. I wish to have access to this data from outside the company, but the company does not wish to forward port 27017 (mongod) to the outside world. I have set up authorization and compiled mongod with SSL support.

How can I expose Mongo to an external server? My server sits in another location on the "free" internet, and I would like it to pull, every 10 seconds, the latest (say) 1,000 data points from the internal server. How would I do this if I cannot port forward?

Can I get the Mongo server to "push" data somehow to the external server (which has a fixed IP)? FTP is not a solution, as the data streams too rapidly for that (I think).

Can I somehow stream it out over HTTP, or some other protocol?

Ideally I would have liked "native" access to the Mongo server, as this would have allowed me to use tailable cursors, so any solution that approximates this functionality would be good. However, if that is not possible or practical, a streaming push solution from the firewall-protected server to the outside-world server would also work for me.

Thomas Browne
  • As long as Mongo is set up with SSL, exposing its port seems like the simplest and best-performing solution. They won't allow the port to be exposed even with SSL turned on? – Jon Onstott Jul 10 '14 at 15:19
  • Nope - we use a high-availability-guarantee service provider who puts a second firewall, which they control, around each customer; they treat this as a business opportunity by charging for DMZ access. That's unfortunately the exogenous situation I have to deal with, and I don't want to pay. – Thomas Browne Jul 11 '14 at 04:49

3 Answers


If your private server is always on, and your company is OK with you using a VPN (that's a big if; check with IT), I would use OpenVPN and possibly a MongoDB replica set.

OpenVPN's security-to-effort ratio is quite good: it's available as a standard package on most Linux distros, runs on configuration files, has many tutorials, and uses either a static key (simple setup) or TLS (one key per client/server).
OpenVPN HOWTO
Your "external" server will be the VPN server, and the MongoDB "master" will connect to it automatically on startup.

After your servers are connected, you need to choose whether to query the "master" directly over the VPN, or to keep a secondary MongoDB server on the "external" machine in sync and query that instead.
MongoDB's replica set allows one server to keep "in sync" with a primary server. It's usually used for fault tolerance, but you can also use it for your purpose.
MongoDB Geographically Redundant Replica Set.
Make sure the "external" server is non-voting and has priority 0 (meaning it won't take part in elections or ever become primary), as in the sketch below.
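
A minimal sketch of adding such a member with pymongo, assuming MongoDB 3.0+ (for the `replSetGetConfig` command) and a hypothetical external hostname reachable over the VPN; the same can be done interactively with `rs.add()` in the mongo shell:

```python
# Sketch: add the external server as a non-voting, priority-0 member.
# Hostnames and database names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://internal-primary:27017")  # the internal "master"

# Fetch the current replica set configuration and bump its version.
config = client.admin.command("replSetGetConfig")["config"]
config["version"] += 1
config["members"].append({
    "_id": max(m["_id"] for m in config["members"]) + 1,
    "host": "external.example.com:27017",  # reached over the VPN tunnel
    "priority": 0,  # can never be elected primary
    "votes": 0,     # takes no part in elections
})
client.admin.command("replSetReconfig", config)
```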

It's best to confer with your IT team about the whole solution, and to test it before relying on it for production-related tasks.

Nitz
  • OpenVPN is a much better solution than SSH for an always-on tunnel. Besides reliability, it's also more efficient under higher loads because you can run the connection over UDP. Running TCP over a TCP tunnel behaves badly when the connection gets congested, with the retransmissions at each level fighting each other and adding to the congestion. You could also consider IPsec, which is technically very sound but has a steeper learning curve. – mc0e Jul 17 '14 at 10:09
  • This is a great article which covers all points regarding installation and configuration of OpenVPN server: http://geek-kb.com/linux/install-and-configure-openvpn-centos-6-x/ – Itai Ganot Jul 17 '14 at 10:46
  • Will the OpenVPN solution over UDP or IPsec require any ports to be forwarded? – Thomas Browne Jul 18 '14 at 15:58
  • No, because the internal server is the one initiating the communication. Port forwarding is only required when an external computer initiates the conversation. – Nitz Jul 18 '14 at 16:17

One possibility is to set up an SSH tunnel to carry your Mongo database connections. The SSH tunnel would encrypt the Mongo traffic, and SSH tunnels are a well-known technique.

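Since no inbound ports are open, the tunnel would have to be opened from the inside out: the internal Mongo box runs something like `ssh -N -R 27017:localhost:27017 user@external-server` (a remote port forward; see Matt's comment below), after which port 27017 on the external server leads back to the internal mongod. A minimal sketch of the external server then pulling the latest 1,000 points every 10 seconds through that tunnel, assuming a hypothetical collection `marketdata.ticks` with a `ts` timestamp field:

```python
# Sketch: runs on the external server, polling through the SSH tunnel.
# Collection and field names are hypothetical.
import time
from pymongo import MongoClient, DESCENDING

# localhost:27017 is the tunnel endpoint; traffic lands on the internal mongod.
client = MongoClient("mongodb://localhost:27017")
ticks = client.marketdata.ticks

while True:
    latest = list(ticks.find().sort("ts", DESCENDING).limit(1000))
    # ... process the newest 1,000 data points ...
    time.sleep(10)
```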

Jon Onstott
  • I agree an SSH tunnel is the way to go here. Keep in mind that using an SSH tunnel will probably get the poster fired if anyone ever figures out what they're up to. – chris Jul 10 '14 at 15:58
  • So I'd be ssh-ing out of the office to an external fixed IP address, right? The external server does not need to have direct IP access to the internal server? It's all tunnelled over what? HTTP? btw I'm not stealing data, it's just our service provider wants to charge a grand a month for a rented server in their DMZ which I don't want to pay. I am assuming that suggested SSH streaming would not have an impact on internal net security as the firewall would still be fully operational? – Thomas Browne Jul 10 '14 at 16:11
  • I think the SSH tunnel would be from the outside coming in to your network over an open port, but it would be secure because it is SSH. – Jon Onstott Jul 10 '14 at 18:17
  • Yes that's the problem. I don't have any open incoming ports. I need a solution that breaks out of the firewall from inside the firm. Fixed external IP only. No pre-determined IP/port forwarded path in. – Thomas Browne Jul 10 '14 at 19:49
  • You would `ssh` out from your mongo instance to the external server to create the tunnel. You can then forward a TCP port on the remote host back over the SSH channel to your internal machine's mongo port. When you access `localhost:27017` on the external host, the traffic ends up on the mongo box. You'd need to run any of this past IT/Security, as you are technically punching a hole through the firewall into your network for your external host, just doing it a different way. – Matt Jul 16 '14 at 11:46

There are good suggestions on tunnels already (of which I'd favour OpenVPN), but here I'll suggest a different approach.

Rather than exposing the whole of your mongodb data to the remote server in order to get a specific data set out, you might do better to build a more tightly focussed web API, which could run at either location (a minimal sketch follows the list below).

  1. You could run a script or daemon on your mongodb server machine, or close to it (and within the firewall) which accesses mongodb natively, and then packages the data up and pushes it to an API on your remote web server. Probably REST and JSON are the sort of things you'd look at for designing the API on your web server.
  2. Alternatively you could have a web server close to the mongodb server, which is accessible from your remote web server. You'd implement a suitable API on the web server that's local to your mongodb server, and pull data from that by calling it from your remote web server.
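
As a rough illustration, the receiving end of option 1 might be a minimal Flask app on the remote server; the route, port, and storage here are all hypothetical (option 2's API would look similar but answer GET requests instead):

```python
# Sketch: hypothetical ingest endpoint on the external web server.
from flask import Flask, request, jsonify

app = Flask(__name__)
received = []  # stand-in for real storage (a local MongoDB, files, etc.)

@app.route("/api/ticks", methods=["POST"])
def ingest():
    batch = request.get_json(force=True)  # expects a JSON list of data points
    received.extend(batch if isinstance(batch, list) else [batch])
    return jsonify(ok=True, count=len(received))

if __name__ == "__main__":
    # In production this would sit behind TLS and authentication, with
    # access restricted by source IP, as discussed below.
    app.run(host="0.0.0.0", port=8443)
```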

Which approach works better depends mostly on the availability of somewhere suitable to run things inside your company network, and on the network policies involved. You can lock access to your API down tightly enough to satisfy your network managers: e.g. restrict it by IP, require suitable authentication, and perhaps pin it to a specific SSL key.

If possible, the most efficient approach is probably going to be running a daemon close to the mongodb server which uses the tailable cursors you mention, and sends the data to an API on your remote web server.
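
A minimal sketch of such a daemon with pymongo and requests, assuming the data lands in a capped collection (tailable cursors only work on capped collections) and posting batches to the hypothetical endpoint sketched above:

```python
# Sketch: push daemon run inside the firewall. Collection name, field
# names, and the external endpoint are hypothetical.
import time
import requests
from pymongo import MongoClient, CursorType

client = MongoClient("mongodb://localhost:27017")
ticks = client.marketdata.ticks  # must be a capped collection for tailing

EXTERNAL_API = "https://external.example.com/api/ticks"
BATCH_SIZE = 800  # roughly one second of data at the stated rate

while True:
    # TAILABLE_AWAIT behaves like `tail -f`: the server waits briefly for
    # new documents instead of the cursor dying as soon as it hits the end.
    cursor = ticks.find(cursor_type=CursorType.TAILABLE_AWAIT)
    batch = []
    for doc in cursor:
        doc.pop("_id", None)  # ObjectId is not JSON-serializable
        batch.append(doc)
        if len(batch) >= BATCH_SIZE:
            requests.post(EXTERNAL_API, json=batch, timeout=5)
            batch = []
    if batch:
        requests.post(EXTERNAL_API, json=batch, timeout=5)
    # The cursor has died (e.g. the collection was empty); a real daemon
    # would remember the last timestamp seen to avoid re-sending on restart.
    time.sleep(1)
```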

mc0e
  • No ports in the "outer ring" firewall are forwardable, therefore the web server near the mongodb instance will not work either. However I like the idea of a "push" system, where I'll have the external server visible internally, and push to it on a regular basis. I think that's the only way forward. – Thomas Browne Jul 18 '14 at 10:33
  • Building new high-performance forwarding systems seems to me like a bad idea. Unless you've spent a lot of time learning the ins and outs of TCP/IP, as well as schemes for compression and delta forwarding, you'll probably end up with worse performance than a "professional" solution. – Nitz Jul 18 '14 at 11:01
  • @Nitz Some care is probably needed in how the data extraction and insertion are done at either end of the link, but 1000 data points every second is not especially heavy duty, and certainly not enough to worry much about the efficiency of the data transport. – mc0e Jul 20 '14 at 12:58