Running an Apache Spark cluster through a proxy?


This is a somewhat speculative question for those with more networking / Apache Spark experience than I have.

My current setup: two 32GB, 4GHz Core i7 machines I'd love to tie into a Spark cluster. However, they are separated by a firewall: one is my home desktop (I can very easily set up the necessary port forwarding), and the other is my work desktop (behind a department firewall; inaccessible unless I'm already on the network).

There is a 3rd machine: it sits more or less on the DMZ of my work network, and thus I can SSH into it from outside, and from there I have direct access to my work desktop. Is there any possible way to run a port-forwarding setup through this machine to make it act as a transparent Spark proxy for the other two?
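The kind of forwarding I have in mind would look something like this (hostnames are placeholders, and the port choices assume Spark standalone defaults):

```shell
# From my home desktop: forward the Spark master port (7077) and web UI
# (8080) through the DMZ machine to the work desktop.
# "dmz-host" and "work-desktop" are placeholder hostnames.
ssh -N \
    -L 7077:work-desktop:7077 \
    -L 8080:work-desktop:8080 \
    user@dmz-host

# Or, with a recent OpenSSH, jump straight through the DMZ machine:
ssh -J user@dmz-host user@work-desktop
```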

Magsol

Posted 2015-02-01T18:49:23.097

Reputation: 167

Answers


You can set up the 3rd machine as the master of your Spark cluster, make sure it can communicate over SSH with the other nodes, and use spark-submit to launch applications on it.
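Assuming a Spark standalone deployment, that setup would look roughly like this (paths and the `dmz-host` hostname are illustrative):

```shell
# On the 3rd (DMZ) machine: start the standalone master.
# By default it listens on spark://dmz-host:7077 and serves a web UI on 8080.
$SPARK_HOME/sbin/start-master.sh

# On each desktop: register the machine as a worker with that master.
# (This is the Spark 1.x script name; later releases call it start-worker.sh.)
$SPARK_HOME/sbin/start-slave.sh spark://dmz-host:7077
```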

You will have to submit your application in cluster mode so that the driver runs on the 3rd machine. Once the other nodes are registered as workers with the master, they will start receiving tasks.
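A submission in cluster mode would look something like this (the class name and jar path are placeholders):

```shell
# --deploy-mode cluster runs the driver inside the cluster (on a worker
# chosen by the master) rather than on the machine you submit from.
$SPARK_HOME/bin/spark-submit \
    --master spark://dmz-host:7077 \
    --deploy-mode cluster \
    --class com.example.MyApp \
    /path/to/my-app.jar
```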

Here your only real problem is communication between the 3 nodes (especially reaching the work desktop through the firewall), not how to launch an application on whichever machine you decide to make the master of your cluster.
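Since the firewall is the hard part, one thing that helps is pinning Spark's normally random ports to fixed values, so only a known set of ports has to be opened or forwarded. A sketch of a `conf/spark-defaults.conf` using Spark 1.x property names (the port numbers themselves are arbitrary choices):

```
# Fix the otherwise-random ports so the firewall rules / SSH tunnels
# only need to cover a known range.
spark.driver.port           51000
spark.fileserver.port       51001
spark.broadcast.port        51002
spark.replClassServer.port  51003
spark.blockManager.port     51004
spark.executor.port         51005
```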

Bacon

Posted 2015-02-01T18:49:23.097

Reputation: 101