spark-submit
seems to require two-way communication with a remote Spark cluster in order to run jobs: the driver connects to the master, and the executors then connect back to the driver.
This is easy to configure between machines on the same network (10.x.x.x to 10.x.x.x and back), but it becomes confusing when Docker adds an extra layer of networking: traffic goes from 172.x.x.x through the host's 10.x.x.x to the cluster's 10.x.x.x, and the return path must somehow reach the 172.x.x.x container back through the host's 10.x.x.x.
Spark adds further complexity with its SPARK_LOCAL_IP
and SPARK_LOCAL_HOSTNAME
configuration parameters on the client side.
How should Docker networking be configured to allow this?
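For concreteness, here is a sketch of the kind of configuration I understand is involved. All IPs, ports, and the image and job names are made up for illustration; the point is that the driver has to advertise an address the executors can actually reach, bind inside the container, and pin its callback ports so they can be published:

```shell
# Hypothetical: Spark master at 10.0.0.2:7077; the driver runs in a
# container (172.17.0.x) on a host whose cluster-facing IP is 10.0.0.3.

# Publish the driver's RPC and block-manager ports so executors
# can connect back in through the host.
docker run --rm \
  -p 5001:5001 -p 5002:5002 \
  my-spark-client \
  spark-submit \
    --master spark://10.0.0.2:7077 \
    --conf spark.driver.host=10.0.0.3 \
    --conf spark.driver.bindAddress=0.0.0.0 \
    --conf spark.driver.port=5001 \
    --conf spark.blockManager.port=5002 \
    my_job.py
```

Here spark.driver.host is what gets advertised to the cluster (the host's IP, not the container's 172.x.x.x address), while spark.driver.bindAddress is what the driver binds to inside the container. Is this the intended approach, or is something like `--network host` the expected way to sidestep the address translation entirely?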