I can't find a working Spark option to require executors with a GPU.
I'm trying to set up a Hadoop cluster in order to run Machine Learning algorithms on the available GPUs via Spark.
So far I'm trying out my setup on a minimal cluster: 1 resource manager and 2 node managers (each with 8 cores, 32 GB RAM and 1 Nvidia GPU), all running Ubuntu 18.04.
Resource discovery is working as expected: I see my 16 cores, 56 GB of memory and 2 yarn.io/gpu.
The documentation provides a way to do this with "--conf spark.yarn.executor.resource.yarn.io/gpu=1", but it does not work for me: no effect at all, whether passed as a spark-submit parameter or set in $SPARK_CONF/metrics.properties.
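For reference, the full command I'm running looks roughly like this. Everything except the GPU-related --conf line (jar name, executor sizing) is a placeholder for my actual job, so treat it as a sketch:

```shell
# Hypothetical submission command -- the jar name and executor sizing are
# placeholders; only the GPU --conf line is the part that has no effect.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-cores 4 \
  --executor-memory 8G \
  --conf spark.yarn.executor.resource.yarn.io/gpu=1 \
  my-ml-job.jar
```

The job runs fine, but YARN never reports any yarn.io/gpu as allocated to the containers.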
As YARN 3 is the first version to provide GPU isolation, I'd like to avoid rolling back to an older (better documented) version.
I guess this could also be set in code through the SparkContext, and I'd be happy to learn how, but since I'm more on the admin side than an ML engineer, I'd rather set it in the conf files once and for all. At this point, though, any solution would be appreciated.
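Concretely, what I'm hoping for is a couple of lines in $SPARK_HOME/conf/spark-defaults.conf along these lines. The GPU property name below is the one from the documentation that had no effect for me, so it's only a guess at the shape of the answer, not something I know to be correct:

```
spark.executor.instances                   2
spark.yarn.executor.resource.yarn.io/gpu   1
```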
Can anyone provide the correct syntax to allocate GPUs with resource isolation enabled?
Love you guys, Kevin
(YARN 3.1.1/3.2.0 on Hortonworks HDP)