YARN 3 and Spark: allocate a GPU

As Spark doesn't like much YARN resources as of hadoop 3.0.0 (Spark is said to work with Hadoop 2.6+ but it implicitly means "up to 3.0 excluded"), a workaround was to set yarn.resource-types.yarn.io/gpu.minimum-allocation to 1, and from within my python code, cancel the executor order (spark doesn't launch the AM with 0 executor asked from command line)

sc = SparkContext(conf=SparkConf().setAppName("GPU on AM only").set("spark.executor.instances", 0))

Ugly but sufficient for our current workloads, hoping for a "Spark for Hadoop 3.0+" distribution soon enough.

EDIT: You can compile Spark for Hadoop 3.1 profile, from the current state of their github repository, then you have access to the spark.yarn..resource.yarn.io/gpu properties !

I'll share my findings about isolation here too:

After about 2 weeks of various tries we finally settled on a full wipe of every host for a clean install from scratch. Still nothing working. Then we tried a "one worker" setup to set a countable resource manually to try the allocation mechanism and then.... NOTHING hortonWORKS ! But my Googling was better suited then. It seems to be a Hadoop related issue about custom resources and CapacityScheduler, enjoy:

https://issues.apache.org/jira/browse/YARN-9161 https://issues.apache.org/jira/browse/YARN-9205

For now (3.1.1/3.2.0) the capacity.CapacityScheduler is broken by a hardcoded enum containing only vCores and RAM parameters. You just have to switch your scheduler class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler You also want to replace "capacity" by "Fair" in the line yarn.scheduler.fair.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator

Your GPUs will not be visible on yarn ui2 but will still be on the NodeManagers, and most importantly, will be allocated properly. It was a mess to find out indeed.

Kévin Azoulay

Posted 2018-12-24T16:35:55.937

Reputation: 11

YARN 3 and Spark: allocate a GPU

Answers