I am running a 6-node Spark cluster on Google Cloud Dataproc. Within a few minutes of launching Spark and performing basic operations, I get the error below:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000fbe00000, 24641536, 0) failed; error='Cannot allocate memory' (errno=12)
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 24641536 bytes for committing reserved memory.
An error report file with more information is saved as: /home/chris/hs_err_pid21047.log
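For context, the allocation that failed is fairly small; converting the byte count from the error message:

```python
# The failed mmap size from the JVM error message, in MiB
failed_bytes = 24641536
print(failed_bytes / (1024 * 1024))  # 23.5 MiB
```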
The only two commands I ran are the following:
data = (
    spark.read.format("csv")  # the source is a CSV file; with format("text") the header/inferSchema options are ignored
    .option("header", "true")
    .option("inferSchema", "true")
    .load("gs://bucketpath/csv")
)
data.show()
The CSV file is stored in a Google Cloud Storage bucket and is 170 MB in size.
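Since the nodes only have 3.6 GB of RAM each, I also wondered whether I should cap Spark's memory explicitly when creating the session. A sketch with guessed values (I am not sure these settings or sizes are appropriate for my Dataproc image):

```python
from pyspark.sql import SparkSession

# Hypothetical values -- guesses for nodes with only 3.6 GB of RAM
spark = (
    SparkSession.builder
    .appName("csv-load")
    .config("spark.driver.memory", "2g")    # standard Spark property, value is a guess
    .config("spark.executor.memory", "2g")  # standard Spark property, value is a guess
    .getOrCreate()
)
```

I have not verified whether these values leave enough headroom for the OS and other daemons on a 3.6 GB machine.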
Below are the details of my cluster configuration:
Name: cluster
Region: australia-southeast1
Zone: australia-southeast1-b
Master node:
  Machine type: n1-highcpu-4 (4 vCPUs, 3.60 GB memory)
  Primary disk type: pd-standard
  Primary disk size: 50 GB
Worker nodes: 5
  Machine type: n1-highcpu-4 (4 vCPUs, 3.60 GB memory)
  Primary disk type: pd-standard
  Primary disk size: 15 GB
  Local SSDs: 0
Preemptible worker nodes: 0
Cloud Storage staging bucket: dataproc-78f5e64b-a26d-4fe4-bcf9-e1b894db9d8f-au-southeast1
Subnetwork: default
Network tags: None
Internal IP only: No
Image version: 1.3.14-deb8
This looked like a memory issue, so I tried changing the machine type to n1-highcpu-8 (8 vCPUs, 7.2 GB memory). However, I am unable to launch the instances after that change, because I get the following error:
Quota 'CPUS' exceeded. Limit: 24.0 in region australia-southeast1.
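If I am reading the quota right, the arithmetic works out like this (1 master + 5 workers = 6 machines):

```python
# CPUS quota check for the region (limit: 24 vCPUs)
machines = 1 + 5                  # 1 master + 5 workers
print(machines * 4)  # n1-highcpu-4: 24 vCPUs, exactly at the limit
print(machines * 8)  # n1-highcpu-8: 48 vCPUs, over the limit
```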
So I am not sure what should be done to resolve this. I am very new to Google Cloud Platform and would really appreciate any help; this is for a super-critical project.