
I have a 3-instance EMR cluster running on AWS, and it's responding very slowly at the moment.

When checking the Hadoop dashboard on port 8088 in my browser, I see "Memory used: 203.5GB" and "Memory available: 214GB". I assume the problem is there: all the RAM is currently occupied.

How can I find out which application is running and hoarding all the RAM? Is there something like the top command for a whole cluster? When I SSH to the master node and check top and free -g, the output suggests that more than 50% of the RAM is still free, which contradicts the web report on port 8088.
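
(Note: the figures on port 8088 come from YARN's ResourceManager, which also exposes them over a REST API, so a top-like view can be scripted; newer Hadoop releases additionally ship a yarn top command. A minimal sketch, assuming the default port 8088 on the master node and no authentication:)

```python
#!/usr/bin/env python3
"""Top-like view of YARN applications by allocated memory.

Minimal sketch: assumes the ResourceManager REST API is reachable on
the master node at the default port 8088 with no authentication
(a Kerberized cluster would need extra handling).
"""
import json
import urllib.request

RM_URL = "http://localhost:8088"  # run on the master node, or use its hostname

with urllib.request.urlopen(f"{RM_URL}/ws/v1/cluster/apps?states=RUNNING") as resp:
    payload = json.load(resp)

# "apps" is null in the response when nothing is running, hence the two-step lookup.
apps = (payload.get("apps") or {}).get("app", [])
for app in sorted(apps, key=lambda a: a.get("allocatedMB", 0), reverse=True):
    print(f"{app['id']}  {app.get('allocatedMB', 0):>8} MB  "
          f"{app.get('allocatedVCores', 0):>3} vcores  "
          f"{app.get('user', '?')}  {app.get('name', '')}")
```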

2 Answers


Amazon already provides a web interface with statistics on your EMR cluster; just go to:

https://console.aws.amazon.com/elasticmapreduce/home

Choose the cluster's link under Name to open its details page, and use the tabs to view the relevant information.

For instance, you can find job details for a Spark application by going to Application history, selecting the application ID, and expanding the row. More details: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-cluster-application-history.html
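
If you prefer scripting over the console, the same cluster list is available through the EMR API. A minimal sketch with boto3 (not from the answer above), assuming AWS credentials and a default region are already configured:

```python
#!/usr/bin/env python3
"""Console-equivalent cluster lookup via the EMR API.

Minimal sketch using boto3; assumes credentials and a region are
already set up (e.g. via `aws configure`).
"""
import boto3

emr = boto3.client("emr")

# Only clusters that are still alive; add more states if needed.
resp = emr.list_clusters(ClusterStates=["STARTING", "RUNNING", "WAITING"])
for summary in resp["Clusters"]:
    cluster = emr.describe_cluster(ClusterId=summary["Id"])["Cluster"]
    print(f"{cluster['Id']}  {cluster['Name']}  "
          f"state={cluster['Status']['State']}  "
          f"master={cluster.get('MasterPublicDnsName', 'n/a')}")
```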

Luca Gibelli
  • There I can see aggregated usage plots, but not which process claims how much RAM. Do you know if that's available there too? I couldn't find it. – Alexander Engelhardt May 24 '18 at 17:06
  • You can see individual stats for a specific job, but not exactly the RAM used. Still, it's useful info for debugging problems. You can also see the logs for a specific job. Answer updated accordingly. – Luca Gibelli May 24 '18 at 23:41

First, some details on metrics:

The "memory used" and "memory available" metrics you mentioned from the YARN UI describe memory allocated within YARN (i.e. to its containers), not memory usage on the hosts that YARN's ResourceManager manages.
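
You can see this distinction in the raw numbers: the figures on port 8088 are served by the ResourceManager's cluster metrics endpoint. A minimal sketch, assuming the default port 8088 on the master node and no authentication:

```python
#!/usr/bin/env python3
"""Cluster-wide YARN memory figures, as shown on port 8088.

Minimal sketch: these are YARN-managed totals, not host memory, which
is why they can disagree with `free` on the nodes.
"""
import json
import urllib.request

RM_URL = "http://localhost:8088"  # master node

with urllib.request.urlopen(f"{RM_URL}/ws/v1/cluster/metrics") as resp:
    metrics = json.load(resp)["clusterMetrics"]

print(f"allocated : {metrics['allocatedMB']} MB")
print(f"available : {metrics['availableMB']} MB")
print(f"total     : {metrics['totalMB']} MB")
```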

E.g. suppose you have 3 nodes in the cluster, each with more than 64 GB of RAM (because 64 * 3 < 214), say 128 GB, while YARN is configured to use only about 71 GB per node (214 / 3). (These exact figures are probably wrong; it's just an example.) Then on each node, all processes together use only about 50% of the host's RAM, yet your application occupies almost all of the memory available to YARN across the cluster.
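
The actual per-node limit is set by yarn.nodemanager.resource.memory-mb in the node's Hadoop configuration. A minimal sketch that compares it with the host's total RAM, assuming the standard EMR config path and run on a core node:

```python
#!/usr/bin/env python3
"""Compare the per-node YARN memory limit with total host RAM.

Minimal sketch: reads yarn.nodemanager.resource.memory-mb from the
usual EMR config path and MemTotal from /proc/meminfo.
"""
import xml.etree.ElementTree as ET

YARN_SITE = "/etc/hadoop/conf/yarn-site.xml"

yarn_mb = None  # stays None if the property is unset and a default applies
for prop in ET.parse(YARN_SITE).getroot().iter("property"):
    if prop.findtext("name") == "yarn.nodemanager.resource.memory-mb":
        yarn_mb = int(prop.findtext("value"))

with open("/proc/meminfo") as f:
    mem_total_kb = int(f.readline().split()[1])  # first line is MemTotal

print(f"YARN per-node limit : {yarn_mb} MB")
print(f"Host total RAM      : {mem_total_kb // 1024} MB")
```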

Second: it is perfectly fine to use as much of the cluster's memory as possible, as long as the cluster suits your needs and you do not plan to add more load without reconfiguring it. You only need to monitor the actual host metrics underneath, because the running JVMs also need free host RAM for overhead, off-heap storage, and so on.

Third, some suggestions for your case:

It seems your nodes are not used very efficiently. As a rule of thumb, about 80% utilization (RAM, CPU, etc.) is what you want for your infrastructure. So you could consider moving to smaller nodes, but a few more of them: smaller nodes would mean less data per node, higher parallelism, and probably faster processing for less money.

gemelen