
We have an API in AWS backed by a GPU instance that does inference. We have an auto-scaler set up with minimum and maximum instance counts, but we aren't sure which metric (GPU/CPU usage, RAM usage, average latency, etc.) or combination of metrics should be used to decide when a new instance needs to be launched to keep up with incoming requests.

Are there best practices regarding which metrics should be used in this scenario? Inference in our case is very GPU-intensive.

elwray14

1 Answer


Amazon CloudWatch Agent adds Support for NVIDIA GPU Metrics

https://aws.amazon.com/about-aws/whats-new/2022/02/amazon-cloudwatch-agent-nvidia-metrics/

Amazon CloudWatch agent now supports the collection of NVIDIA GPU performance metrics from Amazon Elastic Compute Cloud (Amazon EC2) accelerated computing instances running Linux. GPU-based instances provide access to NVIDIA GPUs with thousands of compute cores. You can use these instances to accelerate scientific, engineering, and rendering applications. Customers can install and configure CloudWatch agent to collect system and application metrics from Amazon EC2, on-premises hosts, and containerized applications and send them to CloudWatch. CloudWatch provides you with data and actionable insights to monitor your applications and optimize resource utilization. GPU metrics are intended for users who want to monitor the utilization of GPU co-processors in their EC2 accelerated instances.
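For reference, enabling these metrics is a matter of adding an nvidia_gpu section to the agent configuration. Here's a minimal sketch (written as a Python script that just writes out the JSON config, purely illustrative); the measurement names follow the agent's documented nvidia_gpu options, and the file path is the agent's default config location on Linux, so adjust if you load the config from SSM instead:

```python
import json

# Minimal CloudWatch agent config enabling NVIDIA GPU metrics (illustrative sketch).
# The "nvidia_gpu" section tells the agent to scrape nvidia-smi; collected metrics
# show up in the CWAgent namespace as nvidia_smi_utilization_gpu, etc.
agent_config = {
    "metrics": {
        "append_dimensions": {
            "InstanceId": "${aws:InstanceId}",
            "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
        },
        # Aggregate per auto-scaling group so a scaling policy can track the group average.
        "aggregation_dimensions": [["AutoScalingGroupName"]],
        "metrics_collected": {
            "nvidia_gpu": {
                "measurement": [
                    "utilization_gpu",     # GPU compute utilization (%)
                    "utilization_memory",  # GPU memory read/write activity (%)
                    "memory_used",
                    "memory_total",
                ],
                "metrics_collection_interval": 60,
            }
        },
    }
}

# Default agent config path on Linux.
with open("/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json", "w") as f:
    json.dump(agent_config, f, indent=2)
```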

So based on these metrics, you'd probably want to monitor the following (there's a rough scaling-policy sketch after the list):

  1. nvidia_smi_utilization_gpu: the percentage of time over the past sample period during which one or more kernels were running on the GPU.
  2. nvidia_smi_utilization_memory: the percentage of time over the past sample period during which global (device) memory was being read or written.
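
As a rough example of wiring one of these into the auto-scaler, here's a boto3 sketch of a target-tracking scaling policy on nvidia_smi_utilization_gpu. The ASG name and the 70% target are placeholders, and it assumes the agent is publishing the metric into the CWAgent namespace with an AutoScalingGroupName dimension (as in the config sketch above):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target-tracking policy on the custom GPU utilization metric published by the
# CloudWatch agent. ASG name and target value below are placeholders.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="gpu-inference-asg",          # placeholder ASG name
    PolicyName="scale-on-gpu-utilization",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "CustomizedMetricSpecification": {
            "MetricName": "nvidia_smi_utilization_gpu",
            "Namespace": "CWAgent",
            "Dimensions": [
                {"Name": "AutoScalingGroupName", "Value": "gpu-inference-asg"}
            ],
            "Statistic": "Average",
        },
        # Launch instances when average GPU utilization across the group exceeds ~70%.
        "TargetValue": 70.0,
    },
)
```

If latency matters more than raw utilization, you can combine this with a second alarm on your API's average latency, but GPU utilization is usually the most direct signal for GPU-bound inference.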
user3041539