
Here's a great write-up on scaling with Heroku (and it applies to traditional deployments as well).

Given that we want containerized applications to be single process, how do we get:

  • slow client protection
  • slow response protection

in a Kubernetes/GKE environment that takes full advantage of horizontal pod autoscaling?

Assume my deployment looks much like the following (credit @nithinmallya4):

[architecture overview diagram]

I have not yet selected a web server; by default rackup is serving WEBrick. I was considering just switching to multi-threaded Puma.
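If switching to Puma, a minimal `config/puma.rb` for a single-process-per-container setup might look like the following sketch (thread counts and env var defaults are illustrative assumptions, not measured values):

```ruby
# config/puma.rb -- sketch; tune counts for your container's CPU/memory limits.
# Keeping workers at 0 means no forking: a single process per container,
# using threads (plus the HPA) for concurrency.
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count

# 0 workers = single-process mode, matching the one-process-per-container goal.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 0))

port ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "development")
```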

My concern is that the autoscaler works on CPU utilization, not on any notion that a pod is currently consumed by an HTTP(S) request, so it may never come into play.

  1. Am I understanding the autoscaler correctly?
  2. What is the ideal scale up/down architecture?

Our current thoughts:

  • nginx as a sidecar in the pod (with a gzip deflater) behind an Ingress.

  • Puma in front of Rails (in the same image as rails-api), on the assumption that it would make better use of CPU and trigger autoscaling.

  • custom metrics for the HPA (still need to research this with 1.8).
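The sidecar idea above could be sketched roughly like this (image names, labels, and ports are placeholders, not a tested manifest). Both containers share the pod's network namespace, so nginx proxies to Puma over localhost and absorbs slow clients before they tie up an app thread:

```yaml
# Sketch of the nginx-sidecar pattern for slow-client buffering.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rails-api
  template:
    metadata:
      labels:
        app: rails-api
    spec:
      containers:
      - name: nginx              # buffers slow clients, gzips responses
        image: nginx:1.13
        ports:
        - containerPort: 80      # the Service/Ingress targets this port
      - name: rails-api          # Puma listening on localhost:3000
        image: gcr.io/my-project/rails-api:latest
```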

kross
  • [HPA](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) is based on CPU utilization. You can enable [Cluster Autoscaling on GKE](https://cloud.google.com/container-engine/docs/cluster-autoscaler) (currently in beta) to adjust the size of a Kubernetes cluster based on the workload. For more detailed information, check the Cluster Autoscaler [FAQ on GitHub](https://github.com/kubernetes/autoscaler/blob/cluster-autoscaler-release-0.6/cluster-autoscaler/FAQ.md). – N Singh Sep 21 '17 at 18:40

2 Answers


On GKE we have been supporting HPA with custom metrics since version 1.9. If you have a group of horizontally autoscaled pods inside your cluster, each exporting a custom metric, then you can set an average per-pod target for that metric.

An example of that would be an autoscaled deployment of a frontend where each replica exports its current QPS. One could set the average target of QPS per frontend pod and use the HPA to scale the deployment up and down accordingly. You can find the documentation and a tutorial explaining how to set this up here: https://cloud.google.com/kubernetes-engine/docs/tutorials/custom-metrics-autoscaling
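The QPS example above could be expressed with an `autoscaling/v2beta1` manifest along these lines (a sketch: the metric name `qps`, the deployment name, and the target value are placeholders, and the metric must actually be exported by the pods, e.g. to Stackdriver, as the linked tutorial describes):

```yaml
# Sketch of an HPA targeting an average per-pod custom metric.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: qps            # whatever metric each replica exports
      targetAverageValue: "100"  # desired average QPS per pod
```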

Kubernetes 1.10, becoming available on GKE, will extend the support for custom metrics to include metrics not attached to any Kubernetes object. This will give you the ability to scale a deployment based on any metric listed here, for example the number of messages in a Google Pub/Sub queue.
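With 1.10 that would use an `External` metric. A rough sketch, assuming the Stackdriver custom metrics adapter is installed and using a placeholder subscription name (the pipe-delimited metric name follows the adapter's convention for Stackdriver metrics):

```yaml
# Sketch: scaling on a metric not attached to any Kubernetes object.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub-worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metricName: pubsub.googleapis.com|subscription|num_undelivered_messages
      metricSelector:
        matchLabels:
          resource.labels.subscription_id: my-subscription  # placeholder
      targetAverageValue: "30"   # undelivered messages per replica
```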

JonathanDavidArndt

HPA is based on CPU utilization. You can enable Cluster Autoscaling on GKE (currently in beta) to adjust the size of a Kubernetes cluster based on the workload. For more detailed information, check the Cluster Autoscaler FAQ on GitHub.
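The CPU-based autoscaling described here is the simplest form; a minimal `autoscaling/v1` manifest would look something like this (the deployment name and thresholds are placeholders):

```yaml
# Sketch of a plain CPU-utilization HPA.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: rails-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rails-api
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70  # scale out above 70% average CPU
```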

N Singh
  • Thanks, but that doesn't solve my concern and omits the new features. I see from the notes in 1.8 that custom metrics are supported, and they even mention queue size, which I think refers to nginx http queuing, so it looks like that route _will_ solve my concern. – kross Sep 28 '17 at 23:49
  • Were you able to solve this issue using custom metrics? It seems to be [available now](https://medium.com/@marko.luksa/kubernetes-autoscaling-based-on-custom-metrics-without-using-a-host-port-b783ed6241ac) – Carlos Jan 23 '18 at 19:21