
I created an Azure Kubernetes cluster using Terraform. I used the recommended azurerm_kubernetes_cluster resource, which creates the nodes under the hood. As a consequence, I don't have access to the nodes as Terraform objects. I now want to use Terraform to set up alerting for those nodes, but to do that I need the nodes as Terraform objects (the Terraform alert rule resource requires the ID of the node: https://www.terraform.io/docs/providers/azurerm/r/monitor_metric_alertrule.html).

So I tried to access the nodes through a Terraform data source: https://www.terraform.io/docs/providers/azurerm/d/virtual_machine.html.

As stated in the previous link, two pieces of information are needed for that: the resource group and the name of the virtual machine. The azurerm_kubernetes_cluster resource exports the resource group, so that part is fine. The node names, however, are partly random (to be more precise, one part of the name is randomly generated, and the other part can be derived from Terraform objects that we already have). As the previous link shows, the data source offers no filter functionality (such as the one in https://www.terraform.io/docs/providers/aws/d/ami.html) and no way to match node names with a regex. So the following is not possible (with * standing in for the randomly generated part, and part1 and part2 being known):

data "azurerm_virtual_machine" "nodes" {
  name                = "part1-*-part2"  
  resource_group_name = "${azurerm_kubernetes_cluster.this.node_resource_group}"
}

Does anyone have an idea how to solve one of the following:

  • I can't find any explanation in the Azure AKS documentation of how the random part of the node name is generated (is it truly random, or can it be predicted?), and I couldn't work it out by experimentation or guesswork. Does anyone know?
  • Can we get a list of the virtual machines in a resource group using data sources, in a way I haven't thought of yet?
  • I can't find any blog post or video where AKS node alerting is done using Terraform, even with dirty tricks. Can someone point me to a link I missed?

Terraform Azure provider version: 1.23.0

Terraform version: 0.10.x (required by the Azure provider 1.23.0)

dbourcet

2 Answers


You're coming at this the wrong way. When you create an AKS cluster, you are creating some VMs as worker nodes. However, these are not plain old VMs, and you can't manage them like standalone VMs: the AKS cluster undertakes most of the management work.

If you want to monitor the VMs, you need to do so through the AKS cluster, using AKS metrics (which include node metrics), not as standalone VMs. You can see more details of AKS metrics here - https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-overview
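As a rough sketch of what cluster-level alerting could look like in Terraform, the block below attaches a metric alert to the cluster itself instead of to a node VM. It assumes a provider version that offers azurerm_monitor_metric_alert and uses the cluster resource name from the question. The alert name, the metric name kube_node_status_condition, and the status2 dimension are assumptions to check against the supported-metrics list for Microsoft.ContainerService/managedClusters before relying on them.

```hcl
# Sketch only: alert on the AKS cluster resource rather than on individual VMs.
# The metric and dimension names are assumptions; verify them against the
# supported-metrics list for managed clusters.
resource "azurerm_monitor_metric_alert" "node_not_ready" {
  name                = "aks-node-not-ready" # hypothetical name
  resource_group_name = azurerm_kubernetes_cluster.this.resource_group_name
  scopes              = [azurerm_kubernetes_cluster.this.id]
  description         = "Fires when at least one node reports a NotReady condition."

  criteria {
    metric_namespace = "Microsoft.ContainerService/managedClusters"
    metric_name      = "kube_node_status_condition"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 0

    # Restrict the metric to nodes whose status is NotReady.
    dimension {
      name     = "status2"
      operator = "Include"
      values   = ["NotReady"]
    }
  }
}
```

To get notified, an action block pointing at an existing action group would still need to be added.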

The alternative approach recommended by MS is to run whatever monitoring tool you want in a container itself. You can then deploy it as a DaemonSet on AKS so that it runs on every node. This is how the Azure Monitor collector works.
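For illustration, a DaemonSet of this kind can also be declared from Terraform through the Kubernetes provider. This is a minimal sketch, assuming the provider is already configured against the cluster. The Prometheus node exporter, the image tag, and all names and labels are hypothetical choices, not something the answer prescribes, and a real deployment would usually need extra settings such as host mounts.

```hcl
# Sketch only: run one monitoring agent pod per node via a DaemonSet.
# The agent, image tag, and labels below are illustrative assumptions.
resource "kubernetes_daemonset" "node_exporter" {
  metadata {
    name = "node-exporter"
  }

  spec {
    selector {
      match_labels = {
        app = "node-exporter"
      }
    }

    template {
      metadata {
        labels = {
          app = "node-exporter"
        }
      }

      spec {
        container {
          name  = "node-exporter"
          image = "prom/node-exporter:v1.3.1"

          # Port on which the exporter serves its metrics.
          port {
            container_port = 9100
          }
        }
      }
    }
  }
}
```

Because the DaemonSet controller schedules one pod per node, newly scaled or replaced nodes pick up the agent without any change to the Terraform configuration.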

Sam Cogan
  • I did set up the native monitoring with Terraform, and I am able to see all the relevant metrics in the AKS metrics dashboard (CPU, memory, etc., for nodes and containers), as you suggest. The problem I initially faced is about alerting: very few AKS cluster metrics can be used to build alerts (see https://docs.microsoft.com/en-us/azure/azure-monitor/platform/metrics-supported#microsoftcontainerservicemanagedclusters). So I wanted to monitor the nodes as basic virtual machines, which would give me access to many more metrics to build my alerts. – dbourcet Apr 03 '19 at 06:51
  • Bottom line is, you can't monitor them like normal VMs, so you would need to use the metrics for the cluster, or implement your own monitoring using Prometheus etc. – Sam Cogan Apr 03 '19 at 14:50
  • You are right. For now we have chosen to push metrics into our own monitoring solution, which then handles the alerting, while waiting for Azure to offer many more alerts. Do you know if custom alerts in Azure would do the trick? Have you already defined custom alerts for an Azure k8s cluster? – dbourcet Apr 04 '19 at 06:56
  • Actually, this isn't good enough for all purposes. For instance, I want to install a network connection monitor to monitor connectivity between my cluster and resources that it depends on. I can, indeed, configure this in the portal, but I can't configure it via Terraform without the nodes' VM IDs. As near as I can discover, there is no cluster-level connection monitor, either. – Derek Jul 22 '19 at 13:16
  • @Derek that doesn't change that you can't install things directly on the nodes. The alternative solution recommended by MS is to have these monitors running in containers in your cluster, usually as Daemonsets so they can run on each node. – Sam Cogan Aug 11 '19 at 15:42
  • @SamCogan I cannot agree with you. You can do whatever you want with these VMs. This is not well documented, but it is possible. In my case I use Terraform to add extra storage to each of my workers, and I can also ssh into them and do whatever I want. – MrHetii Nov 18 '19 at 20:40
  • @MrHetii yes, you can do what you want with it, and as soon as you do a cluster update, or you scale the number of nodes, all your changes will be wiped away. Just because you can do something, doesn’t mean you should. It’s well documented that you don’t want to be making changes to these VMs – Sam Cogan Nov 18 '19 at 20:42

It is possible to get the names of all nodes used by an AKS cluster by inspecting the subnet attached to it:

data "azurerm_subnet" "aks" {
    name = azurerm_subnet.subnet.name                             # "aks-subnet-dev"
    virtual_network_name = azurerm_virtual_network.network.name   # "aks-vnet-dev"
    # aks-cluster-dev
    resource_group_name  = azurerm_kubernetes_cluster.cluster.resource_group_name
}

The code below derives the subnet and virtual network names from the cluster resource instead:

data "azurerm_subnet" "aks" {
    name = element(split("/", azurerm_kubernetes_cluster.cluster.agent_pool_profile[0].vnet_subnet_id), 10)
    virtual_network_name = element(split("/", azurerm_kubernetes_cluster.cluster.agent_pool_profile[0].vnet_subnet_id), 8)
    resource_group_name  = azurerm_kubernetes_cluster.cluster.resource_group_name                                           
}

Finally, you can extract the node names with an output like this:

output "aks_nodes" {
    value = distinct([for x in data.azurerm_subnet.aks.ip_configurations :   replace(element(split("/", x), 8), "/nic-/", "")])
}

Result:

terraform apply:
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

aks_nodes = [
  "aks-aks-35064144-0",
  "aks-aks-35064144-1",
  "aks-aks-35064144-2",
]

kubectl get node:
NAME                 STATUS   ROLES   AGE    VERSION
aks-aks-35064144-0   Ready    agent   4d2h   v1.15.4
aks-aks-35064144-1   Ready    agent   4d2h   v1.15.4
aks-aks-35064144-2   Ready    agent   4d2h   v1.15.4
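
To tie this back to the question, the node names obtained this way could in principle be fed into the azurerm_virtual_machine data source to recover the VM IDs. The sketch below assumes Terraform 0.12.6 or later (for for_each on data sources), that the subnet can be read at plan time, and that the nodes are individual VMs rather than scale set instances. The local and output names are hypothetical.

```hcl
# Sketch only: look up each AKS node VM by the name derived from its NIC.
locals {
  # Same expression as the aks_nodes output above.
  aks_node_names = distinct([
    for x in data.azurerm_subnet.aks.ip_configurations :
    replace(element(split("/", x), 8), "/nic-/", "")
  ])
}

data "azurerm_virtual_machine" "nodes" {
  for_each = toset(local.aks_node_names)

  name                = each.value
  # The node VMs live in the auto-generated node resource group,
  # not in the resource group of the cluster itself.
  resource_group_name = azurerm_kubernetes_cluster.cluster.node_resource_group
}

# Map of node name => VM resource ID.
output "aks_node_ids" {
  value = { for name, vm in data.azurerm_virtual_machine.nodes : name => vm.id }
}
```

As noted in the comments on the other answer, nodes can be replaced on upgrade or scaling, so anything attached to these IDs may need to be re-applied afterwards.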
MrHetii