
I have a GKE cluster which, for the sake of simplicity, runs just Prometheus, monitoring each member node. I recently upgraded the API server to 1.6 (which introduces RBAC) and had no issues. I then added a new node running the 1.6 kubelet, and Prometheus could not access the metrics API of this new node.

[Screenshot: Prometheus targets page]

So, I added a ClusterRole, ClusterRoleBinding and a ServiceAccount to my namespace, and configured the deployment to use the new ServiceAccount. I then deleted the pod for good measure:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
secrets:
- name: prometheus-token-xxxxx

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: prometheus-prometheus
    component: server
    release: prometheus
  name: prometheus-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-prometheus
      component: server
      release: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus-prometheus
        component: server
        release: prometheus
    spec:
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: prometheus
      serviceAccountName: prometheus
      ...

But the situation remains unchanged.
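
As a quick sanity check (using the pod labels from the Deployment above), you can confirm that the replacement pod actually picked up the new service account:

kubectl get pods -l app=prometheus-prometheus,component=server \
  -o jsonpath='{.items[0].spec.serviceAccountName}'

This should print prometheus.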

The metrics endpoint returns HTTP/1.1 401 Unauthorized, and when I modify the Deployment to include another container with bash + curl installed and make the request manually, I get:

# curl -vsSk -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" https://$NODE_IP:10250/metrics
*   Trying $NODE_IP...
* Connected to $NODE_IP ($NODE_IP) port 10250 (#0)
* found XXX certificates in /etc/ssl/certs/ca-certificates.crt
* found XXX certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*    server certificate verification SKIPPED
*    server certificate status verification SKIPPED
*    common name: node-running-kubelet-1-6@000000000 (does not match '$NODE_IP')
*    server certificate expiration date OK
*    server certificate activation date OK
*    certificate public key: RSA
*    certificate version: #3
*    subject: CN=node-running-kubelet-1-6@000000000
*    start date: Fri, 07 Apr 2017 22:00:00 GMT
*    expire date: Sat, 07 Apr 2018 22:00:00 GMT
*    issuer: CN=node-running-kubelet-1-6@000000000
*    compression: NULL
* ALPN, server accepted to use http/1.1
> GET /metrics HTTP/1.1
> Host: $NODE_IP:10250
> User-Agent: curl/7.47.0
> Accept: */*
> Authorization: Bearer **censored**
>
< HTTP/1.1 401 Unauthorized
< Date: Mon, 10 Apr 2017 20:04:20 GMT
< Content-Length: 12
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host $NODE_IP left intact
  • Why doesn't that token allow me to access that resource?
  • How does one check the access granted to a ServiceAccount?
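
For the second question, kubectl can evaluate RBAC rules on behalf of a service account via impersonation; a minimal check, assuming a kubectl recent enough to have auth can-i and the account living in the default namespace:

kubectl auth can-i list nodes \
  --as=system:serviceaccount:default:prometheus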

2 Answers


I ran into the same issue and created ticket https://github.com/prometheus/prometheus/issues/2606 for it; based on that discussion I updated the configuration examples via PR https://github.com/prometheus/prometheus/pull/2641.

You can see the updated relabeling for the kubernetes-nodes job at https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L76-L84

Copied for reference:

  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
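
For context, in the linked example that relabeling sits inside a kubernetes-nodes scrape job that talks to the API server over HTTPS using the pod's mounted service-account credentials; a condensed sketch (the file paths are the standard in-cluster mounts):

scrape_configs:
- job_name: kubernetes-nodes
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics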

For RBAC itself you need to run Prometheus with its own service account, which you create with:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default

Make sure to pass that service account into the pod with the following pod spec:

spec:
  serviceAccount: prometheus
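
Once the pod restarts, you can verify the account's token is mounted where both Prometheus and the curl test above expect it (the pod name is a placeholder):

kubectl exec <prometheus-pod> -- ls /var/run/secrets/kubernetes.io/serviceaccount

This should list ca.crt, namespace and token.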

The Kubernetes manifests for setting up the appropriate RBAC role and binding, giving the prometheus service account access to the required API endpoints, are at https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml

Copied for reference:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default

Replace the namespace in all manifests to match the one you run Prometheus in, then apply the manifests with an account that has cluster-admin rights.
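
On GKE you typically have to grant your own user cluster-admin first, since even project owners don't hold that RBAC role by default; a sketch of the apply step, with the binding name chosen arbitrarily:

kubectl create clusterrolebinding prometheus-setup-admin \
  --clusterrole=cluster-admin \
  --user=$(gcloud config get-value account)

kubectl apply -f rbac-setup.yml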

I haven't tested this in a cluster without ABAC fallback, so the RBAC role might still be missing something essential.

  • Thanks for all your work! If you copy-paste the meat of my answer into yours I'll delete mine and mark yours as the solution so you can have the cred! – pnovotnak May 09 '17 at 21:10

As per the discussion on @JorritSalverda's ticket, https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099:

Since GKE doesn't give you access to the client certificates that would allow you to authenticate yourself with the kubelet, the best solution for users on GKE seems to be to use the Kubernetes API server as a proxy for requests to nodes.

To do this (quoting @JorritSalverda):

"For my Prometheus server running inside GKE I now have it running with the following relabeling:

relabel_configs:
- action: labelmap
  regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
  replacement: kubernetes.default.svc.cluster.local:443
- target_label: __scheme__
  replacement: https
- source_labels: [__meta_kubernetes_node_name]
  regex: (.+)
  target_label: __metrics_path__
  replacement: /api/v1/nodes/${1}/proxy/metrics

And the following ClusterRole bound to the service account used by Prometheus:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]

Because the GKE cluster still has an ABAC fallback in case RBAC fails, I'm not 100% sure yet this covers all required permissions."
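
You can verify the proxy route end to end from inside the Prometheus pod, mirroring the curl test from the question but pointed at the API server instead of the kubelet (the node name is a placeholder):

curl -sS \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(</var/run/secrets/kubernetes.io/serviceaccount/token)" \
  https://kubernetes.default.svc/api/v1/nodes/<node-name>/proxy/metrics | head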
