
I can't get "multi-primary multi-network" to play nice with locality failover (or locality load balancing, for that matter). The endpoints are registered fine. The istio-system namespace is labeled with network information, and each node is labeled with zone and region information. When I check the /clusters page on the client's Envoy admin interface, the zone and region are set correctly for each endpoint.

The issue seems to be that the control plane isn't assigning priority to the endpoints. However, according to a stale source, this should work automatically, provided that I've created a DestinationRule (which I have). I've also created a VirtualService for good measure.

$ istioctl proxy-config endpoints -n client client-6889f68cbc-z5jb6 --cluster "outbound|80||server.server.svc.cluster.local" -o json | jq '.[0].hostStatuses[] | del(.stats)'
{
  "address": {
    "socketAddress": {
      "address": "10.244.1.25",
      "portValue": 80
    }
  },
  "healthStatus": {
    "edsHealthStatus": "HEALTHY"
  },
  "weight": 1,
  "locality": {
    "region": "region2",
    "zone": "zone2"
  }
}
{
  "address": {
    "socketAddress": {
      "address": "172.18.254.1",
      "portValue": 15443
    }
  },
  "healthStatus": {
    "edsHealthStatus": "HEALTHY"
  },
  "weight": 3,
  "locality": {
    "region": "region1",
    "zone": "zone1"
  }
}

My setup is two Kubernetes 1.20.2 clusters running locally using KinD + MetalLB, with Istio operator v1.9.1. Each cluster is configured to occupy a different region and zone.

Istio VS and DR

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: server
  namespace: server
spec:
  host: server
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 10
        maxRequestsPerConnection: 10
    loadBalancer:
      localityLbSetting:
        enabled: true
      simple: ROUND_ROBIN
    outlierDetection:
      baseEjectionTime: 1m
      consecutive5xxErrors: 1
      interval: 1s
      maxEjectionPercent: 51
      minHealthPercent: 0
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: server
  namespace: server
spec:
  hosts:
  - server
  http:
  - route:
    - destination:
        host: server

Kiali View

[Image: Kiali view]

As you can see from the Kiali dashboard, the DR and VS are both active and both clusters are routable. But traffic is flowing to both equally, whereas it ought to flow only to one. I've also tried specifying distribute and failover explicitly in my DR spec, with no success.
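For reference, an explicit failover setting of the kind mentioned above might look roughly like the sketch below. This is not the exact spec that was tried. It assumes the client runs in region2 (inferred from the local pod IP in the endpoint output) and should fail over to region1. Note that failover and distribute cannot be set together in the same localityLbSetting.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: server
  namespace: server
spec:
  host: server
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
      localityLbSetting:
        enabled: true
        # Assumption: region names taken from the endpoint output above.
        failover:
        - from: region2
          to: region1
    # Locality failover relies on outlier detection to mark endpoints unhealthy.
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 1s
      baseEjectionTime: 1m
```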

pnovotnak

1 Answer


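This is a bug in Istio 1.9.1 when running in a bare-metal environment. As a workaround, the client workload must have a Service attached to it. When a Service is present, the proxy's locality is taken from the first service instance. When no Service is defined, the cloud metadata provider is used to assign locality to the proxy instances (the sidecar itself queries the metadata server). On bare metal there is no metadata server to query, so the client proxy presumably ends up without a locality, and the control plane has nothing to prioritize endpoints against.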

See:

https://github.com/istio/istio/blob/bf5dd51386f4d78b20dd1f9c14f09b562a6ecd6e/pilot/pkg/xds/ads.go#L584-L600
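A minimal sketch of the workaround, assuming the client pods carry the label app: client and listen on port 80. Both the label and the port are assumptions for illustration. Only the client namespace comes from the question, so adjust the selector to match the actual Deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: client
  namespace: client
spec:
  selector:
    app: client        # assumption: label on the client pods
  ports:
  - name: http
    port: 80           # assumption: any port the client container exposes
    targetPort: 80
```

The Service does not need to receive any traffic. It only has to select the client pods so that the control plane can derive the proxy's locality from a service instance.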

pnovotnak