
I'm setting up Istio in a new AWS EKS cluster and created a basic nginx deployment to test. With a single replica, everything works perfectly and responses come back in under 100ms. As soon as I add a second replica, response times from the new pod go through the roof, averaging around 10 seconds.
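
A quick way to see which node each replica lands on (namespace and labels are taken from the manifests below):

# the NODE column shows where each replica is scheduled
$ kubectl -n my-namespace get pods -l app=nginx -o wide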

Based on suggestions from elsewhere, I updated the mesh config to disable automatic retries:

meshConfig:
  defaultHttpRetryPolicy: {}
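
If it helps, this is roughly where that setting lives when the mesh is managed through an IstioOperator resource (a minimal sketch; adjust to however you actually install Istio):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    # mesh-wide override of the default HTTP retry policy
    defaultHttpRetryPolicy: {}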

After this change, requests that land on the second pod fail outright with a 503:

"GET / HTTP/1.1" 503 UF upstream_reset_before_response_started{connection_failure} - "-" 0 91 10003 - "108.249.9.111,10.1.0.117" "curl/7.68.0" "6fa51be8-1441-4454-8d 1b-a03c93b257dc" "example.com" "10.1.52.62:80" outbound|80||nginx.my-namespace.svc.cluster.local - 10.1.108.189:8080 10.1.0.117:21410 - -

My setup is the following:

# AWS ALB Ingress -> istio-ingressgateway (ClusterIP) -> gateway -> virtualservice -> service -> nginx

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: default
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"

---

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: nginx
spec:
  hosts:
  - "example.com"
  gateways:
  - default
  http:
  - route:
    - destination:
        host: nginx

---

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      name: http

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
    version: v1
spec:
  replicas: 2
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      app: nginx
      version: v1
  template:
    metadata:
      labels:
        app: nginx
        version: v1
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: 100Mi
              cpu: 100m
            limits:
              memory: 1500Mi
              cpu: 1000m
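
For completeness, the "AWS ALB Ingress" hop above is just a Kubernetes Ingress pointing at the istio-ingressgateway Service. A sketch of what that can look like (the AWS Load Balancer Controller annotations here are assumptions about my setup; ip targets are needed because the gateway Service is ClusterIP):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: istio-ingressgateway
            port:
              number: 80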

Versions:

$ istioctl version
client version: 1.13.2
control plane version: 1.13.2
data plane version: 1.13.2 (1 proxies)


$ kubectl version --short
Client Version: v1.21.11
Server Version: v1.21.5-eks-bc4871b

1 Answer


I figured out that my problem was due to misconfigured security group rules for the nodes. Node-to-node traffic was not allowed, which prevented the Istio ingress gateway from reaching pods on other nodes.
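
An easy way to confirm this kind of problem is to check whether the ingress gateway and the failing pod are on different nodes, and then hit the pod IP from the access log directly from the gateway. With the security groups misconfigured, the direct request times out as well (the curl assumes the regular, non-distroless proxy image, which ships curl):

$ kubectl -n istio-system get pods -l istio=ingressgateway -o wide
$ kubectl -n istio-system exec deploy/istio-ingressgateway -- \
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 http://10.1.52.62:80/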

Using the AWS EKS Terraform module, I added the following:


  node_security_group_additional_rules = {

    ingress_self_all = {
      description = "Node to node all ports/protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true
    }
    egress_all = {
      description = "Node all egress"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "egress"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
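
For anyone copying this: self = true scopes the ingress rule to the node security group itself, so it allows all traffic between nodes in that group, which is what lets the ingress gateway reach sidecars running on other nodes.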