
I'm trying to achieve a zero-downtime deployment using Kubernetes, and during my tests the service doesn't load balance well.

My kubernetes manifest is:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: myapp
        version: "0.2"
    spec:
      containers:
      - name: myapp-container
        image: gcr.io/google-samples/hello-app:1.0
        imagePullPolicy: Always
        ports:
          - containerPort: 8080
            protocol: TCP
        readinessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1

---

apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
  labels:
    app: myapp
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: myapp

If I loop over the service's external IP, for example:

$ kubectl get services
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
kubernetes   ClusterIP      10.35.240.1    <none>           443/TCP        1h
myapp-lb     LoadBalancer   10.35.252.91   35.205.100.174   80:30549/TCP   22m

using the bash script:

while true
    do
        curl 35.205.100.174
        sleep 0.2s
    done

I receive some connection refused errors during the deployment:

curl: (7) Failed to connect to 35.205.100.174 port 80: Connection refused
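
A more verbose version of the loop (just a sketch, assuming a reasonably recent curl) makes it easier to spot exactly when the failures happen:

while true
do
    # print a timestamp plus the HTTP status code (curl prints 000 when the connection fails)
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 http://35.205.100.174/)
    echo "$(date +%T) $code"
    sleep 0.2
done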

The application is the default hello-app provided by Google Cloud Platform, running on port 8080.

Cluster information:

  • Kubernetes version: 1.8.8
  • Google Cloud Platform
  • Machine type: g1-small
  • How frequently are you getting those connection refused errors? I'm running the same deployment as you right now, removed the sleep to stress test the service, and I'm at around 2000 requests and 0 failures. – DevopsTux May 18 '18 at 10:35
  • 0 errors on 20,000 requests now. – DevopsTux May 18 '18 at 11:52
  • it occurs only during a deployment; try changing the `version` and restarting the script. If I siege the service's internal IP or external IP I get some `connection refused` errors – thoas May 19 '18 at 07:33
  • An example during a deployment: https://cl.ly/3l2E3f3F2q1T – thoas May 19 '18 at 19:07
  • where are you launching the siege from exactly? – DevopsTux May 22 '18 at 07:25
  • the siege is launched locally and also tested in a busybox directly in the cluster using the Cluster IP – thoas May 22 '18 at 08:22

2 Answers


I got the same problem and tried to dig a bit deeper into the GKE network setup for this kind of load balancing.

My suspicion is that the iptables rules on the node that runs the container are updated too early. I increased some of the delays in your example to make it easier to pinpoint the stage at which the requests fail.

My changes to your deployment:

spec:
...
  replicas: 1         # easier to track the state of the system
  minReadySeconds: 30 # give the load-balancer time to pick up the new node
...
  template:
    spec:
      containers:
      - command: ["sh", "-c", "./hello-app"] # ignore SIGTERM and keep serving requests for 30s

Everything works well until the old pod switches from state Running to Terminating. I tested with a kubectl port-forward on the terminating pod and my requests were served without timeouts.
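
Roughly what that check looked like (the pod name here is only a placeholder):

# forward a local port to the pod that is currently Terminating
kubectl port-forward myapp-deployment-xxxxxxxxxx-xxxxx 8080:8080 &

# requests through the forward keep being answered while the pod terminates
curl -s http://localhost:8080/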

The following things happen during the change from Running to Terminating:

  • Pod-IP is removed from the service
  • Health check on the node returns 503 with "localEndpoints": 0
  • iptables rules on that node are changed and traffic for this service is dropped (--comment "default/myapp-lb: has no local endpoints" -j KUBE-MARK-DROP); one way to inspect this is shown below
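
For example, after SSHing onto the node, the rule that comment refers to can be found with something like:

# dump the iptables rules kube-proxy manages and look for the drop rule for this service
sudo iptables-save | grep "default/myapp-lb"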

The load balancer's default health check runs every 2 seconds and needs 5 failures before it removes a node, which means packets are dropped for at least 10 seconds. After I changed the interval to 1 second and the threshold to a single failure, the number of dropped packets decreased.
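
For reference, one way to do this (a sketch; it assumes the node health check GKE created for this Service is a legacy HTTP health check, and the name below is a placeholder taken from the list command):

# find the health check that belongs to the service's target pool
gcloud compute http-health-checks list

# tighten the probing so a node with no local endpoints is removed faster
gcloud compute http-health-checks update HEALTH_CHECK_NAME \
    --check-interval 1s --unhealthy-threshold 1

Keep in mind that the cloud controller may eventually reconcile such manual changes back to the defaults.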

If you are not interested in the source IP of the client, you could remove the line:

externalTrafficPolicy: Local

in your service definition, and the deployments complete without connection timeouts.
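
For reference, the Service then looks like this; when the line is omitted, externalTrafficPolicy defaults to Cluster, so traffic may be forwarded to a pod on another node and the client source IP is lost:

apiVersion: v1
kind: Service
metadata:
  name: myapp-lb
  labels:
    app: myapp
spec:
  type: LoadBalancer   # externalTrafficPolicy defaults to Cluster when not set
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: myapp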

Tested on a GKE cluster with 4 nodes running version v1.9.7-gke.1.

  • same issue with the `minReadySeconds`, I get a `curl: (56) Recv failure: Connection reset by peer` during a deployment – thoas Jun 05 '18 at 11:56

Looking at the screenshot you shared in the comments, what you are running into is not your k8s cluster failing to accept and reply correctly to the HTTP GET / request, but a problem with siege and how it works. I've run into this a couple of times myself.

See this github issue for reference: https://github.com/JoeDog/siege/issues/127

The issue is that by default siege closes each connection, leaving the port in a TIME_WAIT state, which means it cannot be reused for a while. The machine running the test simply runs out of available ports.

In other words, you have used up all the available ephemeral ports. You can check the configured port range with:

sysctl net.ipv4.ip_local_port_range

And how long it takes them to move from TIME_WAIT to CLOSE with:

sysctl net.ipv4.tcp_fin_timeout

On the Linux desktop I am using at the moment these are the values:

sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768    60999
sysctl net.ipv4.tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 60

This means the machine cannot use more than 28231 sockets (the range between 32768 and 60999) within a 60-second window, i.e. roughly 470 new connections per second. tcp_fin_timeout is how long the system waits, from the moment a TCP connection is terminated, before it actually releases the socket so it can be reused for new connections:

tcp_fin_timeout

The length of time in seconds it takes to receive a final FIN before the socket is always closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service attacks. http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html

This is why you are seeing intermittent errors instead of just reaching the point where siege stops forming connections at all.

If you want to stress test your deployment harder than that, and considering you are launching the test from a test instance that won't be used in production, you can simply reduce that value temporarily to something lower:

sudo sysctl -w net.ipv4.tcp_fin_timeout=30

And also max out the ephemeral port range:

sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"

These changes only apply to the running system and will be back to the defaults after a reboot.

Writing the corresponding values directly under /proc/sys/ has the same effect (and, like sysctl -w, does not survive a reboot):

echo "your new port range" > /proc/sys/net/ipv4/ip_local_port_range
echo "your new timeout" > /proc/sys/net/ipv4/tcp_fin_timeout

There is actually more complexity to all this, but this should be enough to keep your test going for a bit longer, at least.

Also, if you want to check socket statistics and states, note that on some distributions the classic netstat is no longer installed. In that case you can use ss like this to list the sockets in TIME-WAIT:

ss state time-wait
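
For example, to count how many sockets are currently in TIME-WAIT:

# -t: TCP, -a: all, -n: numeric; drop the header line before counting
ss -tan state time-wait | tail -n +2 | wc -l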
  • thank you for your answer, `siege` is not the issue here since we have tested on multiple servers and even with a dead simple `curl` loop. – thoas Jun 03 '18 at 16:37
  • What siege does is, in fact, a bit like a curl loop. You will end up running out of sockets with either; Kubernetes is not the problem here. Did you try my answer? – DevopsTux Jun 06 '18 at 07:09
  • yes I tried your answer, it's not related to the HTTP client (we are also testing it in pure Python with only one connection) and the ingress is returning some `502` status codes. – thoas Jun 06 '18 at 14:20
  • Is it possible you are getting this error while the public IP for the load balancer is being provisioned? – DevopsTux Oct 04 '18 at 15:54