
We've seen some odd behaviour on our Kubernetes cluster. We have two test applications that speak gRPC. One sends a subscription message and the other sends back a stream of responses. The envisaged behaviour is that this stream stays up until cancelled. However, we were finding situations in which the server thought it was sending updates but the client didn't receive them. Narrowing this down led to a reproducible test case:

  • If the service is configured as gRPC in Kubernetes, i.e. the port name begins with grpc-,
  • you set up the stream,
  • and you then reboot the server,

Then the observed behaviour was that the client never saw the connection drop, presumably because its connection is to the Istio sidecar proxy rather than to the destination server. Without the information that the connection has dropped, it can't re-establish it.

Has anyone seen this behaviour? What do we need to configure in Istio to work around this? (We can fix the problem simply by changing the service to have a port name that doesn't begin with "grpc-", but that works by disabling Istio's gRPC handling; see the sketch below.)
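
For context, a minimal sketch of the kind of Service that triggers this (the names and port number are hypothetical; it's the grpc- prefix on the port name that opts the traffic into Istio's gRPC handling):

apiVersion: v1
kind: Service
metadata:
  name: grpcservice
  namespace: servicenamespace
spec:
  selector:
    app: grpcservice
  ports:
  - name: grpc-subscriptions   # the grpc- prefix makes Istio treat this port as gRPC
    port: 50051
    targetPort: 50051

Renaming the port to e.g. tcp-subscriptions makes the problem go away, but only because Istio then treats the traffic as opaque TCP.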

Edit:

Kubernetes: 1.14.6, Istio: 1.3.6

There's no explicit DestinationRule set up; although various settings have been tried, we couldn't find anything that changed this behaviour.

Julian Birch
1 Answer


This could be prevented by the idleTimeout setting in a DestinationRule.

According to the Istio documentation about idleTimeout:

The idle timeout for upstream connection pool connections. The idle timeout is defined as the period in which there are no active requests. If not set, there is no idle timeout. When the idle timeout is reached the connection will be closed. Note that request based timeouts mean that HTTP/2 PINGs will not keep the connection alive. Applies to both HTTP1.1 and HTTP2 connections.

So if you create a DestinationRule like this:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: grpc-idletimeout-policy
spec:
  host: grpcservice.servicenamespace.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 2m

This should close any HTTP/2 connection from the Istio Envoy proxy side after it has been idle for 2 minutes, for grpcservice in the servicenamespace namespace. Once the connection is closed, the client sees the drop and can re-establish the stream to the new server instance.
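
If you'd rather not impose the idle timeout on every port of the service, the same setting can be scoped to just the gRPC port with portLevelSettings (the port number below is a hypothetical gRPC port):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: grpc-idletimeout-policy
spec:
  host: grpcservice.servicenamespace.svc.cluster.local
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 50051        # hypothetical gRPC port of the service
      connectionPool:
        http:
          idleTimeout: 2m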


Istio does have a tcpKeepalive setting as well, but I'm not sure whether it will work with gRPC connections and your configuration.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: grpc-idletimeout-policy
spec:
  host: grpcservice.servicenamespace.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        connectTimeout: 30ms
        tcpKeepalive:
          time: 7200s
          interval: 75s
      http:
        idleTimeout: 2m

Note that the tcpKeepalive setting is applied at the TCP level, while idleTimeout is applied at the HTTP/2 level.

You can check the Istio documentation to see which specific TcpKeepalive options are available.
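
For reference, TcpKeepalive has three fields; the values shown below are the Linux defaults that apply when the OS-level settings are not overridden:

tcpKeepalive:
  probes: 9       # unacknowledged probes before the connection is considered dead
  time: 7200s     # idle time before the first keepalive probe is sent
  interval: 75s   # time between successive probes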

There is also an article about using gRPC with connection keepalive.

Hope it helps.

Piotr Malec
  • The difficulty with this is that the connection deals with relatively infrequent events that need to be responded to quickly (relative to the spacing of the events), meaning either you set it too short and your connection times out all the time, or you set it too long and don't receive events you should have until the idle timeout expires. We *could* work around this by implementing a software heartbeating mechanism, but that seems completely against the spirit of what Istio is for to begin with. – Julian Birch Feb 24 '20 at 17:58
  • I have edited my answer; it might be possible to keep the connection up with the `tcpKeepalive` option. – Piotr Malec Feb 25 '20 at 14:32