Graceful Connection Drain of istio-proxy
This document explains what happens when a pod which has
istio-proxy sidecar enabled is deleted, particularly how the connections are treated, and how smooth you can configure the sidecar to drain the inflight connections gracefully.
This document only applies to TSB version <=
Before you get started, make sure you:
✓ Familiarize yourself with TSB concepts
✓ Install the TSB environment. You can use TSB demo for quick install
✓ Completed TSB usage quickstart. This document assumes you already created Tenant and are familiar with Workspace and Config Groups. Also you need to configure tctl to your TSB environment
✓ Install httpbin
When you issue a delete request against a pod in your Kubernetes cluster, all containers within the pod are sent a SIGTERM. If the pod contains only a single container, it will receive a SIGTERM and go into the terminating state.
However, if the pod contains a sidecar (in our case an
istio-proxy sidecar), then it is not automatically guaranteed that the main application is terminated before the sidecar.
istio-proxy sidecar is terminated before the application, the following issues may occur:
- All TCP connections (both inbound and outbound) are terminated abruptly.
- Any connections from the application fail
While there is a proposed KEP for it, currently there is no straightforward way to tell Kubernetes to terminate the application before the sidecar.
However, it is possible to workaround this problem by configuring the
drainDuration parameter. This configuration parameter controls the amount of time that the underlying
envoy proxy drains inflight connections before fully terminating.
To take advantage of the
drainDuration parameter, you will need to configure it in both the container sidecars, and the TSB gateways.
drainDuration time for
You will need to apply an overlay to the
ControlPlane CR or Helm values to set
drainDuration. Consider the following example. Note that only applicable parts are shown – you will most likely need a lot more configuration for your control plane.
spec: ... components: istio: kubeSpec: overlays: - apiVersion: install.istio.io/v1alpha1 kind: IstioOperator name: tsb-istiocontrolplane patches: - path: spec.meshConfig.defaultConfig.drainDuration value: 50s ...
After adding the overlay to your configuration, use the
kubectl command to apply it to the
kubectl apply -f controlplane.yaml
If you use Helm, you can update
spec section of the control plane Helm values then do
You must restart of the workload with the
istio-proxy to get the
drainDuration in effect. Once you have restarted your workload, you can verify it by checking the config dump of the for
kubectl exec helloworld-v1-59fdd6b476-pjrtr -n helloworld -c istio-proxy -- pilot-agent request GET config_dump |grep -i drainDuration "drainDuration": "50s",
drainDuration for TSB gateways
If you are using TSB gateways such as
Tier1Gateway, you will need to configure your appropriate gateway type using the
You can query the current value for the
connectionDrainDuration field on your gateway custom resource by issuing the following command:
kubectl get ingress helloworld-gateway -n helloworld -oyaml | grep connectionDrainDuration: connectionDrainDuration: 22s
The following example shows how
connectionDrainDuration may be set. Please read the spec for further information on the this field.
apiVersion: install.tetrate.io/v1alpha1 kind: IngressGateway metadata: name: helloworld-gateway spec: connectionDrainDuration: 10s # ... <snip> ...
drainDuration in the TSB Gateway
To check the value for
drainDuration that is being set on the pod, you can query the environment variable:
kubectl describe po helloworld-gateway-7d5d4c8d57-msfd6 -n helloworld | grep -i DRAIN TERMINATION_DRAIN_DURATION_SECONDS: 22
You can also verify this value by looking at the logs for the gateway pod when you terminate the gateway. If you watch the logs as the gateway pod is terminated, you should see messages resembling the following:
2022-03-29T06:02:50.423789Z info Graceful termination period is 22s, starting... 2022-03-29T06:03:12.423988Z info Graceful termination period complete, terminating remaining proxies.