Virtualization: Smooth Updates on K8S (zero-downtime with Kubernetes)

How a service shuts down gracefully

/*
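	The client reports an error like the following:
	 Get Get http://127.0.0.1:9991/ping: read tcp 127.0.0.1:61733->127.0.0.1:9991: wsarecv: An existing connection was forcibly closed by the remote host.
	This happens because a TCP connection that was still in use was forcibly closed by the server.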
*/
func BadServer() {
	engine := gin.Default()
	util.Router(engine)
	srv := &http.Server{Handler: engine, Addr: ":9991"}
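	// ListenAndServe blocks and no signal handler is installed, so a TERM signal
	// kills the process at once, together with any requests still in flight.
	err := srv.ListenAndServe()
	// This line only runs if ListenAndServe fails; it never drains live connections.
	_ = srv.Shutdown(context.Background())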
	if err != nil {
		fmt.Println(err)
	}
}

/*
	The client sees no errors.
*/
func GraceServer() {
	engine := gin.Default()
	util.Router(engine)
	srv := &http.Server{Handler: engine, Addr: ":9991"}
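	// done carries the received signal back once Shutdown has finished.
	done := make(chan os.Signal, 1)
	go func() {
		// signal.Notify never blocks on send, so the channel must be buffered.
		ch := make(chan os.Signal, 1)
		// k8s sends the TERM signal. SIGKILL cannot be caught, so it is not registered.
		signal.Notify(ch, syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGINT)
		sig := <-ch
		// Shutdown stops accepting new connections and waits for in-flight requests.
		if err := srv.Shutdown(context.Background()); err != nil {
			fmt.Println("Shutdown->", err)
		}
		done <- sig
	}()

	// ListenAndServe returns http.ErrServerClosed as soon as Shutdown is called.
	err := srv.ListenAndServe()
	if err != nil && err != http.ErrServerClosed {
		fmt.Println("ListenAndServe->", err)
		return
	}
	// Block until Shutdown has drained the in-flight requests.
	sig := <-done
	fmt.Println("Notify->", sig, fmt.Sprintf("%d", sig))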
}

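How k8s updates gracefully

In short, add two probes and set a preStop hook. The liveness probe checks whether a running container is still healthy (the kubelet restarts it on failure). The readiness probe checks whether a Pod can take traffic, so a newly created Pod is only added to the Service endpoints once it is ready. An example follows: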

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grace-shut-example-deploy
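spec:
  replicas: 3
  # apps/v1 requires a selector that matches the Pod template labels.
  # The label app: grace-shut-example is an assumed name, not taken from the original manifest.
  selector:
    matchLabels:
      app: grace-shut-example
  template:
    metadata:
      labels:
        app: grace-shut-example
    spec: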
      imagePullSecrets:
        - name: all-aliyuncs
      containers:
        - name: grace-shut-example
          image: registry.cn-zhangjiakou.aliyuncs.com/xiaoduoai/ecrobot-grace_shut_example:v0.0.4
          imagePullPolicy: IfNotPresent
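          # ">>" is only interpreted by a shell, so the binary is started through sh -c;
          # exec replaces the shell so the service itself receives the TERM signal.
          command: ["/bin/sh", "-c", "exec ./grace_shut_example >> /var/log/xiaoduo/grace_shut_example.out"]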
          ports:
            - name: http
              # matches the port the server listens on and the probes check
              containerPort: 9991
              protocol: TCP
          volumeMounts:
            - mountPath: /var/log/xiaoduo
              name: log-volume
          readinessProbe:
            tcpSocket:
              port: 9991
            initialDelaySeconds: 5
            periodSeconds: 1
            successThreshold: 1
            failureThreshold: 2
          livenessProbe:
            tcpSocket:
              port: 9991
            initialDelaySeconds: 5
            periodSeconds: 1
            failureThreshold: 2
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
      volumes:
        - name: log-volume
          persistentVolumeClaim:
            claimName: log-volume-claim

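The example above uses TCP probes and sets the probe interval to 1 second.

Core configuration: probes

This mainly means setting the liveness probe and the readiness probe.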

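Core configuration: preStop

Why configure preStop
- Removing the old node (Pod) from the load balancer and sending the TERM signal to it happen concurrently (in practice the load balancer is notified first and the TERM signal is then sent asynchronously), with no ordering guarantee. The load balancer may therefore still list the old node after that node has received the TERM signal and stopped accepting TCP connections, so some requests land on a node that is already closed.
- preStop runs after the load balancer has been notified and before the TERM signal is sent. It is a blocking operation: it runs its command until the command finishes. So we call sleep in preStop to give the load balancer enough time to remove the old node, and only then is the TERM signal sent 'asynchronously'.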

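Kubernetes sends the preStop event immediately before a container is terminated. Unless the Pod's grace period expires, Kubernetes' container management logic blocks until the preStop handler has finished.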

This deployment configuration will perform version updates in the following way: It will create one pod with the new version at a time, wait for the pod to start-up and become ready, trigger the termination of one of the old pods, and continue with the next new pod until all replicas have been transitioned. In order to tell Kubernetes when our pods are running and ready to handle traffic we need to configure liveness and readiness probes.
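
The manifest in this article sets no strategy, so the Deployment defaults apply: a rolling update with maxSurge and maxUnavailable both at 25%, where the surge is rounded up and the unavailable count is rounded down. A small arithmetic sketch (rollingLimits is a name made up here, and the edge cases Kubernetes handles are ignored) shows why three replicas are replaced one Pod at a time.

```go
package main

import "fmt"

// rollingLimits converts percentage settings into absolute Pod counts.
// maxSurge is rounded up and maxUnavailable is rounded down.
func rollingLimits(replicas, maxSurgePct, maxUnavailablePct int) (surge, unavailable int) {
	surge = (replicas*maxSurgePct + 99) / 100        // round up
	unavailable = replicas * maxUnavailablePct / 100 // round down
	return surge, unavailable
}

func main() {
	surge, unavailable := rollingLimits(3, 25, 25)
	// prints "maxSurge=1 maxUnavailable=0"
	fmt.Printf("maxSurge=%d maxUnavailable=%d\n", surge, unavailable)
}
```

With one extra Pod allowed and zero Pods allowed to be unavailable, an old Pod is only terminated after a new one has passed its readiness probe.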

If our client, that is the zero-downtime test, connects to the coffee-shop service directly from inside the cluster, it typically uses the service VIP resolved via Cluster DNS and ends up at a Pod instance. This is realized via the kube-proxy that runs on every Kubernetes node and updates iptables that route to the IP addresses of the pods.

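Implementation

A template for graceful shutdown on k8s is currently implemented in my personal directory on the company GitLab and has been tested. At a QPS of 1000 (higher QPS has not been tested yet), images can be updated with zero errors.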

GitLab address: https://gitlab.xiaoduoai.com/zhuyuanbing/grace_shut_example

Usage

Deployment

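# Project: https://gitlab.xiaoduoai.com/devops/k8s-app-deploy

# Directory: /ks-prod/ecrobot/grace-shut-example/grace

Simulating an image update

# For example, in the test environment
# The two commands below simulate a production image update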
kubectl set image deployment/grace-shut-example-deploy grace-shut-example=registry.cn-zhangjiakou.aliyuncs.com/xiaoduoai/ecrobot-grace_shut_example:v0.0.7 --namespace=test-ks

kubectl set image deployment/grace-shut-example-deploy grace-shut-example=registry.cn-zhangjiakou.aliyuncs.com/xiaoduoai/ecrobot-grace_shut_example:v0.0.4 --namespace=test-ks