Golang 系统监控 sysmon

[TOC]

设计原理

监控循环

启动 sysmon

当 Go 语言程序启动时，运行时会在第一个 Goroutine 中调用 runtime.main 启动主程序，该函数会在系统栈中创建新的线程：

func main() {
	...
	if GOARCH != "wasm" {
		systemstack(func() {
			newm(sysmon, nil)
		})
	}
	...
}

runtime.newm -> runtime.newm1 -> runtime.newsproc -> runtime.sysmon

在新创建的线程中，我们会执行存储在 runtime.m 中的 runtime.sysmon 启动系统监控.
运行时执行系统监控不需要处理器，系统监控的 Goroutine 会直接在创建的线程上运行.

sysmon 会在循环中完成以下的工作：

检查死锁
运行计时器 — 获取下一个需要被触发的计时器；
轮询网络 — 获取需要处理的到期文件描述符；
抢占处理器 — 抢占运行时间较长的或者处于系统调用的 Goroutine；
垃圾回收 — 在满足条件时触发垃圾收集回收内存；

sysmon 循环休眠时间

系统监控在每次循环开始时都会通过 usleep 挂起当前线程，该函数的参数是微秒，运行时会遵循以下的规则决定休眠时间：

初始的休眠时间是 20μs；
最长的休眠时间是 10ms；
当系统监控在 50 个循环中都没有唤醒 Goroutine 时，休眠时间在每个循环都会倍增；

检查死锁

func checkdead() {
	...
	for _, _p_ := range allp {
		if len(_p_.timers) > 0 {
			return
		}
	}

	throw("all goroutines are asleep - deadlock!")
}

省略

运行计时器

省略

轮询网络

如果上一次轮询网络已经过去了 10ms，那么系统监控还会在循环中轮询网络，检查是否有待执行的文件描述符：

func sysmon() {
    	lastpoll := int64(atomic.Load64(&sched.lastpoll))
		if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
			atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
            // 非阻塞地调用 runtime.netpoll 检查待执行的文件描述符并通过 runtime.injectglist 将所有处于就绪状态的 Goroutine 加入全局运行队列中：
			list := netpoll(0) // non-blocking - returns list of goroutines
			if !list.empty() {
				// Need to decrement number of idle locked M's
				// (pretending that one more is running) before injectglist.
				// Otherwise it can lead to the following situation:
				// injectglist grabs all P's but before it starts M's to run the P's,
				// another M returns from syscall, finishes running its G,
				// observes that there is no work to do and no other running M's
				// and reports deadlock.
				incidlelocked(-1)
				injectglist(&list)
				incidlelocked(1)
			}
		}
}

该函数会将所有 Goroutine 的状态从 _Gwaiting 切换至 _Grunnable 并加入全局运行队列等待运行，如果当前程序中存在空闲的处理器，会通过 runtime.startm 启动线程来执行这些任务。

抢占处理器

系统监控会在循环中调用 runtime.retake 抢占处于运行或者系统调用中的处理器，该函数会遍历运行时的全局处理器，每个处理器都存储了一个 runtime.sysmontick

当处理器处于 _Prunning 或者 _Psyscall 状态时，如果上一次触发调度的时间已经过去了 10ms，我们会通过 runtime.preemptone 抢占当前处理器；
当处理器处于 _Psyscall 状态时，在满足以下两种情况下会调用 runtime.handoffp 让出处理器的使用权：
- 当处理器的运行队列不为空或者不存在空闲处理器时2；
- 当系统调用时间超过了 10ms 时3；

系统监控通过在循环中抢占处理器来避免同一个 Goroutine 占用线程太长时间造成饥饿问题。

垃圾回收

在最后，系统监控还会决定是否需要触发强制垃圾回收，runtime.sysmon 会构建 runtime.gcTrigger 并调用 runtime.gcTrigger.test 方法判断是否需要触发垃圾回收：

如果需要触发垃圾回收，我们会将用于垃圾回收的 Goroutine 加入全局队列，让调度器选择合适的处理器去执行。

小结

运行时通过系统监控来触发线程的抢占、网络的轮询和垃圾回收，保证 Go 语言运行时的可用性。系统监控能够很好地解决尾延迟的问题，减少调度器调度 Goroutine 的饥饿问题并保证计时器在尽可能准确的时间触发。

参考

原文：系统监控