Golang的性能测试

本文介绍Golang的性能测试(Benchmark)。html

使用testing包

看下面的bench_test.go:node

import "testing"

func Fib(n int) int {
	switch n {
	case 0:
		return 0
	case 1:
		return 1
	default:
		return Fib(n-1) + Fib(n-2)
	}
}

func BenchmarkFib20(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Fib(20) // run the Fib function b.N times
	}
}
复制代码

运行程序git

> $ go test -bench=. -run=^BenchmarkFib20$
goos: darwin
goarch: amd64
pkg: github.com/liangyaopei/GolangTester/bench
BenchmarkFib20-8           25608             46494 ns/op
PASS
ok      github.com/liangyaopei/GolangTester/bench       1.678s
复制代码

从输出能够看到, BenchmarkFib20-8中的-8后缀，指的是GOMAXPROCS，它与CPU的数量有关。能够经过-cpu标志来指定github

> $ go test -bench=. -cpu=1,2,4 -run=^BenchmarkFib20$
复制代码

25608表示进行了25608次循环，每次循环耗时46494ns (46494 ns/op)。web

其余标志浏览器

	做用	例子
-benchtime	指定运行时间	`go test -bench=. -benchtime=10s -run=^BenchmarkFib20$`
-count	指定运行次数	`go test -bench=. -count=10 -run=^BenchmarkFib20$`
-benchmem	监控内存分配	`go test -bench=. -benchmem -run=^BenchmarkFib20$`

将test保存为二进制文件

go test -c将测试的代码保存为二进制文件，方便下次调用。markdown

CPU/内存/Block的Profile

-cpuprofile=$FILE 将 CPU profile 输出 $FILE.
-memprofile=$FILE, 将内存 profile 输出 $FILE, -memprofilerate=N adjusts the profile rate to 1/N.
-blockprofile=$FILE, 将 block profile 输出to $FILE.

例子：并发

> $ go test -bench=.  -run=^BenchmarkFib20$ -cpuprofile=profile
> $ go tool pprof profile
复制代码

使用PProf

profile用来跟踪整个程序的运行，用来定位程序性能的问题。函数

pprof来自Google Perf Tools, 它被整合进Golang的runtime。pprof包含2个部分，oop

runtime/pprof 每一个Golang程序都有用到
go tool pprof 用来读取profile输出文件

pprof的类型

CPU Profile

程序运行时(runtime)， CPU Profile每隔10ms会打断(interrupt)程序执行, 记录当前运行的Goroutine的堆栈踪影(stack trace)。
Memory profiling

当堆内存分配时，Memory profiling记录stack trace.

与CPU Profile相似，默认状况下，每1000次堆内存分配，Memory profiling会进行1次取样。

Memory profiling是取样的，而且它跟踪的时没有使用的内存分配，所以它不能肯定整个程序运行使用的内存。
Block (or blocking) profiling

Block profiling记录一个Goroutine等待共享资源的时间，这能够用来肯定程序中的并发瓶颈。
Mutex contention profiling

Mutex contention profiling记录由于互斥致使延迟的操做。

每次一个Profile

使用profile是有性能消耗的，使用时每次只用1种类型的Profile。若是同时使用多个Profile，Profile之间会相互影响。

使用PProf

看下面的例子words.go

import (
	"fmt"
	"io"
	"log"
	"os"
	"unicode"

	"github.com/pkg/profile"
)

func readbyte(r io.Reader) (rune, error) {
	var buf [1]byte
	_, err := r.Read(buf[:])
	return rune(buf[0]), err
}

func main() {
	defer profile.Start().Stop()

	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatalf("could not open file %q: %v", os.Args[1], err)
	}

	words := 0
	inword := false
	for {
		r, err := readbyte(f)
		if err == io.EOF {
			break
		}

		if err != nil {
			log.Fatalf("could not read file %q: %v", os.Args[1], err)
		}

		if unicode.IsSpace(r) && inword {
			words++
			inword = false
		}
		inword = unicode.IsLetter(r)
	}
	fmt.Printf("%q: %d words\n", os.Args[1], words)
}
复制代码

进行性能测试：

> $ go run words.go whales.txt
2021/02/06 14:38:20 profile: cpu profiling enabled, /var/folders/q9/5v5tz4_92gd343hvb3mb1_s40000gn/T/profile247619673/cpu.pprof
"whales.txt": 181276 words
2021/02/06 14:38:21 profile: cpu profiling disabled, /var/folders/q9/5v5tz4_92gd343hvb3mb1_s40000gn/T/profile247619673/cpu.pprof

复制代码

使用命令go tool pprof读取文件

> $ go tool pprof /var/folders/q9/5v5tz4_92gd343hvb3mb1_s40000gn/T/profile247619673/cpu.pprof
Type: cpu
Time: Feb 6, 2021 at 2:38pm (CST)
Duration: 1.63s, Total samples = 1.29s (79.06%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1.29s, 100% of 1.29s total
      flat  flat%   sum%        cum   cum%
     1.29s   100%   100%      1.29s   100%  syscall.syscall
         0     0%   100%      1.29s   100%  internal/poll.(*FD).Read
         0     0%   100%      1.29s   100%  internal/poll.ignoringEINTR
         0     0%   100%      1.29s   100%  main.main
         0     0%   100%      1.29s   100%  main.readbyte (inline)
         0     0%   100%      1.29s   100%  os.(*File).Read
         0     0%   100%      1.29s   100%  os.(*File).read (inline)
         0     0%   100%      1.29s   100%  runtime.main
         0     0%   100%      1.29s   100%  syscall.Read
         0     0%   100%      1.29s   100%  syscall.read

复制代码

能够看到syscall.syscall耗费最多的CPU资源, 由于每次readbyte(f)都会发生系统调用，读取字符。

(pprof) list main.main
Total: 1.29s
ROUTINE ======================== main.main in /Users/liangyaopei/workspace/GolangTester/profile/words.go
         0      1.29s (flat, cum)   100% of Total
         .          .     25:   }
         .          .     26:
         .          .     27:   words := 0
         .          .     28:   inword := false
         .          .     29:   for {
         .      1.29s     30:           r, err := readbyte(f)
         .          .     31:           if err == io.EOF {
         .          .     32:                   break
         .          .     33:           }
         .          .     34:
         .          .     35:           if err != nil {

复制代码

go tool pprof经常使用命令

	说明
top	Top 命令会按指标大小列出前 10 个函数，好比 CPU 是按执行时间长短，内存是按内存占用多少。
traces	打印全部调用栈和调用栈的指标信息。
list	list 命令则是用来帮咱们确认函数在代码中的位置。
web	打开浏览器图形界面
go tool pprof -http=:8080	使用浏览器形式读取Profile输出

编译器优化

编译器优化主要包括3个方面

逃逸分析
函数内联(inline)
死代码(dead code)清除

go build 命令

	做用	例子
-m	打印编译器的逃逸分析决定	`go build -gcflags=-m example.go`
-l	是否内联	`-gcflags=-l`禁止内联; `-gcflags='-l -l'` 两层内联

参考文献

High Performance Go Workshop

个人公众号：lyp分享的地方

个人知乎专栏: zhuanlan.zhihu.com/c_127546654…

个人博客：www.liangyaopei.com

Github Page: liangyaopei.github.io/