golang--性能测试和分析

时间 2019-11-08

标签 golang 性能测试分析栏目 Go 繁體版

原文原文链接

前言

测试分为：压力测试、负载测试、性能测试，功能测试等等，其中在开发过程当中开发人员常常要写一些test case unit 本身的模块进行功能测试测和性能。在分析出模块的性能瓶颈后开发人员就须要针对性的调优，但须要提醒的是调优工程通常要放在最后在进行，过早地优化会浪费开发时间，并且有时在需求或者功能变更后就会变成无用功，这是显而易见的。
随着移动互联网的兴起，各类远端服务的性能遇到巨大的挑战，虽然硬件资源日益丰富，但成本的增长和用户体验的严苛，性能调优在软件开发中占的权重也愈来愈高，尤为是一些常常被调用的模块，如核心算法的部分，即便把每次执行的cpu运行的时间下降1%，把内存消耗下降1%，在面对海量的用户请求时对性能也会带来提高。
Go语言做为一个“新兴语言”，它自身为咱们提供了一套用于性能测试和分析的工具和接口，做者这里结合一个具体实例来说解一下简单的调优过程。html

经常使用的测试命令

启动指定的测试函数node
```
go test -run=xxx $packetpath
```
编译一个测试binary文件git
```
go test -c -o $binaryname
```
启动一个bench测试github
```
go test -bench=xxx -run=None
```
在 -bench 的选项上设置一个None表明着不运行功能测试函数即Test开头的函数，-bench后面接的是一个匹配项，任何符合该匹配项的Bench测试函数都会被触发执行。golang
```
go test -bench=xxx -run=None -benchmem
```
添加 -benchmem 会显示出测试过程当中分配mem的数量。web
启动profile分析算法

在go 自带的工具中支持三种profile收集： cpu、block、memory，cpuprofile 是用来记录各个事件或函数对cpu的消耗状况的数据；blockprofile是为了记录携程阻塞的操做，能够用来分析每一个阻塞操做占用的时间；memprofile是用于堆性能剖析，记录携程内部堆分配的具体状况。windows
```
go test -bench=xxx -run=None -cpuprofile=cpu.out
    go test -bench=xxx -run=None -blockprofile=block.out
    go test -bench=xxx -run=None -memprofile=mem.out
```
不管是cpu分析、阻塞分析、堆分析，主要目的都是为了消耗更少的计算资源、内存资源来取得更高的效率，但有时候时间和空间并不能兼得，用空间换时间是很经常使用的一种作法，须要根据具体状况进行权衡。浏览器
枚举包内的文件并发
```
# 枚举一个包中的.go 文件
    go list -f={{.GoFiles}}

    # 枚举一个包中的_test.go文件 
    go list -f={{.TestGoFiles}} 

    # 枚举一个包中的外部测试文件
    go list -f={{.XTestGoFiles}}
```
外部测试文件：在一些特殊状况下，外部测试包须要对待测试包有特殊的访问权限，例如为了不循环引用，一个白盒测试必须存在一个单独的包中，在这种状况下咱们使用了一种小技巧：在包内测试未见中添加一些函数声明，将包内部的功能暴露给外部测试，这种作法为为包测试留了一个“后门”。若是一个源文件的存在的惟一目的就在于此，而且本身不包含任何测试，它们通常称做export_test.go
测试覆盖率
在测试的时候有一种最简单的统计方法--语句覆盖率，部分语句在一次测试过程当中至少执行了一次，则称之为覆盖，不然为非覆盖。
```
# 执行测试并生成coverprofile 文件
    go test -run=TransferProofGenerator -cover -coverprofile=c.out

    # 查看覆盖率，经过go tool 工具
    go tool cover -html=c.out // 会在浏览器中打开一个测试报告
```
若是在生成profile的命令中加入了 -covermode=count 的标记，则能够标记出执行效率较高和较低的模块。

性能剖析流程

这里做者会结合一个示例来讲明一下性能剖析的具体操做流程。性能调优首要的任务就是找出消耗时间最多的函数。

收集程序的profile

通常收集有两种方法，一种是经过工具命令来收集，上面已经提到了；另外一种是经过调用相关模块来收集，这里面以cpuprofile为例：
```
import( 
    "runtime/pprof"
    "testing"
    "os"
)
func BenchmarkXXX(b *testing.B){
    fd, _ := os.OpenFile("cpu.log", os.O_CREATE| os.O_RDWR, os.ModeSetuid)

    if err := pprof.StartCPUProfile(fd); err != nil{
        b.Fatalf("TestTransferProofValidator %s", err.Error())
    }

    defer func() {
        pprof.StopCPUProfile()
        fd.Close()
    }()
}
```
这样在执行后自动就会生成cpuprofile文件，这里是将函数放在里Bench测试函数中，也能够将它放在正常函数中好比main函数。有的时候须要实时的监听web服务的运行状态，这个时候就能够经过引入net/http/pprof来收集profile：
```
package main

import (
    "net/http"
    "github.com/utxo/webserver/router"
    _ "net/http/pprof"
)

func main()  {
    handler:= http.FileServer(http.Dir("./template"));
    http.Handle("/", handler)
    http.HandleFunc("/transfer", router.Transfer);
    http.ListenAndServe("0.0.0.0:8082", nil);
}
```
此时会默认开启proof收集服务，go tool做为客户端来访问该router来收集profile:
```
go tool pprof  http://localhost:8082/debug/pprof/profile
```
在收集结束后会进入cmd命令界面（默认进行 30s 的 CPU Profiling收集），经过命令行来查看、处理收集到的数据。
查看profile数据
```
go tool pprof -text -nodecount=10  cpu.log
```
-nodecount 函数会为咱们列出cpu时间消耗前十的函数。也能够进入cmd模式，进行更详细的剖析：
```
go tool pprof list  cpu.log
```

详细示例

做者这里展现的是一个零知识证实的生成和校验的算法的性能分析过程，按照上节所说咱们先收集profile，使用的方法是调用pprof模块的方法：

func BenchmarkTransferProofValidator(b *testing.B) {
    once.Do(testInit)
    fd, _ := os.OpenFile("cpu_validator.log", os.O_CREATE| os.O_RDWR, os.ModeSetuid)


    inputs, T := generateInputs(100, big.NewInt(1000000))
    proof, output, B, err := TransferProofGenerator(inputs, T, pairsA, pairsB,
        zkpKeyPairsA, zkpKeyPairsB, zkpRange)
    if err != nil{
        b.Fatalf("TestTransferProofValidator: %s", err.Error())
    }

    if err := pprof.StartCPUProfile(fd); err != nil{
        b.Fatalf("TestTransferProofValidator %s", err.Error())
    }
    defer func() {
        pprof.StopCPUProfile()
        fd.Close()
    }()

    for i := 0; i < b.N; i++ {
        err = TransferProofValidator(inputs, output, B, pairsA, pairsB,
            zkpKeyPairsA, zkpKeyPairsB, proof)
        if err != nil{
            b.Fatalf("TestTransferProofValidator: %s", err.Error())
        }
    }
}

执行以下命令：

go test -c -bench=BenchmarkTransferProofValidator -run=None

显示结果以下：

D:\gopath\src\github.com\utxo\balance>go test -run=None -bench=Va
 goos: windows
 goarch: amd64
 pkg: github.com/utxo/balance
 BenchmarkTransferProofValidator-4              3         397296900 ns/op
 PASS
 ok      github.com/utxo/balance 3.237s

TransferProofValidator运行了三次，gomaxprocs = 4，这些都是go tool自动分配，暂时不用深究。能够看到咱们的函数运行十分的缓慢，执行一次须要 0.4s 左右，它将做为一个高频调用的模块远远达不到使用的需求。

接下来咱们要找到耗时最多的那个部分：

go tool pprof list  cpu_validator.log

显示结果以下：

D:\gopath\src\github.com\utxo\balance> go tool pprof list  cpu_validator.log
 list: open list: The system cannot find the file specified.
 Fetched 1 source profiles out of 2
 Main binary filename not available.
 Type: cpu
 Time: Dec 20, 2018 at 11:15pm (CST)
 Duration: 625.81ms, Total samples = 420ms (67.11%)
 Entering interactive mode (type "help" for commands, "o" for options)
 (pprof) top
 Showing nodes accounting for 420ms, 100% of 420ms total
 Showing top 10 nodes out of 25
     flat  flat%   sum%        cum   cum%
     240ms 57.14% 57.14%      240ms 57.14%  math/big.addMulVVW
     110ms 26.19% 83.33%      380ms 90.48%  math/big.nat.montgomery
     30ms  7.14% 90.48%       30ms  7.14%  runtime.memmove
     10ms  2.38% 92.86%       10ms  2.38%  math/big.(*Int).QuoRem
     10ms  2.38% 95.24%       10ms  2.38%  math/big.nat.add
     10ms  2.38% 97.62%       10ms  2.38%  runtime.lock
     10ms  2.38%   100%       20ms  4.76%  runtime.newstack
         0     0%   100%      400ms 95.24%  github.com/utxo/balance.BenchmarkTransferProofValidator
         0     0%   100%      400ms 95.24%  github.com/utxo/balance.TransferProofValidator
         0     0%   100%      400ms 95.24%  github.com/utxo/crypto/base.Exp

每一行表示一个函数的信息。前两列表示函数在 CPU 上运行的时间以及百分比；第三列是当前全部函数累加使用 CPU 的比例；第四列和第五列表明这个函数以及子函数运行所占用的时间和比例（也被称为累加值 cumulative），应该大于等于前两列的值；最后一列就是函数的名字。若是应用程序有性能问题，上面这些信息应该能告诉咱们时间都花费在哪些函数的执行上了。

以上面显示的结果为例，根据cum来观察大部分的时间都花费在了Exp的执行过程当中（base.Exp 函数在其余模块内被调用），咱们能够经过list命令来仔细分析一下Exp函数中在那部分比较耗时：

Total: 420ms
 ROUTINE ======================== github.com/utxo/crypto/base.Exp in D:\gopath\src\github.com\utxo\crypto\base\base.go
         0      400ms (flat, cum) 95.24% of Total
         .          .     91:           return E
         .          .     92:   }
         .          .     93:
         .          .     94:   if x.Sign() < 1{
         .          .     95:           x = (&big.Int{}).Neg(x)
         .       30ms     96:           E = (&big.Int{}).Exp(g, x, n)
         .          .     97:           if 1 != Gcd(E, n).Int64(){
         .          .     98:                   log.Printf("h_r is not prime with n")
         .          .     99:                   return nil
         .          .    100:           }
         .          .    101:
         .       20ms    102:           E = (&big.Int{}).ModInverse(E, n)
         .          .    103:           E = (&big.Int{}).Mod(E, n)
         .          .    104:
         .          .    105:   }else {
         .      350ms    106:           E = (&big.Int{}).Exp(g, x, n)
         .          .    107:   }
         .          .    108:
         .          .    109:   return E
         .          .    110:}
         .          .    111:

如上所示，base.Exp 总计占用了 95.24% 的cpu时间，而整个函数内最耗时的就是 big.Exp() 函数。这就给做者制造了个难题，由于在本模块中涉及到大量的指数运算，并且输入数据都较大（平均 512bit）。

由于本模块的功能相对比较简单，因此比较容易的定位到了问题的所在，接下来咱们要想办法进行调优。
由于本模块目前是纯粹的算法模块，能够考虑进行并发改造，来发挥go协程的特色，尤为是在调用base.Exp的部分中。但做者将尝试并发改造后发现改造的结果并不理想，由于使用channel进行同步致使阻塞，‘抵消’了多协程带来的性能提高。

另一个思路就是尽可能避免big.Exp指数运算，即将指数型运算转换为其余时间复杂度较低的运算，多协程是利用go自己的特性和cpu的多核运算。而这一种方法则纯粹是从数学角度上进行优化，并不具备普适性。改造后再进行测试显示结果以下：
```
D:\gopath\src\github.com\utxo\balance>go test -run=None -bench=Va
 goos: windows
 goarch: amd64
 pkg: github.com/utxo/balance
 BenchmarkTransferProofValidator-4            200           5096903 ns/op
 PASS
 ok      github.com/utxo/balance 3.753s
```
虽然在零知识证实校验过程当中多协程并未带来显著的效果提高，但在零知识证实生成的过程当中起到了很大的做用。

golang--性能测试和分析

前言

经常使用的测试命令

性能剖析流程

详细示例

优化结果

参考网址