Does Google's Git v2 bring a disruptive performance improvement? Probably not.

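About the author

Wang Zhenwei is a founding team member of CODING with many years of systems software development experience, specializing in Linux, Golang, Java, Ruby, Docker and related technologies. For the past two years he has worked on system architecture and operations at CODING.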

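Preface

Google recently published an article describing an update to one of Git's transfer protocols, which caused quite a stir in the Chinese tech community (search Baidu for "Git v2 performance improvement" to find the related articles).
Many friends in tech circles have been reposting the news, but how large is the improvement, and what are the details? In fact, this change improves performance only in extreme cases; in the vast majority of cases users will not notice any difference. Much of the uncritical reposting is probably down to Google's brand effect :)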

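What is Git?

To explain why, let us first briefly introduce Git's protocols. If you are not yet familiar with Git and want to learn more, see its official website: http://git-scm.com/. You can also visit https://coding.net/help/doc/git to learn how to use a fast, high-quality Git hosting service in China.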

Git transfer protocols

Git commonly uses three protocols: SSH, HTTP(S), and Git. The first two are the most widely used.

Let's look at examples of using the HTTP(S) and SSH protocols:

git clone https://git.coding.net/wzw/coding-demo.git
Cloning into 'coding-demo'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
git clone git@git.coding.net:wzw/coding-demo.git
Cloning into 'coding-demo'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.

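As you can see, for a fresh clone the two go through essentially the same process.

In fact, Git's underlying handling is the same across the various application-layer protocols, whether HTTP(S), SSH, or the Git protocol.

Let's take a closer look at what Git does during the transfer.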

GIT_TRACE=1 GIT_TRACE_PACKET=1 git clone https://git.coding.net/wzw/coding-demo.git
17:48:21.767799 git.c:344               trace: built-in: git 'clone' 'https://git.coding.net/wzw/coding-demo.git'
Cloning into 'coding-demo'...
17:48:21.797959 run-command.c:626       trace: run_command: 'git-remote-https' 'origin' 'https://git.coding.net/wzw/coding-demo.git'
17:48:22.278880 pkt-line.c:80           packet:          git< # service=git-upload-pack
17:48:22.279390 pkt-line.c:80           packet:          git< 0000
17:48:22.279405 pkt-line.c:80           packet:          git< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2.15.0
17:48:22.279419 pkt-line.c:80           packet:          git< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.279431 pkt-line.c:80           packet:          git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.279442 pkt-line.c:80           packet:          git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.279453 pkt-line.c:80           packet:          git< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.279472 pkt-line.c:80           packet:          git< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:48:22.279483 pkt-line.c:80           packet:          git< 0000
17:48:22.280959 pkt-line.c:80           packet:          git> fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.280986 pkt-line.c:80           packet:          git> fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.280999 pkt-line.c:80           packet:          git> 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.281011 pkt-line.c:80           packet:          git> 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.281023 pkt-line.c:80           packet:          git> 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.281033 pkt-line.c:80           packet:          git> 0000
17:48:22.281089 run-command.c:626       trace: run_command: 'fetch-pack' '--stateless-rpc' '--stdin' '--lock-pack' '--thin' '--check-self-contained-and-connected' '--cloning' 'https://git.coding.net/wzw/coding-demo.git/'
17:48:22.287860 git.c:344               trace: built-in: git 'fetch-pack' '--stateless-rpc' '--stdin' '--lock-pack' '--thin' '--check-self-contained-and-connected' '--cloning' 'https://git.coding.net/wzw/coding-demo.git/'
17:48:22.288761 pkt-line.c:80           packet:   fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288799 pkt-line.c:80           packet:   fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288824 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.288838 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.288851 pkt-line.c:80           packet:   fetch-pack< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.288863 pkt-line.c:80           packet:   fetch-pack< 0000
17:48:22.288876 pkt-line.c:80           packet:   fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2.15.0
17:48:22.288901 pkt-line.c:80           packet:   fetch-pack< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:48:22.288914 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:48:22.288927 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:48:22.288941 pkt-line.c:80           packet:   fetch-pack< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:48:22.288955 pkt-line.c:80           packet:   fetch-pack< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:48:22.288967 pkt-line.c:80           packet:   fetch-pack< 0000
17:48:22.289909 pkt-line.c:80           packet:   fetch-pack> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:48:22.289924 pkt-line.c:80           packet:   fetch-pack> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:48:22.290081 pkt-line.c:80           packet:   fetch-pack> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:48:22.290094 pkt-line.c:80           packet:   fetch-pack> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e
17:48:22.290103 pkt-line.c:80           packet:   fetch-pack> 0000
17:48:22.290127 pkt-line.c:80           packet:   fetch-pack> done
17:48:22.290257 pkt-line.c:80           packet:   fetch-pack> 0000
17:48:22.290290 pkt-line.c:80           packet:          git< 00a8want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed no-done side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)0032want 1536ad10fc0a188c50680932ca191c8da46938c40032want 1536ad10fc0a188c50680932ca191c8da46938c40032want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e00000009done
17:48:22.290375 pkt-line.c:80           packet:          git< 0000
17:48:22.436811 pkt-line.c:80           packet:   fetch-pack< NAK
17:48:22.436844 pkt-line.c:80           packet:   fetch-pack> 0000
17:48:22.437152 pkt-line.c:80           packet:     sideband< \2Counting objects: 7, done.
remote: Counting objects: 7, done.
17:48:22.437185 pkt-line.c:80           packet:     sideband< \2Compressing objects:  25% (1/4)   \15
17:48:22.437200 pkt-line.c:80           packet:     sideband< \2Compressing objects:  50% (2/4)   \15
17:48:22.437250 pkt-line.c:80           packet:     sideband< \2Compressing objects:  75% (3/4)   \15
17:48:22.437279 pkt-line.c:80           packet:     sideband< \2Compressing objects: 100% (4/4)   \15
17:48:22.437302 pkt-line.c:80           packet:     sideband< \2Compressing objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
17:48:22.447214 pkt-line.c:80           packet:          git< 0000
17:48:22.447201 pkt-line.c:80           packet:     sideband< PACK ...
17:48:22.447316 pkt-line.c:80           packet:     sideband< \2Total 7 (delta 0), reused 0 (delta 0)
remote: Total 7 (delta 0), reused 0 (delta 0)
17:48:22.447363 pkt-line.c:80           packet:     sideband< 0000
17:48:22.447372 run-command.c:626       trace: run_command: 'unpack-objects' '--pack_header=2,7'
17:48:22.453090 git.c:344               trace: built-in: git 'unpack-objects' '--pack_header=2,7'
Unpacking objects: 100% (7/7), done.
17:48:22.460604 run-command.c:626       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
17:48:22.464831 git.c:344               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
GIT_TRACE=1 GIT_TRACE_PACKET=1 git clone git@git.coding.net:wzw/coding-demo.git
17:49:18.654786 git.c:344               trace: built-in: git 'clone' 'git@git.coding.net:wzw/coding-demo.git'
Cloning into 'coding-demo'...
17:49:18.669187 run-command.c:626       trace: run_command: 'ssh' 'git@git.coding.net' 'git-upload-pack '\''wzw/coding-demo.git'\'''
17:49:19.768942 pkt-line.c:80           packet:        clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/2.15.0
17:49:19.772436 pkt-line.c:80           packet:        clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:49:19.772527 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:49:19.772549 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:49:19.772566 pkt-line.c:80           packet:        clone< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:49:19.772863 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
17:49:19.772910 pkt-line.c:80           packet:        clone< 0000
17:49:19.776185 pkt-line.c:80           packet:        clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:49:19.776215 pkt-line.c:80           packet:        clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1
17:49:19.776224 pkt-line.c:80           packet:        clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776232 pkt-line.c:80           packet:        clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776239 pkt-line.c:80           packet:        clone> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e
17:49:19.776246 pkt-line.c:80           packet:        clone> 0000
17:49:19.776262 pkt-line.c:80           packet:        clone> done
17:49:19.879841 pkt-line.c:80           packet:        clone< NAK
17:49:19.880083 run-command.c:626       trace: run_command: 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 75332 on wangzheweideMBP' '--check-self-contained-and-connected'
17:49:19.885280 git.c:344               trace: built-in: git 'index-pack' '--stdin' '-v' '--fix-thin' '--keep=fetch-pack 75332 on wangzheweideMBP' '--check-self-contained-and-connected'
17:49:19.889021 pkt-line.c:80           packet:     sideband< \2Counting objects: 7, done.
remote: Counting objects: 7, done.
17:49:19.895119 pkt-line.c:80           packet:     sideband< \2Compressing objects:  25% (1/4)   \15Compressing objects:  50% (2/4)   \15Compressing objects:  75% (3/4)   \15Compressing objects: 10
17:49:19.895170 pkt-line.c:80           packet:     sideband< \20% (4/4)   \15
17:49:19.897621 pkt-line.c:80           packet:     sideband< \2Compressing objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
17:49:19.914866 pkt-line.c:80           packet:     sideband< PACK ...
17:49:19.914916 pkt-line.c:80           packet:     sideband< \2Total 7 (delta 0), reused 0 (delta 0)
remote: Total 7 (delta 0), reused 0 (delta 0)
17:49:19.914936 pkt-line.c:80           packet:     sideband< 0000
Receiving objects: 100% (7/7), done.
17:49:20.088640 run-command.c:626       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'
17:49:20.093965 git.c:344               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet' '--progress=Checking connectivity'

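I used the GIT_TRACE=1 and GIT_TRACE_PACKET=1 environment variables to make Git print more information about the clone process, which is handy for debugging. We can also see that Git invokes different underlying commands for HTTPS and SSH, yet the content exchanged is strikingly similar.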

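In short, the protocol exchange for a clone is roughly as follows:

  • The client tells the remote which operation it wants to perform: git-upload-pack (used for all read-type operations)
  • The server returns the capabilities it supports and the list of refs it advertises
  • The client declares the list of objects it wants to receive
  • The server computes all the objects that need to be transferred, compresses them, and sends them to the client
  • The client unpacks and verifies the objects
  • The client updates its local refs (this step does not appear in the detailed trace above; see the ref update in the fetch trace at the end of this article)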
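Every message in the traces above travels in Git's pkt-line framing: four hexadecimal digits giving the total length (the four digits included), followed by the payload, with 0000 acting as a flush packet that ends a section. The sketch below is a minimal illustration of that framing only; the function names are made up for this example and it performs no network I/O.

```python
FLUSH_PKT = b"0000"  # special packet that marks the end of a section


def encode_pkt_line(payload: bytes) -> bytes:
    """Prefix the payload with its total length (header included) as 4 hex digits."""
    return f"{len(payload) + 4:04x}".encode("ascii") + payload


def decode_pkt_lines(stream: bytes):
    """Yield each payload found in the stream; yield None for a flush packet."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4].decode("ascii"), 16)
        if length == 0:
            yield None  # flush packet
            pos += 4
            continue
        if length < 4:
            raise ValueError(f"invalid pkt-line length {length}")
        yield stream[pos + 4:pos + length]
        pos += length


# Rebuild a fragment of the request seen in the HTTPS trace: one want, a flush, then done.
request = (
    encode_pkt_line(b"want 1536ad10fc0a188c50680932ca191c8da46938c4\n")
    + FLUSH_PKT
    + encode_pkt_line(b"done\n")
)
packets = list(decode_pkt_lines(request))
print(request[:4], packets[-1])
```

The encoded want line starts with 0032 and the done line is 0009done, which matches the raw request that git-remote-https logs in the HTTPS trace.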

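To understand this transfer process you need a basic grasp of how Git stores data under the hood, so here is a quick primer.

One description of Git is that it is a content-addressable system with history tracking. That sounds abstract but is actually easy to understand: under the hood, Git stores all version-controlled content as objects and references (refs). Objects (files, commits, directories, and so on) hold the actual data; refs (branches, tags, and so on) are pointers.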
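"Content-addressable" means an object's ID is derived from its content. As a minimal sketch, assuming the default SHA-1 object format (which is what the 40-digit IDs in this article are), a blob's ID is the SHA-1 of a short header plus the file bytes. The function name is chosen for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a blob with this content (SHA-1 object format)."""
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

Because the ID depends only on the content, two identical files always map to the same object, and any change to the content produces a different ID.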

Objects at a glance:

We can use git cat-file -p to view basic information about an object.

git cat-file -p fdacba1d541c75bd48f2cd742ee18f77ea3517a1
tree ae0532862af27ecd131a7f792c9156624783d562
parent 1536ad10fc0a188c50680932ca191c8da46938c4
author wzw <wangzhenwei@coding.net> 1526896089 +0800
committer wzw <wangzhenwei@coding.net> 1526896089 +0800

update README.md

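As you can see, fdacba1d541c75bd48f2cd742ee18f77ea3517a1 is a commit object. It lists the parent commit it depends on, 1536ad10fc0a188c50680932ca191c8da46938c4, its tree object ae0532862af27ecd131a7f792c9156624783d562, and the corresponding author information and commit message.

We can follow the reference and look at its parent commit: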

git cat-file -p 1536ad10fc0a188c50680932ca191c8da46938c4
tree f7aa6821aa977f65dc987fe6d6838790371f3d90
author wzw <wangzhenwei@coding.net> 1526895383 +0800
committer wzw <wangzhenwei@coding.net> 1526895383 +0800

Initial commit

The parent commit in turn depends on the tree object f7aa6821aa977f65dc987fe6d6838790371f3d90.

Let's look at that tree object:

git cat-file -p f7aa6821aa977f65dc987fe6d6838790371f3d90
100644 blob 3aed7e951e0457a2784ff6cd009412e07a09e362    README.md

We can see that the directory contains one blob object with ID 3aed7e951e0457a2784ff6cd009412e07a09e362. Let's look at it:

git cat-file -p 3aed7e951e0457a2784ff6cd009412e07a09e362
#coding-demo

This is the content of the first version of README.md; that is, it corresponds to commit 1536ad10fc0a188c50680932ca191c8da46938c4.
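The chain just followed by hand (commit, then tree, then blob) is a walk over an object graph. The sketch below models the objects shown above as a dictionary of abbreviated IDs and collects everything reachable from the refs the server advertises. Two things are assumptions rather than facts from the article: the content of tree ae05328 is never printed, so the updated README.md blob appears under a placeholder name, and 30eb4b0 is taken to be an annotated tag object because the server lists a peeled refs/tags/v1.0^{} entry for it.

```python
# Each object ID maps to the IDs it references directly.
OBJECTS = {
    "fdacba1": ["ae05328", "1536ad1"],   # commit "update README.md": tree + parent
    "ae05328": ["<README.md v2 blob>"],  # tree; its entries are not shown in the article (assumed)
    "<README.md v2 blob>": [],           # placeholder, real ID unknown
    "1536ad1": ["f7aa682"],              # commit "Initial commit": tree only
    "f7aa682": ["3aed7e9"],              # tree containing README.md
    "3aed7e9": [],                       # blob: first version of README.md
    "30eb4b0": ["1536ad1"],              # tag object v1.0 (assumed annotated), points at 1536ad1
}


def reachable(start_ids):
    """Return the set of all objects reachable from the given starting IDs."""
    seen = set()
    stack = list(start_ids)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(OBJECTS[oid])
    return seen


# master, the two test branches (both at 1536ad1) and the tag v1.0
advertised = ["fdacba1", "1536ad1", "1536ad1", "30eb4b0"]
print(len(reachable(advertised)))  # 7
```

Under those assumptions the count matches the "Counting objects: 7" line the server printed during the clone.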

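Overall, Git's internal storage structure looks like this:

[Figure: Git's internal storage structure]

Good, that completes the background. Have you noticed that blockchain, for all its current hype, resembles Git's storage at the technical level? :)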

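During a clone, the server first advertises a list of refs to the client. This is also where the Git v2 protocol claims its performance improvement, as explained later.

Like this: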

17:49:19.772436 pkt-line.c:80           packet:        clone< fdacba1d541c75bd48f2cd742ee18f77ea3517a1 refs/heads/master
17:49:19.772527 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
17:49:19.772549 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
17:49:19.772566 pkt-line.c:80           packet:        clone< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
17:49:19.772863 pkt-line.c:80           packet:        clone< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}

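Clearly, each 40-digit hexadecimal number above is the ID of the object that the ref following it points to.

The client then only needs to work out which objects it wants, based on the refs it is interested in and the objects already in its local object store (for pull and fetch a local object store exists; for clone there is none yet, so it needs every object of interest).

Once the client has worked out the list of objects it is interested in, it tells the remote server using want commands.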

17:49:19.776185 pkt-line.c:80           packet:        clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
17:49:19.776215 pkt-line.c:80           packet:        clone> want fdacba1d541c75bd48f2cd742ee18f77ea3517a1
17:49:19.776224 pkt-line.c:80           packet:        clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776232 pkt-line.c:80           packet:        clone> want 1536ad10fc0a188c50680932ca191c8da46938c4
17:49:19.776239 pkt-line.c:80           packet:        clone> want 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e

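If the client is running pull or fetch, it also tells the remote which objects it already has (a section at the end of this article illustrates this).

The remote server takes the objects the client wants and the objects it already has, compares them against its own object store and the object dependency graph, gathers the objects the client needs, and sends them to the client as a compressed pack.

After receiving the pack, the client unpacks and verifies the objects, then updates its refs to point to them.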
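Conceptually, what the server computes from the want and have lines is a set difference: everything reachable from the wanted objects, minus everything reachable from what the client already has. The sketch below shows that idea on a made-up three-commit history (all IDs are hypothetical). It is a naive illustration; real Git avoids expanding the full closure like this.

```python
# Hypothetical history: c1 <- c2 <- c3, each commit with its own tree and blob.
GRAPH = {
    "c3": ["t3", "c2"], "t3": ["b3"], "b3": [],
    "c2": ["t2", "c1"], "t2": ["b2"], "b2": [],
    "c1": ["t1"],       "t1": ["b1"], "b1": [],
}


def closure(ids):
    """All objects reachable from the given IDs, including the IDs themselves."""
    seen = set()
    stack = list(ids)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(GRAPH[oid])
    return seen


def objects_to_send(wants, haves):
    """Objects the client lacks: reachable from wants but not from haves."""
    return closure(wants) - closure(haves)


print(sorted(objects_to_send(["c3"], ["c2"])))  # ['b3', 'c3', 't3']
print(len(objects_to_send(["c3"], [])))         # 9
```

The first call resembles a fetch, where the client is one commit behind and receives three objects; the second resembles a clone, where there are no have lines and everything is sent.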

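What Google did in protocol version 2

The full protocol version 2 specification is here: https://www.kernel.org/pub/so...

Here we describe its main changes, of which there are three:

  • Server-side ref filtering
  • Easier extensibility for new features (for example, the ability to declare which refs are wanted)
  • Simplified client-side handling of the HTTP protocol

What many clickbait headlines exaggerated is mainly the first point: server-side ref filtering.

Google's official blog post describes it as follows: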

The main motivation for the new protocol was to enable server side filtering of references (branches and tags). Prior to protocol v2, servers responded to all fetch commands with an initial reference advertisement, listing all references in the repository. This complete listing is sent even when a client only cares about updating a single branch, e.g.: git fetch origin master. For repositories that contain 100s of thousands of references (the Chromium repository has over 500k branches and tags) the server could end up sending 10s of megabytes of data that get ignored. This typically dominates both time and bandwidth during a fetch, especially when you are updating a branch that's only a few commits behind the remote, or even when you are only checking if you are up-to-date, resulting in a no-op fetch.

We recently rolled out support for protocol version 2 at Google and have seen a performance improvement of 3x for no-op fetches of a single branch on repositories containing 500k references. Protocol v2 has also enabled a reduction of 8x of the overhead bytes (non-packfile) sent from googlesource.com servers. A majority of this improvement is due to filtering references advertised by the server to the refs the client has expressed interest in.

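For the reader's convenience, here is the passage restated in plain terms:

The main motivation for the new protocol was to enable server-side filtering of refs (branches and tags). Before v2, the server answered every fetch with an initial ref advertisement listing all refs in the repository, even when the client only cared about updating one branch (for example git fetch origin master). For a repository with hundreds of thousands of refs (Chromium's has over 500,000 branches and tags), the server may send tens of megabytes that the client simply ignores. That cost typically dominates the time and bandwidth of a fetch, especially when the branch is only a few commits behind the remote, or is already up to date and the fetch is just a check.
After rolling out v2 support at Google, no-op fetches of a single branch on a repository with 500,000 refs became 3x faster, and the non-packfile overhead bytes sent from googlesource.com dropped by 8x. Most of that gain comes from the server filtering the advertised refs down to those the client expressed interest in.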

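Having read this far, many readers will already see the point. The original post is quite clear: the performance gain is confined to the first step of the client-server exchange, where the server no longer has to send the full ref list. In some extreme scenarios (repositories with hundreds of thousands of branches and tags), this step becomes significantly faster.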
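To get a feel for the numbers, the sketch below estimates the size of a ref advertisement and applies prefix filtering in the spirit of the ref-prefix argument that protocol v2's ls-refs command accepts. The 500,000 branch names are synthetic, the object IDs are dummies, and the size formula counts only one "ID, space, name, newline" pkt-line per ref; it ignores capabilities and peeled tag entries.

```python
def advertisement_size(refs):
    """Bytes needed to list the refs: 4-byte length prefix + 40-hex ID + space + name + LF each."""
    return sum(4 + 40 + 1 + len(name) + 1 for name in refs)


def filter_refs(refs, prefixes):
    """Keep only the refs whose name starts with one of the requested prefixes."""
    return {name: oid for name, oid in refs.items()
            if any(name.startswith(p) for p in prefixes)}


# Synthetic repository: 500,000 branches plus master (names and IDs are made up).
refs = {f"refs/heads/branch-{i:06d}": "0" * 40 for i in range(500_000)}
full_size = advertisement_size(refs)
refs["refs/heads/master"] = "fdacba1d541c75bd48f2cd742ee18f77ea3517a1"

filtered = filter_refs(refs, ["refs/heads/master"])
print(full_size)                     # 35000000, i.e. about 35 MB for the branches alone
print(advertisement_size(filtered))  # 63
```

With names of this length the unfiltered listing comes to roughly 35 MB, in line with the "10s of megabytes" in Google's post, while the filtered listing is a single line. For a repository with a handful of refs, such as the demo project, both figures are a few hundred bytes and the difference is invisible.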

In practice, most Git repositories have nowhere near that many refs. Take the sample project git@git.coding.net:wzw/coding-demo.git, where this step runs very quickly:

time git ls-remote git@git.coding.net:wzw/coding-demo.git
fdacba1d541c75bd48f2cd742ee18f77ea3517a1    HEAD
fdacba1d541c75bd48f2cd742ee18f77ea3517a1    refs/heads/master
1536ad10fc0a188c50680932ca191c8da46938c4    refs/heads/test-abc
1536ad10fc0a188c50680932ca191c8da46938c4    refs/heads/test-bcd
30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e    refs/tags/v1.0
1536ad10fc0a188c50680932ca191c8da46938c4    refs/tags/v1.0^{}

real    0m0.103s
user    0m0.020s
sys    0m0.004s

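It runs quickly, taking about 100 ms, and that includes establishing the SSH connection, authentication, and data transfer. For this step the time goes mainly to the network connection and authentication; listing the refs is not the bottleneck.

Take CODING's own main development repository: it currently has 2000+ tags, 500+ branches, and about 5000 hidden refs created for merge requests. CODING runs gc on repositories regularly, so a packed-refs file is present; even so, reading and sending the refs does start to get slower, though it is still within an acceptable range.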

time git ls-remote git@e.coding.net:codingcorp/coding-dev.git
// several thousand lines omitted
5708bacfe2c2510efd0bbb0b4be8268f2a171747    refs/tags/private-1.2
dec93b8774f90c4660bbe8b3759b6d59db30ee45    refs/tags/private-1.2^{}
5ddcbab95eedc1664ac131cddfc51a5d265446ce    refs/tags/release/20160927.1
a91709b7bf08c00fb0b2319aedf999ca7e636109    refs/tags/release/20160927.1^{}
476075fd8442e76d02264f0b109bb2afcb6d39a1    refs/tags/repo-manager-20161118.1
ce29fb126a27f58de555badeb33838d6a3dde8eb    refs/tags/repo-manager-20161118.1^{}
30739025962d6e788f1542841aa509422810853e    refs/tags/test-tag-20180308.1
1a7b7474257badeca9fa0c15204bf5769f42b33a    refs/tags/test-tag-20180308.1^{}

real    0m1.677s
user    0m0.032s
sys    0m0.052s

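A single fresh clone of CODING's main development repository transfers about 550 MB in total. Even on a good network where the clone reaches 5 MB/s, the transfer takes 110 seconds, next to which the 1.677 seconds spent up front is negligible.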
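The arithmetic behind that comparison, using only the figures quoted above (the variable names are chosen for this example):

```python
clone_size_mb = 550      # total data transferred by a fresh clone
bandwidth_mb_per_s = 5   # assumed sustained download speed
ref_listing_s = 1.677    # wall-clock time of git ls-remote, SSH setup included

transfer_s = clone_size_mb / bandwidth_mb_per_s
share = ref_listing_s / (ref_listing_s + transfer_s)

print(f"{transfer_s:.0f} s transfer, ref listing is {share:.1%} of the total")
# prints: 110 s transfer, ref listing is 1.5% of the total
```

Since the 1.677 seconds also covers connection setup and authentication, 1.5% is an upper bound on what ref filtering could save for this clone.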

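By this reckoning, Google's change does optimize certain scenarios for some large repositories (especially those with a very large number of refs), but it is not the dramatic performance improvement that some Chinese media outlets made it out to be. Looking at the transfer process, Git's core work (computing object dependencies, the format for declaring objects, and the transfer itself) has not changed. The non-pack data whose volume was reportedly cut by 8x accounts for only a small share of total traffic; overall it does save some data transfer, but it falls far short of a dramatic improvement.
Of course, Googlers' contributions to open source still deserve our thanks and praise. After reading this, I hope everyone will approach technology rigorously rather than parroting what others say. Talk is cheap, show me your code!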

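A few more words

PS: Speaking of major Git performance gains, Google engineers developing JGit once contributed the bitmap index idea to Git. It lets Git spend a small amount of space to avoid a large amount of tree traversal when resolving object dependencies, and that was a genuinely large performance improvement. The bitmap index now ships with recent versions of Git; I will share how it works another time.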
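As a rough illustration of the idea only (this is not Git's actual on-disk bitmap format), suppose each commit stores a precomputed bitmap with one bit per object it can reach. Deciding what to send then becomes a couple of bitwise operations instead of a graph walk. The history and IDs below are hypothetical.

```python
# Fixed object ordering: bit i of a bitmap stands for ORDER[i].
ORDER = ["c1", "t1", "b1", "c2", "t2", "b2", "c3", "t3", "b3"]
INDEX = {oid: i for i, oid in enumerate(ORDER)}


def bitmap(ids):
    """Pack a collection of object IDs into an integer bitmap."""
    bits = 0
    for oid in ids:
        bits |= 1 << INDEX[oid]
    return bits


# Precomputed per-commit reachability, as a bitmap index would store it.
REACHABLE = {
    "c1": bitmap(ORDER[:3]),
    "c2": bitmap(ORDER[:6]),
    "c3": bitmap(ORDER),
}


def to_send(want, have):
    """Objects reachable from want but not from have, via AND-NOT on the bitmaps."""
    bits = REACHABLE[want] & ~REACHABLE[have]
    return [oid for oid in ORDER if (bits >> INDEX[oid]) & 1]


print(to_send("c3", "c2"))  # ['c3', 't3', 'b3']
```

The trade-off is the one described above: a small amount of extra storage per commit in exchange for skipping the traversal of trees at request time.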

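PS2: The Git protocol has many other features; to keep this article focused, they are not covered here.

PS3: How the Git transfer protocol declares objects the client already has (the have command):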

GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch origin master
19:58:08.432172 git.c:344               trace: built-in: git 'fetch' 'origin' 'master'
19:58:08.438917 run-command.c:626       trace: run_command: 'ssh' 'git@git.coding.net' 'git-upload-pack '\''wzw/coding-demo.git'\'''
Warning: Permanently added the RSA host key for IP address '123.59.85.127' to the list of known hosts.
19:58:09.634163 pkt-line.c:80           packet:        fetch< 8dccad22648e94c52335a7266c7cff5d947c9532 HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/2.15.0
19:58:09.641777 pkt-line.c:80           packet:        fetch< 8dccad22648e94c52335a7266c7cff5d947c9532 refs/heads/master
19:58:09.641846 pkt-line.c:80           packet:        fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-abc
19:58:09.641872 pkt-line.c:80           packet:        fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/heads/test-bcd
19:58:09.641891 pkt-line.c:80           packet:        fetch< 30eb4b0d813c662c4d7e87c4d3b4cf561e544f8e refs/tags/v1.0
19:58:09.641903 pkt-line.c:80           packet:        fetch< 1536ad10fc0a188c50680932ca191c8da46938c4 refs/tags/v1.0^{}
19:58:09.641913 pkt-line.c:80           packet:        fetch< 0000
19:58:09.642105 run-command.c:626       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
19:58:09.655120 pkt-line.c:80           packet:        fetch> want 8dccad22648e94c52335a7266c7cff5d947c9532 multi_ack_detailed side-band-64k thin-pack ofs-delta deepen-since deepen-not agent=git/2.15.1.(Apple.Git-101)
19:58:09.655157 pkt-line.c:80           packet:        fetch> 0000
19:58:09.655190 pkt-line.c:80           packet:        fetch> have fdacba1d541c75bd48f2cd742ee18f77ea3517a1
19:58:09.655207 pkt-line.c:80           packet:        fetch> have 1536ad10fc0a188c50680932ca191c8da46938c4
19:58:09.655221 pkt-line.c:80           packet:        fetch> done
19:58:09.975282 pkt-line.c:80           packet:        fetch< ACK fdacba1d541c75bd48f2cd742ee18f77ea3517a1 common
19:58:09.975382 pkt-line.c:80           packet:        fetch< ACK 1536ad10fc0a188c50680932ca191c8da46938c4 common
19:58:09.975404 pkt-line.c:80           packet:        fetch< ACK 1536ad10fc0a188c50680932ca191c8da46938c4
19:58:09.975728 pkt-line.c:80           packet:     sideband< \2Counting objects: 3, done.
remote: Counting objects: 3, done.
19:58:09.975763 pkt-line.c:80           packet:     sideband< \2Compressing objects:  50% (1/2)   \15Compressing objects: 100% (2/2)   \15
19:58:09.975798 pkt-line.c:80           packet:     sideband< \2Compressing objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
19:58:10.065650 pkt-line.c:80           packet:     sideband< PACK ...
19:58:10.065707 pkt-line.c:80           packet:     sideband< \2Total 3 (delta 0), reused 0 (delta 0)
remote: Total 3 (delta 0), reused 0 (delta 0)
19:58:10.065714 run-command.c:626       trace: run_command: 'unpack-objects' '--pack_header=2,3'
19:58:10.065741 pkt-line.c:80           packet:     sideband< 0000
19:58:10.071004 git.c:344               trace: built-in: git 'unpack-objects' '--pack_header=2,3'
Unpacking objects: 100% (3/3), done.
19:58:10.317201 run-command.c:626       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
19:58:10.322159 git.c:344               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
From git.coding.net:wzw/coding-demo
 * branch            master     -> FETCH_HEAD
   fdacba1..8dccad2  master     -> origin/master
19:58:10.328515 run-command.c:1452      run_processes_parallel: preparing to run up to 1 tasks
19:58:10.328564 run-command.c:1484      run_processes_parallel: done
19:58:10.328621 run-command.c:626       trace: run_command: 'gc' '--auto'
19:58:10.333115 git.c:344               trace: built-in: git 'gc' '--auto'
