Protobuf Language Guide

What Is Protobuf

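Protobuf is short for Protocol Buffers, a data description language developed by Google for defining a lightweight, efficient format for storing structured data; Google open-sourced it in 2008. Protobuf is used to serialize structured data. It works well as the data carrier in network communication and as a format for data storage or RPC data exchange: serialized data is compact, and because fields are stored as key-value pairs, messages stay highly compatible across versions. It is a language-neutral, platform-neutral, extensible format for serializing structured data, suitable for communication protocols, data storage, and similar uses. Developers use the tools that ship with Protobuf to generate code that serializes their structured data.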

The most basic unit of data in Protobuf is the message, which is similar to a struct in Go. A message can contain nested messages or fields of basic data types.

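This tutorial describes how to use the protocol buffer language to structure your protocol buffer data, including .proto file syntax and how to generate data access classes from your .proto files. It uses the proto3 version of the protocol buffer language.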

Defining a Message

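Let's start with a simple example. Suppose you want to define a search request message in which each search request contains a query string, the page of results you are interested in, and the number of results per page. Here is the .proto file that defines it: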

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

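The first line of the file specifies that you are using proto3 syntax. If you omit it, the protocol buffer compiler assumes proto2. This must be the first non-empty, non-comment line of the file.
The SearchRequest message definition specifies three fields (name/value pairs), each of which has a name and a type.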

Specifying Field Types

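In the example above, all the fields are scalar types: two integers (page_number and result_per_page) and one string (query). You can also specify composite types for your fields, including enumerations and other message types.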

Assigning Field Numbers

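Each field in a message definition has a unique number. These numbers identify your fields in the binary message format and should not be changed once your message type is in use. Note that when a message is encoded, field numbers 1 through 15 take one byte (including the field's wire type), and field numbers 16 through 2047 take two bytes. So you should use field numbers 1 through 15 for very frequently occurring message elements.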
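
To see where the one-byte/two-byte boundary comes from, here is a minimal Python sketch (standard library only, not the protobuf runtime). It assumes the standard wire rule that a field's tag is (field_number << 3) | wire_type, written as a base-128 varint that carries 7 payload bits per byte; encode_varint and tag_size are illustrative names.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes needed for a field's tag (field number and wire type packed together)."""
    return len(encode_varint((field_number << 3) | wire_type))


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```

Field number 15 still fits in one byte because 15 << 3 plus the largest wire type stays below 128; from 16 on the tag needs a second byte, and from 2048 a third.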

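The smallest field number you can specify is 1, and the largest is 2^29 - 1 (536,870,911). Field numbers 19000 through 19999 are reserved for the protocol buffers implementation and cannot be used in your message definitions. Likewise, you cannot reuse any field number that is already used or reserved in the current message.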
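
The numbering rules above can be expressed as a small check. This is a hypothetical Python helper written for illustration only (protoc performs its own validation); check_field_number and its parameters are invented names.

```python
MAX_FIELD_NUMBER = 2**29 - 1                   # 536,870,911
IMPLEMENTATION_RESERVED = range(19000, 20000)  # 19000-19999 inclusive


def check_field_number(number: int, used: set, reserved: set) -> None:
    """Raise ValueError if `number` may not be assigned to a new field.

    `used` holds numbers already taken in the message; `reserved` holds
    numbers listed in the message's `reserved` statements.
    """
    if not 1 <= number <= MAX_FIELD_NUMBER:
        raise ValueError(f"{number} is outside 1..{MAX_FIELD_NUMBER}")
    if number in IMPLEMENTATION_RESERVED:
        raise ValueError(f"{number} is reserved for the protocol buffers implementation")
    if number in used or number in reserved:
        raise ValueError(f"{number} is already used or reserved in this message")


# SearchRequest already uses 1-3, so 4 is a valid next number.
check_field_number(4, used={1, 2, 3}, reserved=set())
```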

Field Rules

Message fields follow one of these rules:

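singular: a well-formed message (an encoded message) can have zero or one of this field, but not more than one. This is the default field rule in proto3 syntax. (For example, all three fields in the example above are singular: an encoded message can contain zero or one query field, but never several.)
repeated: this field can appear any number of times (including zero) in a well-formed message, and the order of the repeated values is preserved (in effect, an array field).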
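
A minimal Python sketch of what "order is preserved" means on the wire, assuming the standard encoding in which each element of a repeated string field is written as its own length-delimited record with the same tag. To stay short it assumes field numbers up to 15 and strings shorter than 128 bytes (so every tag and length fits in one byte), and that the buffer contains only length-delimited records; the function names are illustrative.

```python
LEN = 2  # wire type for length-delimited data (string, bytes, embedded messages)


def encode_repeated_string(field_number: int, values: list) -> bytes:
    out = bytearray()
    for value in values:
        data = value.encode("utf-8")
        out.append((field_number << 3) | LEN)  # same tag for every element
        out.append(len(data))                  # one-byte length (data < 128 bytes)
        out += data
    return bytes(out)


def decode_repeated_string(field_number: int, buf: bytes) -> list:
    values, i = [], 0
    while i < len(buf):
        tag, length = buf[i], buf[i + 1]
        data = buf[i + 2 : i + 2 + length]
        if tag == (field_number << 3) | LEN:
            values.append(data.decode("utf-8"))  # appended in wire order
        i += 2 + length
    return values


wire = encode_repeated_string(3, ["first", "second"])
print(decode_repeated_string(3, wire))  # ['first', 'second']
```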

Adding More Message Types

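Multiple message types can be defined in a single .proto file, which is useful when you are defining several related messages. For example, to define the reply message SearchResponse that corresponds to SearchRequest, you can add it to the same .proto file: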

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

message SearchResponse {
 ...
}

Adding Comments

Comments in .proto files use the same C/C++-style syntax: // and /* ... */.

/* SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response. */

message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 result_per_page = 3;  // Number of results to return per page.
}

Reserved Fields

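If you update a message type by removing a field entirely or commenting it out, future developers can reuse that field number when making their own updates to the type. If they later load an old version of the .proto file, this can cause severe problems such as data corruption and privacy bugs. One way to make sure this doesn't happen is to mark the field numbers and field names of deleted fields as reserved. The protocol buffer compiler will then report an error if anyone tries to use these field identifiers in the future.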

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}
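
The compiler error for reserved identifiers can be mimicked in a few lines. This is a hypothetical Python sketch, not protoc's implementation; the reserved sets mirror message Foo above (9 to 11 is inclusive), and the candidate field names baz and qux are made up.

```python
RESERVED_NUMBERS = {2, 15} | set(range(9, 12))  # reserved 2, 15, 9 to 11;
RESERVED_NAMES = {"foo", "bar"}                 # reserved "foo", "bar";


def reserved_conflicts(fields: dict) -> list:
    """Return one error message per reserved name or number reused in `fields` (name -> number)."""
    errors = []
    for name, number in fields.items():
        if name in RESERVED_NAMES:
            errors.append(f"field name '{name}' is reserved")
        if number in RESERVED_NUMBERS:
            errors.append(f"field number {number} is reserved")
    return errors


print(reserved_conflicts({"baz": 1, "foo": 3, "qux": 10}))
# ["field name 'foo' is reserved", 'field number 10 is reserved']
```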

What Is Generated from Your .proto?

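When you run the protocol buffer compiler on a .proto file, the compiler generates code in your chosen language for the message types defined in the file. The generated code includes accessors for getting and setting field values, serializing messages to an output stream, and parsing messages from an input stream.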

For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.
For Java, the compiler generates a .java file with a class for each message type, as well as special Builder classes for creating message class instances.
Python is a little different – the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.
For Go, the compiler generates a .pb.go file with a type for each message type in your file.
For Ruby, the compiler generates a .rb file with a Ruby module containing your message types.
For Objective-C, the compiler generates a pbobjc.h and pbobjc.m file from each .proto, with a class for each message type described in your file.
For C#, the compiler generates a .cs file from each .proto, with a class for each message type described in your file.
For Dart, the compiler generates a .pb.dart file with a class for each message type in your file.

Scalar Value Types

.proto Type	Notes	C++ Type	Java Type	Python Type[2]	Go Type	Ruby Type	C# Type	PHP Type	Dart Type
double		double	double	float	float64	Float	double	float	double
float		float	float	float	float32	Float	float	float	double
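int32	Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer	int
int64	Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead.	int64	long	int/long[3]	int64	Bignum	long	integer/string[5]	Int64
uint32	Uses variable-length encoding.	uint32	int	int/long	uint32	Fixnum or Bignum (as required)	uint	integer	int
uint64	Uses variable-length encoding.	uint64	long	int/long	uint64	Bignum	ulong	integer/string[5]	Int64
sint32	Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int32s.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer	int
sint64	Uses variable-length encoding. Signed int value. These encode negative numbers more efficiently than regular int64s.	int64	long	int/long	int64	Bignum	long	integer/string[5]	Int64
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28.	uint32	int	int/long	uint32	Fixnum or Bignum (as required)	uint	integer	int
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56.	uint64	long	int/long[3]	uint64	Bignum	ulong	integer/string[5]	Int64
sfixed32	Always four bytes.	int32	int	int	int32	Fixnum or Bignum (as required)	int	integer	int
sfixed64	Always eight bytes.	int64	long	int/long	int64	Bignum	long	integer/string[5]	Int64
bool		bool	boolean	bool	bool	TrueClass/FalseClass	bool	boolean	bool
string	A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32.	string	String	str/unicode	string	String (UTF-8)	string	string	String
bytes	May contain any arbitrary sequence of bytes no longer than 2^32.	string	ByteString	str	[]byte	String (ASCII-8BIT)	ByteString	string	List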
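
The size notes in the table can be checked with a short Python sketch. It assumes the standard wire rules: int32/int64 write a negative value as a 64-bit two's-complement varint, sint32 first applies ZigZag encoding, and a varint carries 7 bits per byte. varint_len and zigzag32 are illustrative helpers, not protobuf APIs.

```python
def varint_len(value: int) -> int:
    """Bytes a varint needs; negative values are taken as 64-bit two's complement, as int32/int64 do."""
    if value < 0:
        value += 1 << 64
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def zigzag32(value: int) -> int:
    """ZigZag mapping used by sint32: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


print(varint_len(-1))                            # 10 (int32 field holding -1)
print(varint_len(zigzag32(-1)))                  # 1 (sint32 field holding -1)
print(varint_len(2**28 - 1), varint_len(2**28))  # 4 5 (fixed32 is always 4)
```

From 2^28 upward a varint needs five bytes while fixed32 always uses four; the same reasoning with 2^56 (nine varint bytes versus eight) applies to fixed64.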

Default Values

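When a message is parsed, if the encoded message does not contain a particular singular field, the corresponding field in the parsed object is set to that field's default value. The defaults are type-specific: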

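For strings, the default value is the empty string.
For bytes, the default value is empty bytes.
For bools, the default value is false.
For numeric types, the default value is zero.
For enums, the default value is the first defined enum value, which must be 0.
For message fields, the field is not set. Its exact value is language-dependent; see the generated code guide for details.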
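
A minimal Python sketch of how a decoder produces these defaults for the SearchRequest message defined earlier: every field starts at its type's default and is overwritten only if it appears on the wire. It handles only the varint (0) and length-delimited (2) wire types and non-negative integers, which is enough for this message; decode_search_request is an illustrative name, not generated code.

```python
FIELDS = {1: ("query", str), 2: ("page_number", int), 3: ("result_per_page", int)}
DEFAULTS = {str: "", bytes: b"", bool: False, int: 0}


def read_varint(buf: bytes, i: int):
    """Decode a varint starting at buf[i]; return (value, next_index)."""
    result, shift = 0, 0
    while True:
        byte = buf[i]
        i += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, i
        shift += 7


def decode_search_request(buf: bytes) -> dict:
    msg = {name: DEFAULTS[typ] for name, typ in FIELDS.values()}  # start from defaults
    i = 0
    while i < len(buf):
        key, i = read_varint(buf, i)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, i = read_varint(buf, i)
        elif wire_type == 2:
            length, i = read_varint(buf, i)
            value, i = buf[i : i + length], i + length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if number in FIELDS:  # fields this sketch does not know are skipped
            name, typ = FIELDS[number]
            msg[name] = value.decode("utf-8") if typ is str else value
    return msg


# Only field 1 (query = "hi") is on the wire; the two int32 fields fall back to 0.
print(decode_search_request(b"\x0a\x02hi"))
# {'query': 'hi', 'page_number': 0, 'result_per_page': 0}
```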

Enumerations

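When you're defining a message type, you might want one of its fields to only have one of a predefined list of values. For example, suppose you want to add a corpus field to each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS, or VIDEO. You can do this very simply by adding an enum to your message definition, with a constant for each possible value.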

In the following example, we've added an enum called Corpus and a field of type Corpus:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  Corpus corpus = 4;
}

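As you can see, the first constant of the Corpus enum maps to zero: every enum definition must contain a constant that maps to zero as its first element. This is because: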

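There must be a zero value, so that we can use 0 as the numeric default value.
The zero value needs to be the first element, for compatibility with proto2 semantics, where the first enum value is always the default.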

Using Other Message Types

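You can use other message types as field types. For example, suppose you want to include Result messages in each SearchResponse message.

To do this, you can define a Result message type in the same .proto file and then specify a field of type Result in SearchResponse: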

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Importing Definitions

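In the example above, the Result message type is defined in the same file as SearchResponse. What if the message type you want to use as a field type is already defined in another .proto file?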

You can use definitions from other .proto files by importing them. To import another .proto's definitions, add an import statement to the top of your file:

import "myproject/other_protos.proto";

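By default, you can use definitions only from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the .proto file directly and updating all the call sites in a single change, you can put a placeholder .proto file in the old location that forwards all the imports to the new location using import public. Anyone importing a proto that contains an import public statement can transitively rely on the public dependencies. For example: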

// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

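The compiler searches for imported files in the directories specified on the command line with -I or --proto_path. If none is given, it searches the directory in which the compiler was invoked. In general you should set --proto_path to the root of your project and use fully qualified names for all imports.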
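
A simplified Python sketch of that search order, for illustration only: each --proto_path directory is tried in the order given and the first match wins. Real protoc does more than this (for example, it also requires the input files themselves to live under an import path), and resolve_import is a hypothetical name.

```python
import os
import tempfile


def resolve_import(import_name: str, proto_paths: list) -> str:
    """Return the first existing <root>/<import_name>, trying roots in order."""
    for root in proto_paths:
        candidate = os.path.join(root, import_name)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"{import_name} not found in {proto_paths}")


# Demo with two throwaway roots; only the second contains the imported file.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    os.makedirs(os.path.join(second, "myproject"))
    open(os.path.join(second, "myproject", "other_protos.proto"), "w").close()
    found = resolve_import("myproject/other_protos.proto", [first, second])
    print(found.startswith(second))  # True
```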

Using proto2 Message Types

You can import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax.

Nested Types

You can define and use message types inside other message types. In the following example, the Result message is defined inside the SearchResponse message:

message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

If you want to use a nested message type outside its parent message type, refer to it as Parent.Type:

message SomeOtherMessage {
  SearchResponse.Result result = 1;
}

You can nest messages as deeply as you like:

message Outer {                  // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool  booly = 2;
    }
  }
}

Updating a Message Type

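If an existing message type no longer meets all your needs (for example, you'd like the message to have an extra field) but you'd still like to use code generated from the old format, don't worry! It's very simple to update message types without breaking any of your existing code. Just remember the following rules:

Don't change the field numbers of any existing fields.
If you add new fields, any messages serialized by code generated from your old message format can still be parsed by code generated from the new format. Keep the default values of these elements in mind so that new code can interact correctly with messages created by old code. Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new fields when parsing. See the Unknown Fields section below for details.
Fields can be removed, as long as the field number is not used again in your updated message type. You may want to rename the field instead, for example by adding the prefix OBSOLETE_, or mark the field number as reserved, so that future users of your .proto can't accidentally reuse the number.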

Unknown Fields

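Unknown fields are well-formed protocol buffer serialized data representing fields that the parser does not recognize. For example, when an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary.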

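Originally, proto3 messages always discarded unknown fields during parsing, but version 3.5 reintroduced the preservation of unknown fields to match the proto2 behavior. In versions 3.5 and later, unknown fields are retained during parsing and included in the serialized output.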
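
A minimal Python sketch of the 3.5+ behavior: an "old" reader that knows only fields 1 and 2 keeps every other record as raw bytes and writes it back when re-serializing. It handles only the varint and length-delimited wire types, appends unknown records after the known ones, and the names KNOWN, split_fields, and reserialize are illustrative.

```python
KNOWN = {1, 2}  # field numbers the old reader's schema defines


def read_varint(buf: bytes, i: int):
    """Decode a varint starting at buf[i]; return (value, next_index)."""
    result, shift = 0, 0
    while True:
        byte = buf[i]
        i += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, i
        shift += 7


def split_fields(buf: bytes):
    """Split a message into known records (number -> raw bytes) and unknown raw records."""
    known, unknown, i = {}, [], 0
    while i < len(buf):
        start = i
        key, i = read_varint(buf, i)
        if (key & 0x07) == 0:        # varint
            _, i = read_varint(buf, i)
        elif (key & 0x07) == 2:      # length-delimited
            length, i = read_varint(buf, i)
            i += length
        else:
            raise ValueError("wire type not handled in this sketch")
        record = buf[start:i]
        if (key >> 3) in KNOWN:
            known[key >> 3] = record
        else:
            unknown.append(record)   # kept rather than discarded
    return known, unknown


def reserialize(known: dict, unknown: list) -> bytes:
    return b"".join(known[n] for n in sorted(known)) + b"".join(unknown)


# Written by a newer binary: fields 1 and 2 plus a new field 4 (varint 7).
new_data = b"\x0a\x02hi" + b"\x10\x01" + b"\x20\x07"
known, unknown = split_fields(new_data)
print(unknown)                                   # [b' \x07']
print(reserialize(known, unknown) == new_data)   # True
```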

Maps

If you want to create an associative map as part of your message definition, protocol buffers provides a handy shortcut syntax:

map<key_type, value_type> map_field = N;

key_type can be any integral or string type (that is, any scalar type except floating-point types and bytes). Note that enum is not a valid key_type. value_type can be any type except another map.

For example, if you wanted to create a map of projects where each Project message is associated with a string key, you could define it like this:

map<string, Project> projects = 3;

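Map fields cannot be repeated (you cannot declare a repeated map<K, V> field).
The ordering of map values is undefined, so you cannot rely on map items being in a particular order.
When generating text format for a .proto, maps are sorted by key. Numeric keys are sorted numerically.
When parsing from the wire or when merging, if there are duplicate map keys, the last key seen is used. When parsing a map from text format, parsing may fail if there are duplicate keys.
If you provide a key but no value for a map field, the behavior when the field is serialized is language-dependent: in C++, Java, and Python the default value for the type is serialized, while in other languages nothing is serialized.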
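
The "last key seen wins" rule follows from how maps are encoded: each element is sent as an embedded entry message whose key is field 1 and value is field 2, so a decoder that inserts entries into a dictionary in wire order keeps the last value for a repeated key. The Python sketch below models a map<string, int32> on field number 3 (as in the projects example, but with int32 values instead of Project messages) under simplifying assumptions: every entry writes key then value, and keys and values are below 128 bytes/units so each tag, length, and value fits in one byte. The function names are illustrative.

```python
LEN = 2  # wire type of the embedded entry message


def encode_entry(field_number: int, key: str, value: int) -> bytes:
    k = key.encode("utf-8")
    entry = bytes([0x0A, len(k)]) + k + bytes([0x10, value])  # key: field 1, value: field 2
    return bytes([(field_number << 3) | LEN, len(entry)]) + entry


def decode_map(field_number: int, buf: bytes) -> dict:
    result, i = {}, 0
    while i < len(buf):
        tag, length = buf[i], buf[i + 1]
        entry = buf[i + 2 : i + 2 + length]
        i += 2 + length
        if tag != (field_number << 3) | LEN:
            continue  # not this map field
        key_len = entry[1]
        key = entry[2 : 2 + key_len].decode("utf-8")
        value = entry[2 + key_len + 1]  # byte after the value tag 0x10
        result[key] = value  # a later duplicate key overwrites the earlier value
    return result


wire = encode_entry(3, "alpha", 1) + encode_entry(3, "beta", 2) + encode_entry(3, "alpha", 9)
print(decode_map(3, wire))  # {'alpha': 9, 'beta': 2}
```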

Packages

You can add an optional package specifier to a .proto file to prevent name clashes between message types.

package foo.bar;
message Open { ... }

You can then use the package name when defining fields of your message type:

message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}

How a package specifier affects the generated code depends on your chosen language.

Defining Services

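If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file, and the protocol buffer compiler will generate service interface code and stubs in your chosen language. For example, if you want to define a service with a method that takes a SearchRequest and returns a SearchResponse, you can define it in your .proto file as follows: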

service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

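The most straightforward RPC system to use with protocol buffers is gRPC, a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin.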

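If you don't want to use gRPC, you can also use protocol buffers with your own RPC implementation. You can find more on implementing an RPC system in the Proto2 Language Guide.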

JSON Mapping

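Proto3 supports a canonical encoding in JSON, making it easier to share data between systems. The encoding rules are listed type by type in the table below.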

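If a value is missing from the JSON-encoded data, or if its value is null, it is interpreted as the appropriate default value when parsed into a protocol buffer. If a field has the default value in the protocol buffer, it is omitted from the JSON-encoded data by default to save space. An implementation may provide an option to emit fields with default values in the JSON-encoded output.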

proto3	JSON	JSON example	Notes
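message	object	`{"fooBar": v, "g": null,…}`	Generates JSON objects. Message field names are converted to lowerCamelCase and become JSON object keys. If the `json_name` field option is specified, the specified value is used as the key instead. Parsers accept both the lowerCamelCase name (or the one specified by the `json_name` option) and the original proto field name. `null` is an accepted value for all field types and is treated as the default value of the corresponding field type.
enum	string	`"FOO_BAR"`	The name of the enum value as specified in the proto is used. Parsers accept both enum names and integer values.
map<K,V>	object	`{"k": v, …}`	All keys are converted to strings.
repeated V	array	`[v, …]`	`null` is accepted as the empty list `[]`.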
bool	true, false	`true, false`
string	string	`"Hello World!"`
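bytes	base64 string	`"YWJjMTIzIT8kKiYoKSctPUB+"`	JSON value will be the data encoded as a string using standard base64 encoding with padding. Either standard or URL-safe base64 encoding, with or without padding, is accepted.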
int32, fixed32, uint32	number	`1, -10, 0`	JSON value will be a decimal number. Either numbers or strings are accepted.
int64, fixed64, uint64	string	`"1", "-10"`	JSON value will be a decimal string. Either numbers or strings are accepted.
float, double	number	`1.1, -10.0, 0, "NaN","Infinity"`	JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted.
Any	`object`	`{"@type": "url", "f": v, … }`	If the Any contains a value that has a special JSON mapping, it will be converted as follows: `{"@type": xxx, "value": yyy}`. Otherwise, the value will be converted into a JSON object, and the `"@type"` field will be inserted to indicate the actual data type.
Timestamp	string	`"1972-01-01T10:00:20.021Z"`	Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted.
Duration	string	`"1.000340012s", "1s"`	Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision and the suffix "s" is required.
Struct	`object`	`{ … }`	Any JSON object. See `struct.proto`.
Wrapper types	various types	`2, "2", "foo", true,"true", null, 0, …`	Wrappers use the same representation in JSON as the wrapped primitive type, except that `null` is allowed and preserved during data conversion and transfer.
FieldMask	string	`"f.fooBar,h"`	See `field_mask.proto`.
ListValue	array	`[foo, bar, …]`
Value	value		Any JSON value
NullValue	null		JSON null
Empty	object	`{}`	An empty JSON object
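
A few rows of the table can be reproduced with Python's standard library. This is a sketch of the mapping rules, not a protobuf JSON printer: json_name reimplements only the default lowerCamelCase conversion, and the field names total_hits and raw_payload are hypothetical.

```python
import base64
import json


def json_name(proto_name: str) -> str:
    """Default JSON key: drop underscores and upper-case the letter after each one."""
    first, *rest = proto_name.split("_")
    return first + "".join(part[:1].upper() + part[1:] for part in rest)


fields = {
    "page_number": 2,                      # int32 -> JSON number
    "total_hits": 2**40,                   # int64 -> JSON string
    "raw_payload": b"abc123!?$*&()'-=@~",  # bytes -> base64 string
}
encoded = {
    json_name("page_number"): fields["page_number"],
    json_name("total_hits"): str(fields["total_hits"]),
    json_name("raw_payload"): base64.b64encode(fields["raw_payload"]).decode("ascii"),
}
print(json.dumps(encoded))
# {"pageNumber": 2, "totalHits": "1099511627776", "rawPayload": "YWJjMTIzIT8kKiYoKSctPUB+"}
```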

Generating Your Classes

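To generate the Java, Python, C++, Go, Ruby, Objective-C, or C# code you need to work with the message types defined in a .proto file, run the protocol buffer compiler protoc on the .proto file. If you haven't installed the compiler, download the package and follow the instructions in the README. For Go, you also need to install a special code generator plugin for the compiler; you can find it and its installation instructions in the golang/protobuf repository on GitHub.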

The compiler is invoked as follows:

protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto

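IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If omitted, the current directory is used. Multiple import directories can be specified by passing the --proto_path option multiple times; they are searched in order. -I=IMPORT_PATH can be used as a short form of --proto_path.
You can provide one or more output directives: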
- --cpp_out generates C++ code in DST_DIR. See the C++ generated code reference for more.
- --java_out generates Java code in DST_DIR. See the Java generated code reference for more.
- --python_out generates Python code in DST_DIR. See the Python generated code reference for more.
- --go_out generates Go code in DST_DIR. See the Go generated code reference for more.
- --ruby_out generates Ruby code in DST_DIR. Ruby generated code reference is coming soon!
- --objc_out generates Objective-C code in DST_DIR. See the Objective-C generated code reference for more.
- --csharp_out generates C# code in DST_DIR. See the C# generated code reference for more.
- --php_out generates PHP code in DST_DIR. See the PHP generated code reference for more.
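You must provide one or more .proto files as input. Multiple .proto files can be specified at once. Although the files are named relative to the current directory, each file must reside in one of the IMPORT_PATHs so that the compiler can determine its canonical name.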
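
If you drive protoc from a build script, the command line above can be assembled programmatically. The Python sketch below only builds and prints the argument list; actually running it (for example with subprocess.run(cmd, check=True)) assumes protoc, plus any required plugins such as the Go one, is installed and on PATH. protoc_args and the paths gen/py, gen/go, and myproject/search.proto are hypothetical.

```python
def protoc_args(import_paths: list, outputs: dict, proto_files: list) -> list:
    """Build a protoc argument list.

    import_paths: directories passed as --proto_path, searched in order
    outputs: maps a generator name such as "python" to its DST_DIR (--python_out=DST_DIR)
    proto_files: input .proto files, each of which must live under an import path
    """
    args = ["protoc"]
    args += [f"--proto_path={path}" for path in import_paths]
    args += [f"--{lang}_out={dst}" for lang, dst in outputs.items()]
    args += list(proto_files)
    return args


cmd = protoc_args(["."], {"python": "gen/py", "go": "gen/go"}, ["myproject/search.proto"])
print(" ".join(cmd))
# protoc --proto_path=. --python_out=gen/py --go_out=gen/go myproject/search.proto
```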