Protobuf Dynamic Parsing: Notes from Practice

Date: 2019-11-12

Background and Requirements

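After receiving protobuf data, how can we automatically create the concrete Protobuf Message object and then deserialize into it? "Automatically" means two things here: (1) when a new protobuf Message type is added to the program, this code does not need to change, no message type has to be registered by hand, and the process does not need to restart; only the .proto file has to be supplied. (2) When an existing protobuf Message is modified, this code does not need to change, no message type has to be registered by hand, and the process does not need to restart; only the modified .proto file has to be supplied.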

Technical Overview

For an introduction to Protobuf, see the Google Protocol Buffer online help pages or the IBM developerWorks article "Google Protocol Buffer 的使用和原理".

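The official Google Protocol Buffers site says very little about dynamic parsing. From reference documents found by searching, however, it turns out that Google Protobuf itself has strong reflection support: it can create a Message object of a concrete type from its type name. We can use this directly, and it should satisfy the requirements above.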

For the implementation, see the Taobao article "玩转Protocol Buffers", which explains the principles of protobuf dynamic parsing in detail. Here I will describe the Protobuf class diagram.

Most people care about and use the left half of the diagram: MessageLite, Message, and the Generated Message Types (Person, AddressBook). The right half gets much less attention: Descriptor, DescriptorPool, and MessageFactory.

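In the diagram above, the key role is played by the Descriptor class: every concrete Message type corresponds to one Descriptor object. Although we rarely call its functions directly, Descriptor is the bridge in "creating a Message object of a concrete type from its type name". The red arrows in the diagram trace how a concrete Message object is created from a type name.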
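To make the path from type name to object more concrete, here is a toy model of the same prototype pattern. It does not use protobuf at all: ToyMessage, ToyPair, and ToyRegistry are hypothetical names invented for this sketch, standing in loosely for Message, a generated type, and the DescriptorPool plus MessageFactory pair. The real library looks types up through Descriptor objects rather than a plain map, so treat this only as an illustration of the idea.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for protobuf's Message: each type can create a fresh
// instance of itself, mirroring the role of Message::New().
class ToyMessage {
public:
    virtual ~ToyMessage() = default;
    virtual std::unique_ptr<ToyMessage> New() const = 0;
    virtual std::string TypeName() const = 0;
};

// Toy stand-in for one concrete message type.
class ToyPair : public ToyMessage {
public:
    std::unique_ptr<ToyMessage> New() const override {
        return std::unique_ptr<ToyMessage>(new ToyPair());
    }
    std::string TypeName() const override { return "Pair"; }
};

// Toy stand-in for "type name -> prototype -> new instance".
class ToyRegistry {
public:
    void Register(std::unique_ptr<ToyMessage> prototype) {
        assert(prototype != nullptr);
        std::string name = prototype->TypeName();
        prototypes_[name] = std::move(prototype);
    }

    // Returns a new instance, or nullptr when the type name is unknown.
    std::unique_ptr<ToyMessage> Create(const std::string& type_name) const {
        auto it = prototypes_.find(type_name);
        if (it == prototypes_.end()) return nullptr;
        return it->second->New();
    }

private:
    std::map<std::string, std::unique_ptr<ToyMessage>> prototypes_;
};
```

Registering a ToyPair and then calling Create("Pair") yields a fresh instance, while an unknown name yields nullptr. The lookup by name and the creation of the instance are two separate steps, which is the division of labour between Descriptor and MessageFactory in the diagram.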

Implementation

Let's go straight to the code, which comes from "玩转Protocol Buffers":

#include <iostream>

#include <google/protobuf/descriptor.h>
#include <google/protobuf/descriptor.pb.h>
#include <google/protobuf/dynamic_message.h>
#include <google/protobuf/compiler/importer.h>

using namespace google::protobuf;
using namespace google::protobuf::compiler;

int main(int argc, const char *argv[])
{
    DiskSourceTree sourceTree;
    // look up .proto files in the current directory
    sourceTree.MapPath("", "./");
    Importer importer(&sourceTree, NULL);
    // compile foo.proto at runtime; Import returns NULL on failure
    if (importer.Import("foo.proto") == NULL) {
        std::cerr << "failed to import foo.proto" << std::endl;
        return 1;
    }
    const Descriptor *descriptor =
        importer.pool()->FindMessageTypeByName("Pair");
    if (descriptor == NULL) {
        std::cerr << "message type Pair not found" << std::endl;
        return 1;
    }
    std::cout << descriptor->DebugString();

    // build a dynamic message prototype from the "Pair" descriptor
    DynamicMessageFactory factory;
    const Message *message = factory.GetPrototype(descriptor);
    // create a real instance of "Pair"
    Message *pair = message->New();
    // write the "Pair" instance by reflection
    // (the calls below imply a string field "key" and a uint32 field "value")
    const Reflection *reflection = pair->GetReflection();
    const FieldDescriptor *field = NULL;
    field = descriptor->FindFieldByName("key");
    reflection->SetString(pair, field, "my key");
    field = descriptor->FindFieldByName("value");
    reflection->SetUInt32(pair, field, 1111);
    std::cout << pair->DebugString();

    // delete the instance while the factory that created it is still alive
    delete pair;
    return 0;
}

Let's walk through the code above.

1) Map a local disk path to a virtual path

DiskSourceTree sourceTree;
// look up .proto files in the current directory
sourceTree.MapPath("", "./");

2) Build the DescriptorPool

Importer importer(&sourceTree, NULL);
// compile foo.proto at runtime
importer.Import("foo.proto");

3) Get the Descriptor

const Descriptor *descriptor = importer.pool()->FindMessageTypeByName("Pair");

4) Get the prototype Message from the Descriptor

DynamicMessageFactory factory;
const Message *message = factory.GetPrototype(descriptor);

5) Use the DynamicMessage prototype to create a new, empty object of this type

Message *pair = message->New();

6) Operate on the message's fields through the Message's reflection interface

const Reflection *reflection = pair->GetReflection();
const FieldDescriptor *field = NULL;
field = descriptor->FindFieldByName("key");
reflection->SetString(pair, field, "my key");
field = descriptor->FindFieldByName("value");
reflection->SetUInt32(pair, field, 1111);

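Copying the code above directly appears to satisfy our requirements. Its one drawback is that the .proto file is loaded again for every incoming packet. At first I assumed this would cost about as much as a disk read, but testing showed extremely poor performance: a single process could handle only a little over 1,000 packets per second. Profiling showed that the bottleneck was not the disk but the frequent calls to malloc and free.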

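So the implementation had to be reconsidered. The first idea: reload only when the protobuf description file is updated, and otherwise parse incoming packets with what is already loaded. This looked promising, and testing confirmed it: the process could handle around 30,000 packets per second. The implementation ran into a difficulty, though. Updating an existing Message type means replacing the Importer and the Factory, and replacing them means releasing the old resources. It turns out that the order in which these resources are released matters a great deal, so the rest of this article describes how to release protobuf resources.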
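The "reload only when the description file changes" idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code used in the tests above: SchemaCache, current_stamp, and load are hypothetical names. The stamp could be the .proto file's modification time, and Schema could be a struct that owns the Importer and the DynamicMessageFactory; both are left to the caller here so the sketch stays independent of protobuf.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical helper: caches a loaded schema and reloads it only when
// the version stamp reported by the caller changes.
template <class Schema>
class SchemaCache {
public:
    using Stamp = long long;

    SchemaCache(std::function<Stamp()> current_stamp,
                std::function<std::shared_ptr<Schema>()> load)
        : current_stamp_(std::move(current_stamp)), load_(std::move(load)) {
        assert(current_stamp_ && load_);
    }

    // Intended to be called once per packet. When the stamp is unchanged
    // this only compares two integers and copies a shared_ptr.
    std::shared_ptr<Schema> Get() {
        const Stamp now = current_stamp_();
        if (!schema_ || now != loaded_stamp_) {
            std::shared_ptr<Schema> fresh = load_();
            if (fresh) {
                // Swap in the new schema. The old one is destroyed once
                // the last caller still holding it lets go.
                schema_ = std::move(fresh);
                loaded_stamp_ = now;
            }
            // If loading failed, the old schema stays in use and the
            // load is retried on the next call.
        }
        return schema_;
    }

private:
    std::function<Stamp()> current_stamp_;
    std::function<std::shared_ptr<Schema>()> load_;
    std::shared_ptr<Schema> schema_;
    Stamp loaded_stamp_ = 0;
};
```

Handing out a shared_ptr is a design choice rather than a requirement: a packet that is still being processed keeps the old schema alive, so replacing the schema never pulls resources out from under a message in flight.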

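A dynamic Message is constructed by a DynamicMessageFactory, so destroying it depends on that same DynamicMessageFactory. When the .proto file is updated dynamically, we destroy the old DynamicMessageFactory and switch to a new one. Before destroying a DynamicMessageFactory, every Message constructed through it must already have been deleted.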

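The reason: the DynamicMessageFactory holds information shared by its DynamicMessage objects, and that information is needed when a DynamicMessage is destructed. Lifetimes must therefore satisfy Descriptor > DynamicMessageFactory > DynamicMessage, meaning each one outlives the next.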

The release order must be: release all DynamicMessage objects, then release the DynamicMessageFactory, then release the Importer.
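One way to make this order hard to get wrong is to let C++ enforce it: members of a class are destroyed in reverse order of declaration. The sketch below is an assumption about how one might package the three kinds of object, not part of the original implementation. SchemaBundle, Probe, and DemoDestructionOrder are hypothetical names, and the template parameters stand in for Importer, DynamicMessageFactory, and Message so the sketch compiles without protobuf.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder. Members are destroyed in reverse declaration
// order: messages_ first, then factory_, then importer_.
template <class Importer, class Factory, class Message>
class SchemaBundle {
public:
    SchemaBundle(std::unique_ptr<Importer> importer,
                 std::unique_ptr<Factory> factory)
        : importer_(std::move(importer)), factory_(std::move(factory)) {
        assert(importer_ && factory_);
    }

    // Takes ownership of a message created through factory().
    void Adopt(std::unique_ptr<Message> message) {
        messages_.push_back(std::move(message));
    }

    Importer& importer() { return *importer_; }
    Factory& factory() { return *factory_; }

private:
    std::unique_ptr<Importer> importer_;              // destroyed last
    std::unique_ptr<Factory> factory_;                // destroyed second
    std::vector<std::unique_ptr<Message>> messages_;  // destroyed first
};

// Stand-in type that appends its tag to a log when it is destroyed.
struct Probe {
    std::string* log;
    char tag;
    Probe(std::string* l, char t) : log(l), tag(t) {}
    ~Probe() { log->push_back(tag); }
};

// Returns the destruction order: M (message), F (factory), I (importer).
std::string DemoDestructionOrder() {
    std::string log;
    {
        SchemaBundle<Probe, Probe, Probe> bundle(
            std::unique_ptr<Probe>(new Probe(&log, 'I')),
            std::unique_ptr<Probe>(new Probe(&log, 'F')));
        bundle.Adopt(std::unique_ptr<Probe>(new Probe(&log, 'M')));
    }  // bundle goes out of scope here
    return log;
}
```

With the real protobuf types, the DiskSourceTree passed to the Importer would presumably need to live at least as long as the Importer as well, so it would be declared before importer_ in such a holder.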