Google’s Protocol Buffers is a structured way to serialize data.
Think JSON, on steroids.

We are going to discuss advanced Protobuf usage. In this article:
dynamic messages & descriptor pools

Table of Contents

Extending Protobuf: dynamic messages

About Protobuf

This article tackles advanced Protobuf topics, so one should be comfortable with the basics before reading this.
We’re going to refer to the C++ implementation, as it is one of the most stable and complete. Most implementations in other languages should provide the same features.

Descriptors and Messages

Before going further, it’s important to understand Protobuf’s own terminology regarding objects.

Let’s make a distinction between Message and Descriptor.

Message

First of all, Message. You’ve met this many times, although you have likely never used it directly.

Message is an abstract interface. Whenever you run protoc, the generated classes subclass it, hence the frequent indirect usage.

Descriptor

Descriptor, as the name suggests, describes messages.
Again, think of protoc: once it has parsed the .proto files, it creates a Descriptor for each message.

With this in mind, it should be clear when we need Descriptors or Messages. When dealing with actual objects filled with data, Message can be used (hand in hand with reflection).
When message definitions are unknown at compile-time, and should be generated at run-time, Descriptor does the job.

From Descriptor to Message

There are a few steps to be taken to get from Descriptor to Message.
Note that we’re only going to show a very limited example: Protobuf is very flexible, and many different paths can be taken.

Our example is inspired by a project of mine, ProfaneDB, which makes heavy use of Protobuf.

DescriptorDatabase

Descriptors usually make sense when they’re part of a FileDescriptor.
DescriptorDatabase takes care of loading multiple files and allows easy retrieval.

Many implementations are provided for this, but we’re going to focus on SourceTreeDescriptorDatabase. This closely resembles what protoc might do, loading .proto files from disk.

DiskSourceTree is used to configure mappings, but most times paths will be mapped to root.

ProfaneDB provides a simple class to map multiple paths to root.

// RootSourceTree is a utility class that creates a DiskSourceTree,
// mapping all the paths provided to the root ("/") path for easier import.
RootSourceTree::RootSourceTree(std::initializer_list<path> paths)
  : paths(paths)
{
    if (paths.size() == 0)
        throw std::runtime_error("Mapping is empty");

    for (const auto & path: paths) {
        this->MapPath("", path.string());

        BOOST_LOG_TRIVIAL(debug) << "Mapping " << path.string();
    }

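    // Open() hands ownership of the stream to the caller,
    // so hold it in a unique_ptr to avoid leaking it.
    std::unique_ptr<ZeroCopyInputStream> inputStream(this->Open(""));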
    if (inputStream == nullptr)
        throw std::runtime_error(this->GetLastErrorMessage());
}

For instance, apart from the folder holding your own .proto files, you might want to map "/usr/include" to "", so that you can import "google/protobuf/any.proto".

With your DiskSourceTree, you can now create and populate a SourceTreeDescriptorDatabase.

Most of the time, though, you don’t want to use it directly, but rather wrap the database in a DescriptorPool.

DescriptorPool

DescriptorDatabase simply retrieves FileDescriptorProto.
These aren’t even proper FileDescriptors, but rather a description of a FileDescriptor, defined as a Protobuf message. It sounds a bit meta; we’ll come back to this at the end of the article.

Instead, a DescriptorPool lets you construct FileDescriptor, Descriptor, FieldDescriptor, ServiceDescriptor, … you name it! It also takes care of importing dependencies.

Note that using SourceTreeDescriptorDatabase as a fallback database for a DescriptorPool has a drawback:
all files have to be loaded using FindFileByName before all the other cool functions can be used.
You can see this in action here.

MessageFactory

From DynamicMessageFactory documentation:

Constructs implementations of Message which can emulate types which are not known at compile-time.

Sometimes you want to be able to manipulate protocol types that you don’t know about at compile time. It would be nice to be able to construct a Message object which implements the message type given by any arbitrary Descriptor. DynamicMessage provides this.

Having generated our Descriptor from file, one can now get to the last step: creating a dynamic message.

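std::string typeName = "fully.qualified.typename";
// descriptorDb is assumed to be a pointer to the DescriptorDatabase built earlier
google::protobuf::DescriptorPool descriptorPool(descriptorDb);
google::protobuf::DynamicMessageFactory dynamicMessageFactory(&descriptorPool);

// FindMessageTypeByName returns nullptr if the pool doesn't know the type
const google::protobuf::Message * prototype = dynamicMessageFactory.GetPrototype(
  descriptorPool.FindMessageTypeByName(
    typeName
  ));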

From a prototype of a message, you can now get an editable instance by calling ->New().

ProfaneDB does this at various times, and you can look at the code to see how it’s done.

Manipulating Descriptors

We’ve seen the power of Descriptors: by loading them from .proto files, we avoid the whole protoc compilation step.

Now what if we want to edit or even generate Descriptor at runtime?

FileDescriptorProto, DescriptorProto, etc.

We’ve mentioned how DescriptorDatabase returns FileDescriptorProto.
Since by now you have a good understanding of Protobuf syntax, you can take a look at how FileDescriptorProto (and similar) are defined.

A FileDescriptorProto is a description of a FileDescriptor. As expected, DescriptorProto describes a Descriptor, and so on with FieldDescriptorProto, OneofDescriptorProto, ServiceDescriptorProto, etc.

To make changes at run-time, one can edit those directly, as they are just Protobuf messages.
In this case, code says more than a thousand words.
Following ProfaneDB ParseFile and ParseAndNormalizeDescriptor you can see how Descriptors are edited.

A summary is outlined here:

// fileDescriptor comes from a DescriptorPool, e.g. through FindFileByName
// ("some/file.proto" is a placeholder)
const google::protobuf::FileDescriptor * fileDescriptor = descriptorPool.FindFileByName("some/file.proto");
google::protobuf::FileDescriptorProto * normalizedProto = new google::protobuf::FileDescriptorProto;

// Copying a FileDescriptor to a FileDescriptorProto
fileDescriptor->CopyTo(normalizedProto);

// TODO Make changes to the FileDescriptorProto here

// For instance
// add an `import` statement
*normalizedProto->add_dependency() = "profanedb/protobuf/storage.proto";

google::protobuf::SimpleDescriptorDatabase normalizedDescriptorDb;
// Add it to a SimpleDescriptorDatabase (which takes ownership of the proto),
// you can then wrap this in a DescriptorPool and use it as shown earlier
normalizedDescriptorDb.AddAndOwn(normalizedProto);

Other articles regarding Protobuf will follow, with topics such as reflection, file structure and CI. Leave a comment with feedback, suggestions or requests; it will be highly appreciated!