Google’s Protocol Buffers is a structured way to serialize data.
Think JSON, on steroids.
We are going to discuss advanced Protobuf usage.
Extending Protobuf: custom options
Why Protobuf
Before reading this article, you should get comfortable with the ideas behind Protocol Buffers and how they are implemented in your language of choice.
We are going to focus on C++, since it is one of the most stable interfaces and provides access to all of Protobuf's features.
ProfaneDB, our sample project
The code shown here can be seen in action in my project, ProfaneDB, written in C++ (overview of Protobuf C++ API).
The purpose of ProfaneDB is to store Protobuf messages in a key-value database (namely RocksDB), avoiding duplication, and providing an easy interface to retrieve objects.
This is done in two steps:
- Define a “schema”. When it comes to a KV database, what we really need is just a key.
- Save nested messages and store a reference to their key.
// This message can be stored and retrieved:
// it has a key which will identify it uniquely
message ParentMessage {
  string unique_key = 1 [ (profanedb.protobuf.options).key = true ];

  // Once stored, the object in the database will have a reference to the nested object
  KeyInt nested_keyable = 2;
}

// This message can also be stored and retrieved
message KeyInt {
  int32 int_key = 1 [ (profanedb.protobuf.options).key = true ];
}
This code is a basic example of a ProfaneDB schema.
What makes it useful for ProfaneDB is just the key annotation, which is what Protobuf calls a custom option.
Schema definition
Protobuf options
To annotate our key, we decided to use the option feature of Protobuf.
Options can be used in proto files, messages, enums and services.
File options
File options are seen all the time, even in the Protobuf source code:
option csharp_namespace = "Google.Protobuf.WellKnownTypes";
option go_package = "github.com/golang/protobuf/ptypes/any";
option java_package = "com.google.protobuf";
option java_outer_classname = "AnyProto";
option java_multiple_files = true;
option objc_class_prefix = "GPB";
These options, for instance, are read at compile time, when calling protoc, by their respective plugin, and used to define names such as the package or class name in the given language.
Message options
Message options come in two kinds:
- Message options can be set for the whole message. See the sample message here.
- Message field options apply to individual fields, and this is the case of (profanedb.protobuf.options).key. The syntax for this is defined here.
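To make the distinction concrete, here is a minimal sketch that uses only the built-in deprecated option, which exists both as a message option and as a field option. The package, message and field names are made up for illustration and are not part of ProfaneDB.

```protobuf
syntax = "proto2";

package example;  // hypothetical package, for illustration only

message LegacyUser {
  // Message option: applies to the whole message.
  option deprecated = true;

  optional string name = 1;

  // Field option: applies to this single field only.
  optional string nickname = 2 [deprecated = true];
}
```

Message options use a standalone option statement inside the message body, while field options go in square brackets after the field number, which is the same position the ProfaneDB key annotation occupies.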
Enum and service options
Enums and services are very similar to messages. Protobuf documentation shows both.
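As a rough illustration of how close the syntax is, the sketch below sets built-in options on an enum, an enum value, a service and a method. All names (Status, Ping, Pong, Health) are hypothetical; only the option names allow_alias and deprecated come from Protobuf itself.

```protobuf
syntax = "proto2";

package example;  // hypothetical package, for illustration only

enum Status {
  // Enum option: allow two names to share the same number.
  option allow_alias = true;

  UNKNOWN = 0;
  ACTIVE = 1;
  ENABLED = 1;                      // alias of ACTIVE, permitted by allow_alias
  RETIRED = 2 [deprecated = true];  // enum value option
}

message Ping {}
message Pong {}

service Health {
  // Service option: applies to the whole service.
  option deprecated = true;

  rpc Check(Ping) returns (Pong) {
    // Method option: applies to this RPC only.
    option deprecated = true;
  }
}
```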
Defining custom options
Custom options require understanding another Protobuf concept: extensions.
Again, it would be pointless for me to emulate what the official documentation already explains very clearly.
All we need to note here is how extensions are applied, and how they should be used to nest our custom options.
src/profanedb/protobuf/options.proto
syntax = "proto2";

import "google/protobuf/descriptor.proto";

package profanedb.protobuf;

// These options should be used during schema definition,
// applying them to some of the fields in protobuf
message FieldOptions {
  optional bool key = 1;
}

extend google.protobuf.FieldOptions {
  optional FieldOptions options = 1036;
}
Let’s take a look at how it’s done in ProfaneDB:
- syntax = "proto2": we need this because extensions rely on bits and pieces that were removed from Protobuf 3 (proto3 only keeps extensions for declaring custom options). Protobuf 3 retains backward compatibility; however, this directive is what the compiler needs to process further instructions such as optional [...]
- import "google/protobuf/descriptor.proto": this is where the extendable messages are defined. We can actually take a look at the code to see them at work. The default options are also defined in there.
- package profanedb.protobuf: this is very important to avoid clashing with other options. See how it is part of the name of this option: [ (profanedb.protobuf.options).key = true ]
- message FieldOptions { ... }: this could be any name. It is local to this proto file. It is used to nest the actual options:
  - optional bool key = 1: this could be one of many; see how they are marked optional. Also note that key is the id used in [ (profanedb.protobuf.options).key = true ]
- extend google.protobuf.FieldOptions { ... }: we could be extending FileOptions, MessageOptions, EnumOptions, ServiceOptions … in the same way.
  - optional FieldOptions options = 1036: here we are “injecting” our custom options message FieldOptions into the original google.protobuf.FieldOptions. Note how the name will be used later on, in [ (profanedb.protobuf.options).key = true ]: profanedb.protobuf comes from our package, options is defined here, and key comes from our FieldOptions message.
  - 1036 is ProfaneDB's extension number. Because we hope it will be useful to other people too, we needed it to have a unique extension number, in case other Protobuf plugins were in use. The range 50000-99999 can be used during development; however, should you want to release your project, you’ll have to notify Google, so that a unique extension number is assigned to you. As we’ve seen above, you don’t need more than one extension number, as a single extension can be of type Message, thus nesting other fields in it.
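The same pattern carries over to the other option types. The sketch below is not taken from ProfaneDB: it assumes a hypothetical package example.schema, a hypothetical StorageOptions message and a storage extension on google.protobuf.MessageOptions, using a number from the 50000-99999 development range mentioned above.

```protobuf
syntax = "proto2";

import "google/protobuf/descriptor.proto";

package example.schema;  // hypothetical package, for illustration only

// Hypothetical container for message-level options.
// More options can be nested here without using another extension number.
message StorageOptions {
  optional string table = 1;
}

extend google.protobuf.MessageOptions {
  // 50001 sits in the 50000-99999 range: fine for development,
  // not for a released project.
  optional StorageOptions storage = 50001;
}

message User {
  // Message-level custom option: package name, extension name, nested field.
  option (example.schema.storage).table = "users";

  optional string name = 1;
}
```

The option name is built the same way as (profanedb.protobuf.options).key: the package, then the extension name, then the field of the nested options message.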
Using custom options
Now that we’ve seen how to create our custom options, we have to decide how to make them useful. First of all, taking message field options as an example, it should be clear how they differ from message fields themselves.
- Message fields are defined in our message ... { ... } declaration in a .proto file. They give a structure to your data, and make sense once you actually fill them with your content.
- Message field options are part of a message declaration; they add some metadata and context to a message field declaration.
Speaking in Protobuf terms, declaring a message inside a .proto file generates a Descriptor. Any field declared inside it will be a FieldDescriptor.
FileDescriptor, Descriptor and FieldDescriptor all provide an options() method, which returns FileOptions, MessageOptions and FieldOptions respectively, and the same goes for the other kinds of options.
For instance, see how ProfaneDB finds out whether a Descriptor has a key defined:
// Check whether a Descriptor has a field with key option set
bool Loader::IsKeyable(const google::protobuf::Descriptor * descriptor) const
{
    for (int i = 0; i < descriptor->field_count(); i++) {
        // If any field in message has profanedb::protobuf::options::key set
        if (descriptor->field(i)->options().GetExtension(profanedb::protobuf::options).key())
            return true;
    }
    return false;
}
Most if not all Protobuf libraries in any language should provide public interfaces to interact with Descriptors and retrieve custom options.
In the next article we are going to discuss how ProfaneDB retrieves, interacts with and manipulates Descriptors.
We will examine other advanced features such as DescriptorPool, dynamic messages and reflection.