Google’s Protocol Buffers is a structured way to serialize data.
Think JSON, on steroids.

We are going to discuss advanced Protobuf usage.

Extending Protobuf: custom options

Why Protobuf

Before reading this article, you should get comfortable with the ideas behind Protocol Buffers and how they are implemented in your language of choice.
We are going to focus on C++, since it is one of the most stable interfaces and provides access to all of Protobuf's features.

ProfaneDB, our sample project

The code shown here can be seen in action in my project, ProfaneDB, written in C++ (overview of Protobuf C++ API).
The purpose of ProfaneDB is to store Protobuf messages in a key-value database (namely RocksDB), avoiding duplication, and providing an easy interface to retrieve objects.

This is done in two steps:

  1. Define a “schema”. When it comes to a KV database, what we really need is just a key.
  2. Save nested messages and store a reference to their key.

// This message can be stored and retrieved:
// it has a key which will identify it uniquely
message ParentMessage {
  string unique_key = 1 [ (profanedb.protobuf.options).key = true ];

  // Once stored, the object in the database will have a reference to the nested object
  KeyInt nested_keyable = 2;
}

// This message can also be stored and retrieved
message KeyInt {
  int32 int_key = 1 [ (profanedb.protobuf.options).key = true ];
}

This code is a basic example of a ProfaneDB schema.
The only part specific to ProfaneDB is the key annotation, which is what Protobuf calls a custom option.

Schema definition

Protobuf options

To annotate our key, we decided to use the option feature of Protobuf.

Options can be used in proto files, messages, enums and services.

File options

File options are seen all the time, even in the Protobuf source code:

option csharp_namespace = "Google.Protobuf.WellKnownTypes";
option go_package = "github.com/golang/protobuf/ptypes/any";
option java_package = "com.google.protobuf";
option java_outer_classname = "AnyProto";
option java_multiple_files = true;
option objc_class_prefix = "GPB";

These options, for instance, are read at compile time, when protoc is called, by their respective language plugins, and are used to define names such as the class name in the given language.

Message options

Message options are of two types:

  1. Message options can be set for the whole message. See the sample message here.
  2. Message field options apply to individual fields, and this is the case of (profanedb.protobuf.options).key. The syntax for this is defined here.
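
To make the two types concrete, here is a minimal sketch that uses only the built-in deprecated and packed options. The package and message names are hypothetical and are not taken from ProfaneDB.

```protobuf
syntax = "proto3";

// Hypothetical package and message, used only to show where each option goes
package example;

message LegacyUser {
  // Message option: applies to LegacyUser as a whole
  option deprecated = true;

  // Field option: applies to this single field only
  string nickname = 1 [deprecated = true];

  // A field can carry several options, separated by commas
  repeated int32 scores = 2 [packed = true, deprecated = true];
}
```

Custom options go in the same positions, with the extension name wrapped in parentheses, as in [ (profanedb.protobuf.options).key = true ].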

Enum and service options

Enums and services work very much like messages. The Protobuf documentation shows both.
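
As a rough illustration, the sketch below sets built-in options on an enum, an enum value, a service and a method. All names in it are hypothetical; only allow_alias and deprecated come from Protobuf itself.

```protobuf
syntax = "proto3";

// Hypothetical package, enum, messages and service
package example;

enum Status {
  // Enum option: lets two names share the same number
  option allow_alias = true;

  STATUS_UNKNOWN = 0;
  STATUS_STARTED = 1;
  STATUS_RUNNING = 1;                      // alias of STATUS_STARTED
  STATUS_LEGACY = 2 [deprecated = true];   // enum value option
}

message Ping {}
message Pong {}

service Health {
  // Service option
  option deprecated = true;

  rpc Check(Ping) returns (Pong) {
    // Method option
    option deprecated = true;
  }
}
```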

Defining custom options

Custom options require understanding another Protobuf concept: extensions.
Again, it would be pointless for me to repeat what the official documentation already explains very clearly.

All we need to note here is how extensions are applied, and how they should be used to nest our custom options.

src/profanedb/protobuf/options.proto

syntax = "proto2";

import "google/protobuf/descriptor.proto";

package profanedb.protobuf;

// These options should be used during schema definition,
// applying them to some of the fields in protobuf
message FieldOptions {
  optional bool key = 1;
}

extend google.protobuf.FieldOptions {
  optional FieldOptions options = 1036;
}

Let’s take a look at how it’s done in ProfaneDB, piece by piece.

  • syntax = "proto2"
    we need this because extensions make use of bits and pieces that were removed from Protobuf 3. Protobuf 3 retains backward compatibility; however, this directive is what the compiler needs to process further instructions such as optional [...]

  • import "google/protobuf/descriptor.proto"
    this file defines the extendable messages. We can actually take a look at the code to see them at work. The default options are defined there as well.

  • package profanedb.protobuf
    this is very important to avoid clashing with options defined elsewhere. See how it is part of the name of the option: [ (profanedb.protobuf.options).key = true ]

  • message FieldOptions { ... }
    this could be any name. It is local to this proto file.
    It is used to nest the actual options:

    • optional bool key = 1;
      this could be one of many fields; note that each is marked optional.
      Also note that key is the name used in [ (profanedb.protobuf.options).key = true ];

  • extend google.protobuf.FieldOptions { ... }
    we could be extending FileOptions, MessageOptions, EnumOptions, ServiceOptions … in the same way.

    • optional FieldOptions options = 1036
      here we are “injecting” our custom options message FieldOptions into the original google.protobuf.FieldOptions
      Note how the name will be used later on: [ (profanedb.protobuf.options).key = true ] :
      • profanedb.protobuf we have seen coming from our package;
      • options is defined here, and key came from our FieldOptions message.
      • Finally, 1036 is ProfaneDB’s extension number. Because we hope it will be useful to other people too, we needed it to have a unique extension number, in case other Protobuf plugins are in use. The range 50000-99999 can be used during development; however, should you want to release your project, you’ll have to notify Google, so that a unique extension number is assigned to you. As we’ve seen above, you don’t need more than one extension number, since a single extension can be of a message type, thus nesting other fields in it.
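
The same pattern carries over to the other option types. Below is a minimal sketch, not ProfaneDB code: the package example.storage, the indexed and table options, the extension names field_opts and message_opts, and the User message are all made up for illustration. The extension numbers are taken from the 50000-99999 development range mentioned above.

```protobuf
syntax = "proto2";

import "google/protobuf/descriptor.proto";

// Hypothetical package, not part of ProfaneDB
package example.storage;

// Several field-level options nested under a single extension number
message FieldOptions {
  optional bool key = 1;
  optional bool indexed = 2;   // hypothetical second option
}

// Message-level options, nested the same way
message MessageOptions {
  optional string table = 1;   // hypothetical option
}

extend google.protobuf.FieldOptions {
  optional FieldOptions field_opts = 50001;
}

extend google.protobuf.MessageOptions {
  optional MessageOptions message_opts = 50002;
}

// Hypothetical message using both kinds of custom option
message User {
  option (example.storage.message_opts).table = "users";

  optional string id = 1 [
    (example.storage.field_opts).key = true,
    (example.storage.field_opts).indexed = true
  ];
}
```

Each extended message needs one extension number, and every further option becomes a regular field of the nested message.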

Using custom options

Now that we’ve seen how to create our custom options, we have to decide how to make them useful. First of all, taking message field options as an example, it should be clear how they differ from message fields themselves.

Message fields are defined in our message ... { ... } declaration in a .proto file. They give a structure to your data, and make sense once you actually fill them with your content.
Message field options are part of a message declaration too, but they add metadata and context to a message field declaration rather than holding data.

Speaking in Protobuf terms, creating a message inside a .proto file generates a Descriptor. Any field declared inside it will be a FieldDescriptor.
FileDescriptor, Descriptor and FieldDescriptor all provide an options() method, which returns FileOptions, MessageOptions and FieldOptions respectively, and so on for all kinds of options. Custom options are read through these objects.

For instance, see how ProfaneDB finds out whether a Descriptor has a key defined:

// Check whether a Descriptor has a field with key option set
bool Loader::IsKeyable(const google::protobuf::Descriptor * descriptor) const
{
    for (int i = 0; i < descriptor->field_count(); i++) {
        // If any field in message has profanedb::protobuf::options::key set
        if (descriptor->field(i)->options().GetExtension(profanedb::protobuf::options).key())
            return true;
    }
    return false;
}

Most if not all Protobuf libraries in any language should provide public interfaces to interact with Descriptors and retrieve custom options.

In the next article we are going to discuss how ProfaneDB retrieves, interacts with and manipulates Descriptors.
We will examine other advanced features such as DescriptorPool, dynamic messages and reflection.