Google’s Protocol Buffers is a structured way to serialize data.
Think JSON, on steroids.

We are going to discuss a testing technique: filling Protobuf messages with random data, using reflection and Boost.Random.

Extending Protobuf: reflection

Protobuf, C++ and Boost

This article tackles advanced Protobuf topics, so one should be comfortable with the basics before reading this.
We’re going to refer to the C++ library, and implement a simple class to fill Protobuf messages with random data. Boost.Random is also used for this purpose.

The code in action

The techniques mentioned here are used within my project ProfaneDB for testing purposes. profanedb::util::RandomGenerator can be checked for reference.

It can also be seen in action in these unit tests.

Reflection

The idea behind reflection (not specific to Protobuf) is for code to be able to inspect and interact with other code of which it has no knowledge at compile time.

In Protobuf terms, this means reading and writing the fields of messages whose type is not known at compile time, including messages which were not compiled using protoc.

Generation of values

We are first going to look at how values are generated using Boost.Random.

For our purposes, we’ll need to be able to generate random values for the following types:

C++ type                  Protobuf types             FieldDescriptor::CppType
------------------------  -------------------------  ------------------------
google::protobuf::int32   int32, sint32, sfixed32    CPPTYPE_INT32
google::protobuf::int64   int64, sint64, sfixed64    CPPTYPE_INT64
google::protobuf::uint32  uint32, fixed32            CPPTYPE_UINT32
google::protobuf::uint64  uint64, fixed64            CPPTYPE_UINT64
double                    double                     CPPTYPE_DOUBLE
float                     float                      CPPTYPE_FLOAT
bool                      bool                       CPPTYPE_BOOL
std::string               string, bytes              CPPTYPE_STRING

For integer values, we use boost::random::uniform_int_distribution.
TYPE is replaced with each C++ type we are going to implement for the first case:
- google::protobuf::int32
- google::protobuf::int64
- google::protobuf::uint32
- google::protobuf::uint64

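#include <limits>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int_distribution.hpp>

boost::random::mt19937 generator;

// Primary template, declared once; each supported type gets a specialization.
template<typename T>
T RandomValue();

// TYPE is a placeholder for one of the integer types listed above.
template<>
TYPE RandomValue< TYPE >() {
  // Cover the whole range representable by TYPE.
  boost::random::uniform_int_distribution< TYPE > range(
    std::numeric_limits< TYPE >::min(),
    std::numeric_limits< TYPE >::max()
  );

  return range(generator);
}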

Let’s see what happens here:
We are defining a specialization of a function template, so RandomValue can simply be called with the desired type as its template argument to get a valid value.

The template parameter of uniform_int_distribution sets the type returned by its operator(), which draws from the generator passed to it (here the mt19937 Mersenne Twister) as its source of randomness.

Its constructor takes two parameters, the minimum and maximum values to return (both inclusive).
Here we simply use std::numeric_limits, which provides the smallest and largest values representable by an arithmetic type.

This code can be seen here with macros to substitute TYPE with all the required values at compile time.
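The macro substitution can be sketched as follows. This is not the project's actual code: the macro name DEFINE_RANDOM_INT is hypothetical, the fixed-width types from <cstdint> stand in for the google::protobuf typedefs, and the standard <random> header is used instead of Boost.Random (which offers classes of the same names) so that the sketch compiles without Boost.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Shared generator, default-seeded here; seed it explicitly for varied runs.
static std::mt19937 generator;

// Primary template: declared only, so an unsupported type fails at link time.
template <typename T>
T RandomValue();

// Hypothetical macro: expands to one explicit specialization per integer type.
#define DEFINE_RANDOM_INT(TYPE)                        \
  template <>                                          \
  TYPE RandomValue<TYPE>() {                           \
    std::uniform_int_distribution<TYPE> range(         \
        std::numeric_limits<TYPE>::min(),              \
        std::numeric_limits<TYPE>::max());             \
    return range(generator);                           \
  }

DEFINE_RANDOM_INT(std::int32_t)
DEFINE_RANDOM_INT(std::int64_t)
DEFINE_RANDOM_INT(std::uint32_t)
DEFINE_RANDOM_INT(std::uint64_t)

#undef DEFINE_RANDOM_INT
```

A caller then writes RandomValue<std::int64_t>() and gets a value drawn from the full range of that type. Because each call builds a fresh distribution, reseeding the generator with the same seed reproduces the same sequence, which is convenient when a failing test has to be replayed.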

The same procedure is repeated for double and float, using boost::random::uniform_real_distribution.
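Here is a sketch of the floating-point case, applying the same procedure literally. It again assumes the standard <random> header in place of Boost.Random. One detail is worth knowing: for floating-point types, std::numeric_limits<T>::min() is the smallest positive normalized value, not the most negative one, so a range built this way only produces positive numbers.

```cpp
#include <limits>
#include <random>

// Shared generator, default-seeded here.
static std::mt19937 generator;

// Primary template, specialized below for the two floating-point types.
template <typename T>
T RandomValue();

template <>
double RandomValue<double>() {
  // min() is the smallest positive normalized double, so the result is > 0.
  std::uniform_real_distribution<double> range(
      std::numeric_limits<double>::min(),
      std::numeric_limits<double>::max());
  return range(generator);
}

template <>
float RandomValue<float>() {
  // Same bounds, narrowed to float.
  std::uniform_real_distribution<float> range(
      std::numeric_limits<float>::min(),
      std::numeric_limits<float>::max());
  return range(generator);
}
```

Whether negative values are needed depends on what the tests exercise. Note that passing std::numeric_limits<T>::lowest() as the lower bound is not a drop-in change with std::uniform_real_distribution, which requires the difference between the two bounds to be representable in T.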

Then for string: a length is generated randomly as an unsigned integer, and that many random characters, each drawn from a list of allowed characters, are appended to the result.
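A possible shape for the string case is sketched below. The alphabet and the maximum length are assumptions made for this example (the text fixes neither), and the standard <random> header replaces Boost.Random.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Shared generator, default-seeded here.
static std::mt19937 generator;

// Assumed alphabet and length cap, chosen only for this sketch.
static const std::string kAlphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
constexpr std::size_t kMaxLength = 32;

template <typename T>
T RandomValue();

template <>
std::string RandomValue<std::string>() {
  // First draw the length, then draw that many indices into the alphabet.
  std::uniform_int_distribution<std::size_t> length_range(0, kMaxLength);
  std::uniform_int_distribution<std::size_t> index_range(0, kAlphabet.size() - 1);

  const std::size_t length = length_range(generator);
  std::string result;
  result.reserve(length);
  for (std::size_t i = 0; i < length; ++i) {
    result += kAlphabet[index_range(generator)];
  }
  return result;
}
```

Restricting the alphabet to ASCII keeps the result valid UTF-8, which matters for string fields; bytes fields accept any content.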

And finally for bool, using uniform_int_distribution restricted to the values 0 and 1.
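The bool case is short enough to sketch in full. As before, this assumes the standard <random> header rather than Boost.Random. Since std::uniform_int_distribution is not specified for bool, the sketch draws an int and converts it.

```cpp
#include <random>

// Shared generator, default-seeded here.
static std::mt19937 generator;

template <typename T>
T RandomValue();

template <>
bool RandomValue<bool>() {
  // Draw 0 or 1 with equal probability and map 1 to true.
  std::uniform_int_distribution<int> range(0, 1);
  return range(generator) == 1;
}
```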

Filling the messages

Now a single message can be filled with random values. Nested messages must also be filled recursively.

Here is where reflection is needed.

First, our message Descriptor is used to retrieve the list of fields.

const Descriptor * descriptor = message->GetDescriptor();

for (int i = 0; i < descriptor->field_count(); i++) {
  // fd describes the i-th field; a random value is generated for it here.
  const FieldDescriptor * fd = descriptor->field(i);
}

Then, for each field a random value is generated according to its C++ type, which can be retrieved using FieldDescriptor::cpp_type(). For instance:

const Reflection * reflection = message->GetReflection();

switch(fd->cpp_type()) {
  case FieldDescriptor::CPPTYPE_INT32:
    reflection->SetInt32(message, fd, RandomValue<google::protobuf::int32>());
    break;
  // ... one case per remaining CPPTYPE_* value
}

This is repeated for each C++ type, and also for repeated fields, where methods such as Reflection::AddInt32 and Reflection::AddString are used.

All of this can be seen here.

If the given field is a nested message, the method is simply called recursively with a pointer to the mutable message, hence filling the whole message tree.

const Reflection * reflection = message->GetReflection();

switch(fd->cpp_type()) {
  case FieldDescriptor::CPPTYPE_MESSAGE:
    // MutableMessage returns a pointer to the nested message, creating it if unset.
    this->FillRandomly(reflection->MutableMessage(message, fd));
    break;
}

Again, if the field is repeated, Reflection::AddMessage is used.