Protocol Buffers in Python

An independent, flexible implementation of Protocol Buffers in pure Python.


  • Compatible with Google's implementation.
  • Supports required, repeated and packed repeated fields.
  • Supports embedded messages.
  • Supports streaming of several messages.
  • Provides self-describing messages.
  • Easily extensible.


The examples below assume that everything is imported into the global namespace:

from encoding import *

Assume you have the following definition:

message Test2 {
  string b = 2;
}

First, create the message type:

Test2 = MessageType()
Test2.add_field(2, 'b', String)

Then, create a message and fill it with the appropriate data:

msg = Test2()
msg.b = 'Hello, world'

You can serialize it easily:

print(msg.dumps())                     # Dump into a byte string.
with open('/tmp/message', 'wb') as fp:
    msg.dump(fp)                       # Or into any file-like object with write().
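For intuition about what dumps() produces, here is a stand-alone Python 3 sketch of how Google's wire format encodes this Test2 message. It does not use this library; it assumes only that the output follows the standard encoding, as the compatibility note above suggests. A string field is written as a key, varint((field_number << 3) | wire_type) with wire type 2 (length-delimited), followed by a varint length and the UTF-8 bytes. encode_varint, encode_key and encode_string_field are illustrative names.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint, low 7-bit group first."""
    if value < 0:
        raise ValueError('varint must be non-negative')
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # High bit set: more bytes follow.
        else:
            out.append(byte)         # High bit clear: last byte.
            return bytes(out)


def encode_key(field_number, wire_type):
    """A field key is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


LENGTH_DELIMITED = 2  # Wire type for strings, bytes and embedded messages.


def encode_string_field(field_number, text):
    data = text.encode('utf-8')
    return encode_key(field_number, LENGTH_DELIMITED) + encode_varint(len(data)) + data


payload = encode_string_field(2, 'Hello, world')
print(payload)  # b'\x12\x0cHello, world'
```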

You can also deserialize the message:

with open('/tmp/message', 'rb') as fp:
    msg = Test2.load(fp)
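Loading reverses those steps. The following Python 3 sketch, again independent of this library, decodes varints and walks the fields of a serialized buffer; decode_varint and decode_fields are illustrative names, and only the varint (0) and length-delimited (2) wire types are handled.

```python
def decode_varint(data, pos):
    """Decode a base-128 varint starting at data[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(data):
            raise EOFError('truncated varint')
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # Low 7 bits carry the payload.
        if not byte & 0x80:               # High bit clear: last byte of the varint.
            return result, pos
        shift += 7


def decode_fields(data):
    """Yield (field_number, wire_type, value) for varint and length-delimited fields."""
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                # Varint.
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:              # Length-delimited: strings, bytes, messages.
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            if len(value) != length:
                raise EOFError('truncated field')
            pos += length
        else:                             # Fixed 32/64-bit wire types are left out here.
            raise NotImplementedError('wire type %d' % wire_type)
        yield field_number, wire_type, value


fields = list(decode_fields(b'\x12\x0cHello, world'))
print(fields)  # [(2, 2, b'Hello, world')]
```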

Required field

To add a required field, pass the additional flags parameter to add_field.

If a required field is not filled in, ValueError is raised during serialization.

Test2 = MessageType()
Test2.add_field(2, 'b', String, flags=Flags.REQUIRED)
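The required check can be pictured as a scan over the declared fields before anything is written. This is a hypothetical sketch, not the library's code: check_required, the REQUIRED bit value and the dict-based field representation are all assumptions made for illustration.

```python
REQUIRED = 0x01  # Hypothetical bit value; the library's Flags.REQUIRED may differ.


def check_required(field_flags, values):
    """Raise ValueError if a field flagged REQUIRED has no value.

    field_flags maps field name -> flags bitmask (a hypothetical representation);
    values maps field name -> the value assigned to the message.
    """
    missing = sorted(name for name, flags in field_flags.items()
                     if flags & REQUIRED and values.get(name) is None)
    if missing:
        raise ValueError('required field(s) not set: ' + ', '.join(missing))


check_required({'b': REQUIRED}, {'b': 'Hello, world'})  # Passes silently.

error = None
try:
    check_required({'b': REQUIRED}, {})
except ValueError as exc:
    error = str(exc)
print(error)  # required field(s) not set: b
```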

Repeated field

The value assigned to a repeated field can be any iterable object. After loading, the value is always a list.

Test2 = MessageType()
Test2.add_field(1, 'b', UVarint, flags=Flags.REPEATED)
msg = Test2()
msg.b = (1, 2, 3)
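On the wire, a non-packed repeated field is simply the same key written once per element, which is also why any iterable works as input. Here is a Python 3 sketch of the standard encoding for the example above (field 1, values 1, 2, 3), independent of this library; encode_repeated_varints is an illustrative name.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint, low 7-bit group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_repeated_varints(field_number, values):
    """Non-packed repeated field: a separate key/value pair per element (wire type 0)."""
    key = encode_varint(field_number << 3 | 0)
    return b''.join(key + encode_varint(v) for v in values)  # Accepts any iterable.


encoded = encode_repeated_varints(1, (1, 2, 3))
print(encoded.hex())  # 080108020803
```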

Packed repeated field

Test4 = MessageType()
Test4.add_field(4, 'd', UVarint, flags=Flags.PACKED_REPEATED)
msg = Test4()
msg.d = (3, 270, 86942)
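A packed repeated field instead writes a single length-delimited record: one key with wire type 2, the total byte length, then the concatenated varints. The values in this example are the ones used in Google's encoding documentation; the Python 3 sketch below (independent of this library; encode_packed_varints is an illustrative name) reproduces the expected bytes.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint, low 7-bit group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_packed_varints(field_number, values):
    """Packed repeated field: one key (wire type 2), a length, then all varints."""
    payload = b''.join(encode_varint(v) for v in values)
    key = encode_varint(field_number << 3 | 2)
    return key + encode_varint(len(payload)) + payload


encoded = encode_packed_varints(4, (3, 270, 86942))
print(encoded.hex())  # 2206038e029ea705
```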

Embedded messages

Consider the following definitions:

message Test1 {
  int32 a = 1;
}

message Test3 {
  required Test1 c = 3;
}

To create an embedded message field, pass EmbeddedMessage(<message type>) as the field type, then fill it in like this:

# Create the type.
Test1 = MessageType()
Test1.add_field(1, 'a', UVarint)
Test3 = MessageType()
Test3.add_field(3, 'c', EmbeddedMessage(Test1), flags=Flags.REQUIRED)

# Fill in the message.
msg = Test3()
msg.c = Test1()
msg.c.a = 150
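An embedded message is encoded like a bytes field: the inner message is serialized first, then written with wire type 2 and a length prefix. Here is a Python 3 sketch of the standard encoding of this Test3 message, independent of this library; the variable names are illustrative.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint, low 7-bit group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# Test1 with a = 150: key for field 1, wire type 0 (varint), then the value.
inner = encode_varint(1 << 3 | 0) + encode_varint(150)
# Test3 with c = <Test1>: key for field 3, wire type 2, length, then the inner bytes.
outer = encode_varint(3 << 3 | 2) + encode_varint(len(inner)) + inner
print(outer.hex())  # 1a03089601
```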

Supported Data Types

The following data types are currently supported:

UVarint             # Unsigned integer.
Varint              # Signed integer.
Bool                # Boolean.
Fixed64             # 8-byte string.
UInt64              # C++'s 64-bit `unsigned long long`.
Int64               # C++'s 64-bit `long long`.
Float64             # C++'s `double`.
Fixed32             # 4-byte string.
UInt32              # C++'s 32-bit `unsigned int`.
Int32               # C++'s 32-bit `int`.
Float32             # C++'s `float`.
Bytes               # Raw byte string.
Unicode             # Unicode string.
TypeMetadata        # Type that describes another type.
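The fixed-width types map naturally onto Python's struct module. The sketch below assumes that UInt64, Int64, Float64, UInt32, Int32 and Float32 are little-endian interpretations of the 8- and 4-byte Fixed64/Fixed32 strings, as in Google's wire format, and that a signed varint uses ZigZag encoding like protobuf's sint32/sint64. Both are assumptions about this library, not documented behavior; FIXED_FORMATS, pack_fixed and zigzag_encode are illustrative names.

```python
import struct

# Assumed mapping of type names to struct formats; '<' selects little-endian
# with standard sizes, matching how protobuf stores fixed32/fixed64 values.
FIXED_FORMATS = {
    'UInt64': '<Q', 'Int64': '<q', 'Float64': '<d',
    'UInt32': '<I', 'Int32': '<i', 'Float32': '<f',
}


def pack_fixed(type_name, value):
    """Pack value into the 4- or 8-byte little-endian form for type_name."""
    return struct.pack(FIXED_FORMATS[type_name], value)


def zigzag_encode(n):
    """Map signed 64-bit ints to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) ^ (n >> 63)


print(pack_fixed('Int32', -2).hex())                    # feffffff
print(len(pack_fixed('Float64', 1.5)))                  # 8
print([zigzag_encode(n) for n in (0, -1, 1, -2, 2)])    # [0, 1, 2, 3, 4]
```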

Streaming messages

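The Protocol Buffers format is not self-delimiting: a reader cannot tell where one message ends and the next one begins. Embedded messages, however, are length-delimited, so you can wrap your message type in the EmbeddedMessage class and write/read the messages sequentially.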
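The length-prefix idea behind this can be sketched with raw bytes. In this Python 3 sketch (independent of the library; write_delimited, read_varint and read_delimited are hypothetical helpers), each serialized message is preceded by its varint length, so a reader knows exactly how many bytes to consume and stops cleanly at the end of the stream.

```python
import io


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint, low 7-bit group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def write_delimited(stream, payload):
    """Write one serialized message prefixed with its varint length."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_varint(stream):
    """Read a varint; return None if the stream is already at its end."""
    result, shift = 0, 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None  # Clean end of stream: no more messages.
            raise EOFError('truncated varint')
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def read_delimited(stream):
    """Yield each length-prefixed payload until the stream is exhausted."""
    while True:
        length = read_varint(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError('truncated message')
        yield payload


buf = io.BytesIO()
for message in (b'\x08\x01', b'\x08\x96\x01', b''):
    write_delimited(buf, message)
buf.seek(0)
messages = list(read_delimited(buf))
print(messages)  # [b'\x08\x01', b'\x08\x96\x01', b'']
```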

Another option is protobuf.EofWrapper, whose constructor takes a limit parameter. EofWrapper raises EOFError once the specified number of bytes has been read.
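A reader with a byte limit can be sketched as a thin wrapper around a stream. LimitedReader below is hypothetical and only mimics the described behavior of EofWrapper; in particular, raising EOFError on the first read after the limit is reached is an assumption about the exact timing.

```python
import io


class LimitedReader:
    """Hypothetical wrapper: serves at most `limit` bytes, then raises EOFError."""

    def __init__(self, stream, limit):
        self.stream = stream
        self.remaining = limit

    def read(self, size):
        if self.remaining <= 0:
            raise EOFError('read limit reached')
        data = self.stream.read(min(size, self.remaining))
        self.remaining -= len(data)
        return data


source = io.BytesIO(b'\x08\x01\x08\x02rest')
reader = LimitedReader(source, 4)  # Only the first 4 bytes belong to this message.
chunks = []
try:
    while True:
        chunks.append(reader.read(1))
except EOFError:
    pass
rest = source.read()               # The underlying stream stops right after the limit.
print(chunks)  # [b'\x08', b'\x01', b'\x08', b'\x02']
print(rest)    # b'rest'
```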

Self-describing messages and TypeMetadata

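A serialized message contains no description of its own type. Therefore, if you want to send self-describing messages, you must send a description of the message type as well. A field of type TypeMetadata holds such a description: in the example below, field b carries the types B and C, and after loading, msg.b[0] can be called to create a message of type B.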

A, B, C = MessageType(), MessageType(), MessageType()
A.add_field(1, 'a', UVarint)
A.add_field(2, 'b', TypeMetadata, flags=Flags.REPEATED)  # Holds message types.
A.add_field(3, 'c', Bytes)
B.add_field(4, 'ololo', Float32)
B.add_field(5, 'c', TypeMetadata, flags=Flags.REPEATED)
B.add_field(6, 'd', Bool, flags=Flags.PACKED_REPEATED)
C.add_field(7, 'ghjhdf', UVarint)

msg = A()
msg.a = 1
msg.b = [B, C]      # Send the descriptions of B and C along with the data.
msg.c = 'ololo'
data = msg.dumps()  # Named `data` to avoid shadowing the built-in `bytes`.
msg = A.loads(data)
msg2 = msg.b[0]()   # Create a new message of the loaded type B.
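The general idea of shipping a type description alongside the data can be shown without this library. The Python 3 sketch below swaps in JSON for the description purely for readability, whereas TypeMetadata embeds the description inside the binary message itself; describe, rebuild and the envelope layout are hypothetical.

```python
import json


def describe(fields):
    """Serialize a field list [(number, name, type_name), ...] as a JSON description."""
    return json.dumps([{'number': n, 'name': name, 'type': t} for n, name, t in fields])


def rebuild(description):
    """Turn a JSON description back into a field number -> (name, type_name) lookup."""
    return {d['number']: (d['name'], d['type']) for d in json.loads(description)}


# Sender: ship the description of type C together with (field number, value) pairs.
c_fields = [(7, 'ghjhdf', 'UVarint')]
envelope = {'type': describe(c_fields), 'data': [(7, 42)]}

# Receiver: recovers field names without any prior knowledge of C.
lookup = rebuild(envelope['type'])
decoded = {lookup[number][0]: value for number, value in envelope['data']}
print(decoded)  # {'ghjhdf': 42}
```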

add_field chaining

add_field returns the message type itself, so calls can be chained and nested:

# Outer is an illustrative name.
Outer = MessageType().add_field(
    1, 'a', EmbeddedMessage(MessageType().add_field(1, 'a', UVarint)))
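Chaining works because add_field returns self. A minimal sketch of the pattern with a hypothetical ToyMessageType class (not the library's MessageType):

```python
class ToyMessageType:
    """Hypothetical stand-in that only records fields; not the library's MessageType."""

    def __init__(self):
        self.fields = []

    def add_field(self, number, name, field_type, flags=0):
        self.fields.append((number, name, field_type, flags))
        return self  # Returning self is what makes chaining possible.


point = ToyMessageType().add_field(1, 'x', 'Varint').add_field(2, 'y', 'Varint')
print([name for _, name, _, _ in point.fields])  # ['x', 'y']
```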

More info

See the protobuf module for the API and the run-tests module for more usage examples. The code should be well documented.