Exercise: Protobuf Parsing

In this exercise, you will build a parser for the protobuf binary encoding. Don’t worry, it’s simpler than it seems! This illustrates a common parsing pattern: passing slices of data from one function to the next, so the underlying data itself is never copied.
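As a minimal sketch of the slice-passing pattern, a parsing function can return the piece it consumed together with the rest of the input. The helper name `take_prefix` is made up for illustration and is not part of the exercise code. Both returned slices borrow from the original buffer, so no bytes are copied:

```rust
/// Hypothetical helper (not part of the exercise code): split off the
/// first `n` bytes of `data`, returning them together with the rest.
/// Returns `None` if fewer than `n` bytes are available.
fn take_prefix(data: &[u8], n: usize) -> Option<(&[u8], &[u8])> {
    if data.len() < n {
        return None;
    }
    // `split_at` creates two views into the same memory; nothing is copied.
    Some(data.split_at(n))
}
```

For example, `take_prefix(&[1, 2, 3, 4, 5], 2)` yields the slices `[1, 2]` and `[3, 4, 5]`, and asking for more bytes than are present yields `None`.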

Fully parsing a protobuf message requires knowing the types of the fields, indexed by their field numbers. That information is typically provided in a proto file. In this exercise, we’ll encode it into match statements in functions that get called for each field.

We’ll use the following proto:

message PhoneNumber {
  optional string number = 1;
  optional string type = 2;
}

message Person {
  optional string name = 1;
  optional int32 id = 2;
  repeated PhoneNumber phones = 3;
}

A proto message is encoded as a series of fields, one after the next. Each field is encoded as a “tag” followed by the value. The tag contains a field number (e.g., 2 for the id field of a Person message) and a wire type that defines how the payload is read from the byte stream.
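To make the tag layout concrete, here is a small sketch. It assumes the layout used by the `unpack_tag` function in the given code: the low three bits hold the wire type and the remaining bits hold the field number. The helper names `pack_tag` and `split_tag` are hypothetical and exist only for this illustration.

```rust
/// Hypothetical helper: build a tag from a field number and a numeric
/// wire type (0 = VARINT, 2 = LEN, 5 = I32 in this exercise).
fn pack_tag(field_num: u64, wire_type: u64) -> u64 {
    (field_num << 3) | (wire_type & 0x7)
}

/// Hypothetical helper: split a tag into (field number, numeric wire type).
fn split_tag(tag: u64) -> (u64, u64) {
    (tag >> 3, tag & 0x7)
}
```

With these definitions, `pack_tag(2, 0)` is `0x10`, the tag of the id field of a Person message stored as a VARINT, and `split_tag(0x0a)` gives `(1, 2)`: field number 1 with the LEN wire type.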

Integers, including the tag, are represented with a variable-length encoding called VARINT. Luckily, parse_varint is defined for you below. The given code also defines callbacks to handle Person and PhoneNumber fields, and to parse a message into a series of calls to those callbacks.
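To see what a VARINT looks like on the wire, it can help to go in the other direction. The following is a sketch of an encoder, written as the inverse of the provided `parse_varint`; the name `encode_varint` is an assumption for illustration and is not part of the exercise. Each byte carries seven bits of the value, least significant group first, and the high bit is set on every byte except the last.

```rust
/// Hypothetical encoder (not part of the exercise code): write `value`
/// as a VARINT, seven bits per byte, least significant group first.
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let low = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: the continuation bit stays clear.
            out.push(low);
            return out;
        }
        // More bytes follow: set the continuation bit.
        out.push(low | 0x80);
    }
}
```

For instance, 42 fits in a single byte, `0x2a`, while 300 needs two bytes, `0xac 0x02`. Note that this sketch does not enforce the 7-byte limit that the provided parser applies.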

What remains for you is to implement the parse_field function.

use std::convert::TryFrom;
use thiserror::Error;

#[derive(Debug, Error)]
enum Error {
    #[error("Invalid varint")]
    InvalidVarint,
    #[error("Invalid wire-type")]
    InvalidWireType,
    #[error("Unexpected EOF")]
    UnexpectedEOF,
    #[error("Invalid length")]
    InvalidSize(#[from] std::num::TryFromIntError),
    #[error("Unexpected wire-type")]
    UnexpectedWireType,
    #[error("Invalid string (not UTF-8)")]
    InvalidString,
}

/// A wire type as seen on the wire.
enum WireType {
    /// The Varint WireType indicates the value is a single VARINT.
    Varint,
    //I64,  -- not needed for this exercise
    /// The Len WireType indicates that the value is a length represented as a VARINT
    /// followed by exactly that number of bytes.
    Len,
    /// The I32 WireType indicates that the value is precisely 4 bytes in little-endian order
    /// containing a 32-bit signed integer.
    I32,
}

/// A field's value, typed based on the wire type.
enum FieldValue<'a> {
    Varint(u64),
    //I64(i64),  -- not needed for this exercise
    Len(&'a [u8]),
    I32(i32),
}

/// A field, containing the field number and its value.
struct Field<'a> {
    field_num: u64,
    value: FieldValue<'a>,
}

impl TryFrom<u64> for WireType {
    type Error = Error;

    fn try_from(value: u64) -> Result<WireType, Error> {
        Ok(match value {
            0 => WireType::Varint,
            //1 => WireType::I64,  -- not needed for this exercise
            2 => WireType::Len,
            5 => WireType::I32,
            _ => return Err(Error::InvalidWireType),
        })
    }
}

impl<'a> FieldValue<'a> {
    fn as_string(&self) -> Result<&'a str, Error> {
        let FieldValue::Len(data) = self else {
            return Err(Error::UnexpectedWireType);
        };
        Ok(std::str::from_utf8(data).map_err(|_| Error::InvalidString)?)
    }

    fn as_bytes(&self) -> Result<&'a [u8], Error> {
        let FieldValue::Len(data) = self else {
            return Err(Error::UnexpectedWireType);
        };
        Ok(data)
    }

    fn as_u64(&self) -> Result<u64, Error> {
        let FieldValue::Varint(value) = self else {
            return Err(Error::UnexpectedWireType);
        };
        Ok(*value)
    }
}

/// Parse a VARINT, returning the parsed value and the remaining bytes.
fn parse_varint(data: &[u8]) -> Result<(u64, &[u8]), Error> {
    for i in 0..7 {
        let Some(b) = data.get(i) else {
            return Err(Error::InvalidVarint);
        };
        if b & 0x80 == 0 {
            // This is the last byte of the VARINT, so convert it to
            // a u64 and return it.
            let mut value = 0u64;
            for b in data[..=i].iter().rev() {
                value = (value << 7) | (b & 0x7f) as u64;
            }
            return Ok((value, &data[i + 1..]));
        }
    }

    // More than 7 bytes is invalid.
    Err(Error::InvalidVarint)
}

/// Convert a tag into a field number and a WireType.
fn unpack_tag(tag: u64) -> Result<(u64, WireType), Error> {
    let field_num = tag >> 3;
    let wire_type = WireType::try_from(tag & 0x7)?;
    Ok((field_num, wire_type))
}

/// Parse a field, returning the field and the remaining bytes.
fn parse_field(data: &[u8]) -> Result<(Field, &[u8]), Error> {
    // 1. Read and unpack the tag.
    // 2. Based on the wire type, build a Field, consuming as many bytes as
    //    necessary.
    // 3. Return the field, and any un-consumed bytes.
    todo!("Implement the three steps above.")
}

/// Parse a message in the given data, calling `field_callback` for each field in the message.
///
/// The entire input is consumed.
fn parse_message(
    mut data: &[u8],
    field_callback: impl Fn(Field) -> Result<(), Error>,
) -> Result<(), Error> {
    while !data.is_empty() {
        let parsed = parse_field(data)?;
        data = parsed.1;
        field_callback(parsed.0)?;
    }
    Ok(())
}

fn main() {
    /// Handle a field in a Person message.
    fn person_field(field: Field) -> Result<(), Error> {
        match field.field_num {
            1 => println!("name: {}", field.value.as_string()?),
            2 => println!("id: {}", field.value.as_u64()?),
            3 => {
                parse_message(field.value.as_bytes()?, phone_number_field)?;
            }
            _ => {} // skip everything else
        }
        Ok(())
    }

    /// Handle a field in a PhoneNumber message.
    fn phone_number_field(field: Field) -> Result<(), Error> {
        match field.field_num {
            1 => println!("  number: {}", field.value.as_string()?),
            2 => println!("  type: {}", field.value.as_string()?),
            _ => {} // skip everything else
        }
        Ok(())
    }

    // An encoded Person message with a name, an id, and two phone numbers.
    parse_message(
        &[
            0x0a, 0x07, 0x6d, 0x61, 0x78, 0x77, 0x65, 0x6c, 0x6c, 0x10, 0x2a, 0x1a, 0x16,
            0x0a, 0x0e, 0x2b, 0x31, 0x32, 0x30, 0x32, 0x2d, 0x35, 0x35, 0x35, 0x2d, 0x31,
            0x32, 0x31, 0x32, 0x12, 0x04, 0x68, 0x6f, 0x6d, 0x65, 0x1a, 0x18, 0x0a, 0x0e,
            0x2b, 0x31, 0x38, 0x30, 0x30, 0x2d, 0x38, 0x36, 0x37, 0x2d, 0x35, 0x33, 0x30,
            0x38, 0x12, 0x06, 0x6d, 0x6f, 0x62, 0x69, 0x6c, 0x65,
        ],
        person_field,
    )
    .unwrap();
}