Encoding

  • Translating the in-memory representation into a byte sequence is called encoding; the reverse is called decoding
  • Encoding is also known as serialization or marshalling
  • Decoding is also known as deserialization or unmarshalling
  • Types of encoding
    • Language built-in
    • Text based
    • Binary
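
As a minimal sketch of the encode/decode round trip, the following uses Python's standard json module. The record contents are made up for illustration.

```python
import json

# In-memory representation: a Python dict (hypothetical example record).
record = {"user_name": "Martin", "favorite_number": 1337}

# Encoding (serialization): in-memory object -> byte sequence.
encoded = json.dumps(record).encode("utf-8")

# Decoding (deserialization): byte sequence -> in-memory object.
decoded = json.loads(encoded.decode("utf-8"))

print(type(encoded).__name__, len(encoded))
print(decoded == record)
```

The same two steps apply to every format in these notes; only the byte layout changes.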

Language Built-in

  • Examples
    • Java: java.io.Serializable
    • Ruby: Marshal
    • Python: pickle
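  • The encoding is tied to one programming language, so reading the data from another language is very difficult
  • Versioning, and hence backward/forward compatibility, is usually poorly supported
  • Efficiency (memory and processing time) is often poor
  • Hence it is generally not advisable to use a language-specific format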
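
The language lock-in can be seen in a small sketch using Python's pickle module. A standard-library type is pickled here only to keep the example self-contained; the point is that the byte stream refers to a Python module and class by name, which a decoder in another language has no way to resolve.

```python
import datetime
import pickle

value = datetime.date(2020, 1, 31)

# Encode with the language built-in format.
payload = pickle.dumps(value)

# The payload names the Python module that defines the type,
# so only Python code with that module available can decode it.
print(b"datetime" in payload)

# Decoding works inside Python (only unpickle data you trust).
restored = pickle.loads(payload)
print(restored == value)
```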

Text Based

  • Examples
    • JSON
    • XML
    • CSV
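  • In XML and CSV, you cannot distinguish a number from a string that happens to consist of digits (unless you refer to an external schema)
  • JSON distinguishes strings from numbers, but does not distinguish integers from floating-point numbers
  • JSON does not specify a precision for numbers
  • JSON and XML have good support for Unicode character strings, but do not support binary strings (byte sequences without a character encoding)
  • JSON and XML have optional schema support; CSV has no schema support at all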
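
A short sketch of two of these JSON limitations, using only the Python standard library. Parsing every number as a double is simulated with parse_int=float, since Python's own parser keeps integers exact. Base64 is shown as the common workaround for binary data; it is not part of JSON itself.

```python
import base64
import json

# An integer above 2**53 cannot be represented exactly as an IEEE 754 double.
big = 2**53 + 1
text = json.dumps({"id": big})

# Python's json module keeps integers exact...
exact = json.loads(text)["id"]
# ...but a parser that reads every number as a double loses precision.
as_double = json.loads(text, parse_int=float)["id"]

print(exact == big)
print(int(as_double) == big)

# Binary data has to be smuggled through as text, for example with Base64.
blob = bytes([0, 255, 16, 128])
doc = json.dumps({"blob": base64.b64encode(blob).decode("ascii")})
restored = base64.b64decode(json.loads(doc)["blob"])
print(restored == blob)
```

The Base64 workaround is workable but increases the data size by roughly a third.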

Binary

  • Examples
    • Schema Based
      • Thrift (originally developed at Facebook, now Apache Thrift)
      • Protocol Buffers (developed at Google)
      • Avro (started as a subproject of Hadoop, now Apache Avro)
    • JSON Based: MessagePack, BSON, BJSON, UBJSON, BISON, Smile
    • XML Based: WBXML, Fast Infoset
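  • MessagePack includes the field names in the encoded data, hence the payload is larger than with the schema-based formats
  • Thrift and Protocol Buffers require a schema and support code generation; the encoding does not contain literal field names, and uses numeric field tags instead
  • Avro has two schema languages: Avro IDL (Interface Definition Language), meant for human editing, and a JSON-based one that is easier for machines to read; its encoding contains neither field names nor field tags, so the reader needs the writer's schema to decode the data
  • Avro encodings are typically the most compact of the three, since Thrift and Protocol Buffers still spend bytes on field tags and type information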
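
To make the difference between field names and field tags concrete, here is a hand-written sketch of a tag-based encoding modelled on the Protocol Buffers wire format (varint keys, length-prefixed strings). It is an illustration only, not the protobuf library; the helper names and the field numbers 1, 2 and 3 are assumptions chosen for this example.

```python
import json

def varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def key(field_number: int, wire_type: int) -> bytes:
    """A field key packs the field tag and the wire type into one varint."""
    return varint((field_number << 3) | wire_type)

def int_field(field_number: int, value: int) -> bytes:
    return key(field_number, 0) + varint(value)  # wire type 0: varint

def str_field(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    return key(field_number, 2) + varint(len(data)) + data  # wire type 2: length-prefixed

record = {
    "userName": "Martin",
    "favoriteNumber": 1337,
    "interests": ["daydreaming", "hacking"],
}

# Field names are replaced by the (assumed) tags 1, 2 and 3.
tagged = (
    str_field(1, record["userName"])
    + int_field(2, record["favoriteNumber"])
    + b"".join(str_field(3, s) for s in record["interests"])
)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(len(tagged), len(as_json))
```

For this record the tagged form takes 33 bytes against 81 bytes of whitespace-free JSON, because each field name is replaced by a one-byte key.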

Benefits of Schema based Binary encoding

  • Thrift, Protocol Buffers and Avro can be much more compact than the binary variants of JSON/XML, since field names are omitted from the encoded data
  • The schema is a valuable form of documentation
  • A schema makes it possible to check backward/forward compatibility before anything is deployed
  • Schemas enable code generation in statically typed languages, which provides type checking at compile time

Evolution

Compatibility

  • Compatibility is a relationship between one process that encodes the data and another process that decodes it
  • Backward Compatibility
    • New code can read older data
    • New reader schema, Old writer schema
  • Forward Compatibility
    • Old code can read newer data
    • Old reader schema, New writer schema
  • Many services need to support rolling upgrades, where a new version of a service is gradually deployed to a few nodes at a time rather than to all nodes simultaneously
  • Forward and backward compatibility are important in such scenarios, because old and new versions of the code run side by side
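
The two directions can be sketched with a reader that ignores unknown fields and fills in defaults for missing ones. This is a simplified illustration over JSON, not how any particular schema library works; the schemas V1_FIELDS and V2_FIELDS and the decode helper are hypothetical.

```python
import json

# Hypothetical schemas: field name -> default value.
# Version 2 adds an optional "email" field with a default.
V1_FIELDS = {"name": "", "age": 0}
V2_FIELDS = {"name": "", "age": 0, "email": None}

def decode(payload: bytes, reader_fields: dict) -> dict:
    """Decode a payload using the reader's view of the schema."""
    data = json.loads(payload)
    # Fields the reader does not know are ignored (forward compatibility);
    # fields the writer did not send get the default (backward compatibility).
    return {name: data.get(name, default) for name, default in reader_fields.items()}

old_data = json.dumps({"name": "Ada", "age": 36}).encode("utf-8")
new_data = json.dumps({"name": "Ada", "age": 36, "email": "ada@example.com"}).encode("utf-8")

# Backward compatibility: new reader schema, old writer schema.
backward = decode(old_data, V2_FIELDS)
# Forward compatibility: old reader schema, new writer schema.
forward = decode(new_data, V1_FIELDS)

print(backward)
print(forward)
```

Both directions only work here because the added field is optional and has a default; adding a required field would break new readers of old data.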

Versioning

  • REST APIs generally put a version number in the URL or in the HTTP Accept header
  • For services that use API keys to identify clients, the client's requested API version can instead be stored on the server and looked up by API key
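
A minimal sketch of resolving the requested version from a request, assuming a URL scheme like /v2/users and a vendor media type like application/vnd.example.v3+json. Both conventions, the function name and the default version are hypothetical and chosen only for illustration.

```python
import re

DEFAULT_VERSION = 1  # assumed fallback when the client states no version

def requested_version(path: str, headers: dict) -> int:
    """Return the API version asked for by the URL path or the Accept header."""
    # 1. Version in the URL path, e.g. /v2/users
    match = re.match(r"^/v(\d+)(/|$)", path)
    if match:
        return int(match.group(1))
    # 2. Version in the Accept header, e.g. application/vnd.example.v3+json
    match = re.search(r"vnd\.example\.v(\d+)\+json", headers.get("Accept", ""))
    if match:
        return int(match.group(1))
    return DEFAULT_VERSION

print(requested_version("/v2/users", {}))
print(requested_version("/users", {"Accept": "application/vnd.example.v3+json"}))
print(requested_version("/users", {}))
```

Letting the URL take precedence over the header is a design choice made for this sketch; a real service should pick one rule and document it.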