Database Case Studies

LSM-Tree and SSTables

  • Writes go to memtable in-memory (could be balanced BST or SkipList etc.)
  • Flushed to disk when it gets too big
  • Merging and Compaction happens in the background
  • Pros
    • Faster writes to Memory
  • Cons
    • Reads maybe slow because need to search for many SSTables for value of a key

B-Tree

  • Balanced BST keeping pointers
  • Writes go to pages (existing or create/split a new page)
  • Pros
    • Faster Reads, know exactly where the key is located
  • Cons
    • Slower writes to Disk

Replication

  • Single leader
    • Pros: No Data Conflicts
    • Cons: All writes go to one node
  • Multi-Leader
    • Pros: Increase Write throughput across multiple nodes
    • Cons: Data Conflicts

MongoDB

  • Document DB
  • Better Data Locality
  • Uses B-Trees
  • Supports distributed Tx supported
  • Useful for:
    • Nothing special since you can also use Traditional SQL DB
    • Might be useful if you want ACID in data which is more flexible

Apache Cassandra

  • Wide Column DB, Row oriented
  • Inspired by Dynamo+Bigtable
  • Has a shard key and sort key
  • Flexible Schema, Ease of Partitioning
  • Uses LSM Tree and SSTable
    • Super Fast writes
  • Configurable Replication
    • Multi-Leader
    • Leaderless
  • Causes write conflicts
  • Only supports LWW conflict resolution
  • Has very efficient CQL, subset of SQL with No Joins
  • Useful for:
    • High Write Volume (Millions of Writes per second)
    • Consistency is not important, it is Tunable also
    • All reads and writes go to same shard (No Tx)
    • Efficient Range Queries
  • Examples:
    • Chat application with sharding key as chatID, sort key as timestamp
    • Time Series and Analytics Data

Apache Cassandra

  • https://stackoverflow.com/questions/75212982/how-is-cassandra-column-oriented-and-at-the-same-time-suitable-for-oltp
  • https://www.youtube.com/watch?v=6bxin9cZL_w
  • Cassandra is NOT column oriented Database
  • Cassandra is wide column or column family DB
  • Apple has a Cassandra cluster with over 75k nodes and 10 petabytes of data
  • Cassandra is very efficient in write queries
  • Cassandra Query Language (CQL) is not as good as SQL
  • Partitioning
    • using cluster key
    • Hash Ring
    • uses gossip protocol
    • All reads and writes should go to one partition
    • very little support for distributed transactions
  • Replication
    • Leaderless
    • Read Repair
    • Anti entropy
    • Quorums, configurable, hence consistency configurable
  • Write conflicts
    • Last Write Wins (LWW)
  • Riak also alternative to Cassandra
    • Riak supports CRDT (Conflict free Replicated Datatypes)
  • Cassandra Single Node
    • LSM Tree + SSTable — write optimized
    • only row level locking in transactions
  • Very poor data guarantees
    • uses LWW
  • Uses
    • Chat Application, cluster key = chatId

Riak

  • Key Value DB but not in-memory
  • Inspired by Dynamo
  • Uses LSM Tree and SSTable
    • Super Fast writes
  • Configurable Replication
    • Multi-Leader
    • Leaderless
  • Causes write conflicts
  • Supports CRDTs and captures causal relationship
  • Has version vectors
  • Useful for
    • Fast Writes
    • Simple Queries, Key-lookups
    • More than LWW conflict resolution
  • Examples
    • Session Storage
    • Tokens

Apache HBase

  • Wide Column DB, Column Oriented
  • Uses LSM Tree and SSTable
    • Faster Writes
  • Built on top of HDFS
  • Strong Consistency
    • Single Leader Replication
    • No Write Conflicts
  • Column Compression and better data locality for the column data
  • Useful For
    • Strong Consistency (CP)
  • Examples
    • Analytics for Hadoop Data

Redis

  • Key Value Store, In-memory
  • Useful For
    • Very Fast Reads/Writes
    • Simpler Queries
    • Less Durable
  • Examples
    • Caching
    • GeoSpatial Index in Uber

Google Spanner

  • TrueTime API
    • generally systems don’t expose uncertainty in clock values
    • TrueTime API reports confidence interval in local clock
    • returns [earliest, latest]
    • earliest and latest possible timestamp
      • calculation includes how long the clock was synced with source among other things
    • Snapshot Isolation is implemented using TrueTime API
      • wait for the length of the confidence interval before committing transaction
  • Deploys a GPS receiver or atomic clock in each datacenter

Amazon DynamoDB