Database Case Studies
LSM-Tree and SSTables
- Writes go to memtable in-memory (could be balanced BST or SkipList etc.)
- Flushed to disk when it gets too big
- Merging and Compaction happens in the background
- Pros
- Cons
- Reads maybe slow because need to search for many SSTables for value of a key
B-Tree
- Balanced BST keeping pointers
- Writes go to pages (existing or create/split a new page)
- Pros
- Faster Reads, know exactly where the key is located
- Cons
Replication
- Single leader
- Pros: No Data Conflicts
- Cons: All writes go to one node
- Multi-Leader
- Pros: Increase Write throughput across multiple nodes
- Cons: Data Conflicts
MongoDB
- Document DB
- Better Data Locality
- Uses B-Trees
- Supports distributed Tx supported
- Useful for:
- Nothing special since you can also use Traditional SQL DB
- Might be useful if you want ACID in data which is more flexible
Apache Cassandra
- Wide Column DB, Row oriented
- Inspired by Dynamo+Bigtable
- Has a shard key and sort key
- Flexible Schema, Ease of Partitioning
- Uses LSM Tree and SSTable
- Configurable Replication
- Causes write conflicts
- Only supports LWW conflict resolution
- Has very efficient CQL, subset of SQL with No Joins
- Useful for:
- High Write Volume (Millions of Writes per second)
- Consistency is not important, it is Tunable also
- All reads and writes go to same shard (No Tx)
- Efficient Range Queries
- Examples:
- Chat application with sharding key as chatID, sort key as timestamp
- Time Series and Analytics Data
Apache Cassandra
- https://stackoverflow.com/questions/75212982/how-is-cassandra-column-oriented-and-at-the-same-time-suitable-for-oltp
- https://www.youtube.com/watch?v=6bxin9cZL_w
- Cassandra is NOT column oriented Database
- Cassandra is wide column or column family DB
- Apple has a Cassandra cluster with over 75k nodes and 10 petabytes of data
- Cassandra is very efficient in write queries
- Cassandra Query Language (CQL) is not as good as SQL
- Partitioning
- using cluster key
- Hash Ring
- uses gossip protocol
- All reads and writes should go to one partition
- very little support for distributed transactions
- Replication
- Leaderless
- Read Repair
- Anti entropy
- Quorums, configurable, hence consistency configurable
- Write conflicts
- Riak also alternative to Cassandra
- Riak supports CRDT (Conflict free Replicated Datatypes)
- Cassandra Single Node
- LSM Tree + SSTable — write optimized
- only row level locking in transactions
- Very poor data guarantees
- Uses
- Chat Application, cluster key = chatId
Riak
- Key Value DB but not in-memory
- Inspired by Dynamo
- Uses LSM Tree and SSTable
- Configurable Replication
- Causes write conflicts
- Supports CRDTs and captures causal relationship
- Has version vectors
- Useful for
- Fast Writes
- Simple Queries, Key-lookups
- More than LWW conflict resolution
- Examples
Apache HBase
- Wide Column DB, Column Oriented
- Uses LSM Tree and SSTable
- Built on top of HDFS
- Strong Consistency
- Single Leader Replication
- No Write Conflicts
- Column Compression and better data locality for the column data
- Useful For
- Examples
- Analytics for Hadoop Data
Redis
- Key Value Store, In-memory
- Useful For
- Very Fast Reads/Writes
- Simpler Queries
- Less Durable
- Examples
- Caching
- GeoSpatial Index in Uber
Google Spanner
- TrueTime API
- generally systems don’t expose uncertainty in clock values
- TrueTime API reports confidence interval in local clock
- returns
[earliest, latest]
- earliest and latest possible timestamp
- calculation includes how long the clock was synced with source among other things
- Snapshot Isolation is implemented using TrueTime API
- wait for the length of the confidence interval before committing transaction
- Deploys a GPS receiver or atomic clock in each datacenter
Amazon DynamoDB
- Key Value Store, also supports Document data model
- Scale: More than 1/2 million RPS over hundreds of Tables, So choose when you really need this scale
- Uses primary key and sort key which can each have only one attribute compared to Cassandra which can have multiple columns
- 2007: Dynamo paper comes out, Dynamo originally built for cart service and highly available writes and was self managed
- While Amazon DynamoDB is AWS managed cloud service
- It is “infinitely scalable”
- Unlike Dynamo, it uses Paxos over its replicas
- It is multi-tenant
- References