Sharding is a horizontal partition of data in a database
Sharding needs to be based on something like userId
We can shard a shard if shard grows big, aka Hierarchical Sharding
We can perform indexing on each shard based on different column to make query faster
We use master-slave architecture for shards. Writes go to master, reads are distributed to all the slaves. If master fails, one of the slave becomes a master
Uses:
Improves Scalability
Notes:
NoSQL already implements this concept from behind, we don’t need to worry about it
Sharding should be considered if really necessary, otherwise improving query and indexing is good optimization methods
Topics:
How do you perform JOINS on different shards?
Having dynamic number of shards is challenging, we can use hierarchical sharding to mitigate this limitation
It can be done by consistent hashing?
how about transactions??
Types of database partitioning
Horizontal Partitioning: Partitioning based on rows
Vertical Partitioning: Partitioning based on columns
Partition and Replication
Partition is generally combined with Replication
A node may store more than one partition
An example could be where each partition’s leader is assigned to one node
partition’s followers are assigned other nodes
Partitioning Strategies
If partitioning is unfair, partitions become skewed
When a partition has too much data/load it becomes hotspot
DBs cannot automatically fix skewed workload
Responsibility of the application to reduce the skewed load
Partitioning by random number
Not useful since we can’t figure out which key maps to partition during read
Partitioning by Key Range
Similar to Volumes of Encyclopedia
Good for range queries
Partition boundary is chosen manually or automatically
Certain access patterns can cause hotspots
Used by
Google Bigtable
Apache HBase (Bigtable Open Source)
RethinkDB
MongoDB
Partitioning by Hash of Key
Good Hash Function makes skewed data uniformly distributed
Each partition = range of hashes
Should not use hash mod N
Since if N changes most of keys will be redistributed