Relational Database Architecture

When we talk about key-value, value could be anything like actual row like document or vertex
The place where rows are stored is known as heap file

Storage Engines

Log is append only sequence of immutable records
Suitable where value is updated frequently
Example: Video URL ⇒ number of times it was played
Example Engine:
- Bitcask Storage Engine of Riak
- Indexing
  - whole keys are kept in memory
  - values can be on disk (one disk seek to access value)

We break the log into segments of certain size
To efficiently use disk space, we perform following in the background thread
- compaction: throwing away duplicate keys in the log
- merging: after compaction, the segments become small and are merged in new segment
Each segment has its own in-memory hash index
Deletion
- adds a special deletion record to log called tombstone
- During merging, the record is discarded
Benefits
- Efficient Disk space utilization
Limitations
- all keys must fit in memory
- range queries are inefficient

Log-structured Merge Tree
Originally described by Patrick O’Neil, the same idea is used to construct in-memory balanced tree
The in-memory balanced tree is also called memtable
Google Bigtable Paper introduced the terms SSTable and memtable
When the memtable grows more than threshold (few MBs) it is written to disk as SSTable file/segment in sorted order
- meanwhile writes will go to new memtable while this happens
- separate log (unsorted) is also maintained for backup purpose in case of crash
Reads will first search in memtable then recent segment then next-older etc.
In background merging and compaction happens
Examples:
- LevelDB Storage Engine in Riak
- Cassandra engine
- HBase engine
- Lucene Indexing Engine of Solr and Elasticsearch
Limitations
- Slow when the key does not exist since need to look through all segments
  - Bloom Filters are used to solve this
- Compaction/Merging process can interfere with performance
  - can cause multiple writes, known as write amplification
- Difficult to implement locks since the multiple copies of same key is stored in multiple segments
Benefits
- Efficient range queries
- High throughput since writes are sequential
https://stackoverflow.com/questions/58168809/what-is-the-differences-between-the-term-sstable-and-lsm-tree
https://www.scylladb.com/glossary/log-structured-merge-tree/

B-Trees and B+ Trees
B-Trees break down DB into fixed size blocks or Pages
A Page is traditionally 4KB in size, close to underlying hardware disk
A Page can be read or written at a time
Each Page can be identified by address
One page is root of B-Tree
Each Page has references to other pages based on range of keys
Leaf Page has
- values stored inline to keys or
- reference to the pages where values are stored
Add: Split the page and update the parent
A 4-level tree of 4KB pages with branching factor of 500 can store up to 250 TB
Crash Recovery
- Use Write Ahead Log
Concurrency:
- use latches (DB terminology) — Lightweight Locks
Disk Usage
- Use abbreviated keys
- variant known as B+ Tree
- Increases branching factor, hence fewer levels
Examples
- All Traditional Relational DB supports B/B+ Trees

aka redo log
For Crash Recovery
Append only file to which every B-Tree modification must be written before it can be applied to the pages of the tree

Disk Usage
- LSM Trees can compress better and hence low disk space
- B-Tree can have fragmentation due to splitting of Pages
Performance
- LSM Trees compaction process can interfere with ongoing read/writes
- LSM Trees have faster writes because of sequential writes on disk
- B-Trees are thought to have better reads
Transactions
- B-Trees can have locks attached to the tree itself and possible to implement different isolation levels
- Difficult to implement locks on LSM-Trees

Clustered Index
- Store all row data within the index
Non-Clustered Index
- Store only references to the data within the index
Geospatial Index
- R-Trees are used
- Example: PostgreSQL generalized search tree indexing
Fuzzy Index
- Include Synonyms
- Edit-distance is used
- Finite automation similar to Tries used
- Example: Lucene Search Index