Transactions
- Group of several reads and writes to form a logical unit
- Whole transaction is executed as one operation
- Either the entire transaction succeeds (commit)
- Or fails (abort, rollback)
- DB objects include Rows, Documents, Records etc.
- Atomicity can be implemented using log for crash recovery
- Isolation can be implemented using locks
Serializability
- Each Tx can pretend it is the only one running
- The result is same as if they had run serially
Atomic Write Operations
- aka Light Weight Transactions
- Used for single object writes
- Examples:
read-modify-write- for example, increment value atomically
UPDATE counters set value = value + 1 where foo = 'key'- Implementation
- by exclusive lock on the DB object, aka cursor stability
- by forcing all atomic operations serially
compare-and-set- allow update to happen only if the value has not changed since you last read it
- If it is modified, then it is retried
- It is optimistic in nature
UPDATE wiki where content = 'new' WHERE id = 123 and content = 'old'- DB has different implementations, always verify if it’s safe
- Some Databases support it
- MongoDB supports local modification of JSON atomically
- Redis supports modifying priority queues atomically
Retrying
- Aborted Tx can be retried
- If Tx actually succeeded but network failed, then Retrying can cause duplicate data
- If error is due to overload, retrying will make it worse
- Limit number of retries
- Use exponential backoff
- Useful for transient errors
- deadlock, isolation violation, network interruption, failover
- Not useful for permanent errors
- constraint violation
- Need to consider if Tx has side-effect like sending an email
- 2PC can help in multiple systems commit or abort together
Race Conditions
Dirty Reads
- One Tx reads the writes of another ongoing Tx which is not committed
- If other Tx is ongoing, and first one see partially updated (“dirty”) data
- If other Tx rollbacks, then first one has read invalid (“dirty”) data
Dirty Writes
- One Tx overwrites data that another Tx has written but not committed
- If Both Tx update multiple objects concurrently:
- update listing (Second Tx then First Tx)
- update invoice (First Tx then Second Tx)
- Due to race condition, This causes inconsistent data on both tables
Read Skew
- Skew = inconsistent
- Client sees different parts of DB at different points in time
- Example of Non-Repeatable read
- In case of transferring money from one account to another
- Initially
- Account 1: ₹ 500
- Account 2: ₹ 500
- Sum = ₹ 1,000
- First Tx writes
- Account 1: balance = balance - 100
- Account 2: balance = balance + 100
- Meanwhile Second Tx reads:
- Account 1: ₹ 500 (read during First Tx, will see old value)
- Account 2: ₹ 600 (read after First Tx completed, will see new value)
- Sum = ₹ 900 (Wrong conclusion)
- Initially
- Hence reading DB at different time during Tx can cause reading overall inconsistent data
- https://stackoverflow.com/questions/73917534/read-skew-vs-non-repeatable-read-transaction
Lost Updates
- Read-Modify-Write cycle done by two Tx concurrently on same object
- Data is lost since one overwrites the other
- Both Tx see current value as 500
- Both calculate the update to increment to 600
- If Both set the value then the final will only reach 600 (not 700)
- Very common problem in real world
- Read-Modify-Write cycle examples
- Increment counter
- Add element to list in JSON Document
- Two users editing same wiki page and saving it
- Prevention
- Use Atomic Write operations
- Use Explicit Locking
BEGIN TRANSACTION; -- take a lock on all the rows returned by query SELECT * from figures WHERE name = 'robot' and game_id = 222 FOR UPDATE; -- update position of character in game UPDATE figures SET position = 'c4' WHERE id = 1234; COMMIT;
- Detection
- Tx Manager detects if lost update happened, and abort or retry it
- Following aborts Tx
- PostgreSQL ⇒ Repeatable Read
- Oracle ⇒ Serializable
- MSSQL ⇒ Snapshot Isolation
- MySQL (InnoDB) does not detect lost updates
- Distributed Context
- If atomic operations are commutative then it is easy to implement
- Last Write Wins (LWW) conflict resolution in Replication causes lost updates
Write Skew
- Two Tx reads same object and writes on different object
- Both Tx makes decision based on the value read, but at the time of commit, the premise is no longer true
- Generalization of lost updates
- Compared to Lost updates, Write skew updates different object
- Examples:
- Doctor scheduling system
- Doctors
count(on_call) = 2, both request leave - Both Tx wants to update on_call = false for both doctors
- Both checks if
count(on_call) >= 1which is true - Both update the on_call = false, causing
count(on_call) = 0
- Doctors
- Meeting Room Booking: can cause conflicting bookings
- Premise: Check if meeting room empty
- Cannot use locks
- Multiplayer Game: moving two characters to the same spot
- Claiming same usernames by two users
- Doctor scheduling system
Phantom Reads
- Phantom = ghost
- Write in one Tx changes result of query in another Tx
- You read a set of rows, then later new rows appear that match your condition.
- Types:
- Straightforward Phantom reads
- Same query returns new/missing rows later because other Tx added/removed rows
- Phantoms due to write skew
- Similar to write skew but new row appears indirectly because of concurrent updates
- Doctor scheduling example
- Straightforward Phantom reads
- Materializing Conflict
- Take a Phantom and turn it into a lock conflict on a concrete set of rows that exist in database
- For room booking, locks can be row in the table that corresponds to the desired room and time period
- It is hard and error-prone to figure out materializing conflict
- It should be considered as last resort
- Use Serializable Isolation Level to solve this issue
| Isolation Level | Dirty Read | Dirty Write | Read Skew (Non-Repeatable Read) | Lost Update | Write Skew | Phantom Read |
|---|---|---|---|---|---|---|
| Read Uncommitted | Yes | Yes | Yes | Yes | Yes | Yes |
| Read Committed | No | No | Yes | Yes | Yes | Yes |
| Snapshot Isolation | No | No | No | Yes | Yes | Yes |
| Serializable | No | No | No | No | No | No |
Isolation Levels
- Weak Isolation Level (Non-Serializable)
- Read Uncommitted
- Read Committed
- Snapshot Isolation
- Serializable Isolation Level
- Ref:
- Wiki: https://en.wikipedia.org/wiki/Isolation_%28database_systems%29
- MSSQL: https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-ver17
- PostgreSQL: https://www.postgresql.org/docs/current/transaction-iso.html
- MySQL: https://dev.mysql.com/doc/refman/8.4/en/innodb-transaction-isolation-levels.html
Read Uncommitted
- Rarely used
- Benefits:
- Prevents Dirty Writes
- Problems:
- Dirty Reads can occur
- Examples:
- MSSQL:
READ UNCOMMITTED - PostgreSQL: Not supported
- MSSQL:
Read Committed
- Very popular
- Default in MSSQL, PostgreSQL, Oracle, etc.
- Benefits
- Prevents dirty reads
- uses Row level locks
- Prevents dirty writes
- remembers old and new data, returns old data until Tx completes
- Prevents dirty reads
- Problems
- Read Skew (Hence Non-Repeatable Reads)
- Long Running Queries will produce inconsistent results
- Examples
- MSSQL:
READ COMMITTED - PostgreSQL:
READ COMMITTED
- MSSQL:
Snapshot Isolation
- Each Tx sees consistent snapshot of DB at the start of Tx
- Benefits
- Prevent Dirty Writes
- Prevent Dirty Reads
- Keep several different committed versions of an object
- Read committed has 2 versions of the object: old and new
- Implemented by MVCC (Multi version concurrency control)
- Prevents Read Skew (Hence Repeatable Read)
- Principle: Writes never block readers, Readers never block writers
- Read Committed uses separate snapshot for each query, while Snapshot Isolation uses the same snapshot for entire Tx
- MVCC Implementation
- When Tx starts, it is given unique monotonically increasing Tx ID:
txid - When Tx writes, the data is tagged with
txidof the writer
- When Tx starts, it is given unique monotonically increasing Tx ID:
- Uses
- Consistent Backups which takes long time
- Long running Analytic Queries
- Problems
- Phantom Reads
- Lost Updates
- Examples
- It was invented after System R, hence some DB vendor still call it Repeatable Read which is very much similar
- MSSQL:
SNAPSHOT - PostgreSQL:
REPEATABLE READ - MySQL:
REPEATABLE READ
Serializable
- Highest Level of Isolation
- Types:
- Actual Serial Execution
- Most Pessimistic
- Don’t scale well
- Two Phase Locking (2PL)
- Pessimistic
- Don’t perform well
- Serializable Snapshot Isolation (SSI)
- Optimistic
- Actual Serial Execution
Actual Serial Execution
- Serial Order on single thread
- Each Tx must be fast
- Keep entire dataset in RAM (in-memory)
- Avoid long running queries
- Useful for OLTP since Tx’s are short
- Don’t allow interactive muti-statement Tx’s
- Use Stored Procedures (SP) — submit code ahead of time
- Traditional DBs (disk based)
- Procedural SQL: T-SQL, PL/SQL, PL/pgSQL
- Vendor Specific
- Lack of libraries
- Code Harder to debug
- Code must not be written badly to use lot of resources
- Procedural SQL: T-SQL, PL/SQL, PL/pgSQL
- Modern in-memory DBs
- VoltDB ⇒ Java/Groovy
- Redis ⇒ Lua
- Datomic ⇒ Java/Clojure
- Traditional DBs (disk based)
- Partitioning
- Tx throughput limited by single CPU core/single machine
- Cross partition Tx’s are very slow
- VoltDB → 1000 writes/s
- Multiple secondary indexes can trigger cross partition coordination
- Try to partition dataset such that each Tx run on single partition
- Examples
- VoltDB
- H-Store
- Redis
- Datomic
Two Phase Locking
- aka Strong Strict Two Phase Locking (SS2PL)
- Principle: Readers/Writers block other Readers/Writers
- Lock Modes
- Shared: Other Tx can hold at the same time
- Exclusive: Only one Tx can hold
- Implementation
- Read Tx takes shared lock
- Write Tx takes exclusive lock
- If Read then Write, shared lock upgrades to exclusive lock
- Must hold lock till the end of Tx
- 2 Phases
- Phase 1: While Tx executing, locks acquired
- Phase 2: At the end of Tx, locks are released
- Deadlocks
- DB detects and aborts Tx, it is retried later
- 2PL Serializable deadlocks are more frequent than Read committed
- Performance
- Tx throughput & response times significantly worse due to:
- Overhead of locks acquired/released
- Reduced concurrency (Major reason)
- Unstable latencies
- Tx throughput & response times significantly worse due to:
- Handling Phantom Reads due to write skews
- Predicate Locks
- Read
- shared mode lock on the condition of query
- If there exists Tx having locks on objects matching condition then query must wait
- Write
- Must check either old or new value of object matches any predicate lock, If yes write must wait
- It works even for objects not yet in DB
- Performance: Don’t perform well due to lot of checks
- Read
- Index Range Locks
- aka next-key locking
- Simplifies predicate locks using indexes
select * from bookings where room_id = 123 and end_time < '2018-01-01 12:00' and start_time > '2018-01-01 13:00'- DB either lock all rows on
room_id = 123(if indexed) - or lock all time range rows on
start_time&end_time(if time indexed) - If no index found ⇒ shared lock on entire table
- DB either lock all rows on
- Not as precise as Predicate Locks
- Real world 2PL = implemented by index range locks
- Predicate Locks
- Example:
- MySQL (InnoDB):
SERIALIZABLE - MSSQL:
SERIALIZABLE - PostgreSQL does not use this
- MySQL (InnoDB):
Serializable Snapshot Isolation
- When Tx wants to commit it checks if anything bad happened, If yes Tx is aborted and retried
- Developed in 2008
- Based on Snapshot Isolation
- Writes never block readers, Readers never block writers
- Performance:
- good if spare capacity + low contention b/w Tx
- advantage over 2PL: no waiting for locks held by other Tx
- advantage over Serial Execution: not limited to throughput of single CPU core
- requires short read-write Tx (long reads are okay)
- Implementation
- Checks stale MVCC reads
- Checks if write occurs after reads
- Examples
- PostgreSQL:
SERIALIZABLEsince version 9.1
- PostgreSQL: