Transactions

Group of several reads and writes to form a logical unit
Whole transaction is executed as one operation
- Either the entire transaction succeeds (commit)
- Or fails (abort, rollback)
DB objects include Rows, Documents, Records etc.
Atomicity can be implemented using log for crash recovery
Isolation can be implemented using locks

Serializability

Each Tx can pretend it is the only one running
The result is same as if they had run serially

Atomic Write Operations

aka Light Weight Transactions
Used for single object writes
Examples:
- read-modify-write
  - for example, increment value atomically
  - UPDATE counters set value = value + 1 where foo = 'key'
  - Implementation
    - by exclusive lock on the DB object, aka cursor stability
    - by forcing all atomic operations serially
- compare-and-set
  - allow update to happen only if the value has not changed since you last read it
  - If it is modified, then it is retried
  - It is optimistic in nature
```
UPDATE wiki where content = 'new'
WHERE id = 123 and content = 'old'
```
  - DB has different implementations, always verify if it’s safe
Some Databases support it
- MongoDB supports local modification of JSON atomically
- Redis supports modifying priority queues atomically

Retrying

Aborted Tx can be retried
If Tx actually succeeded but network failed, then Retrying can cause duplicate data
If error is due to overload, retrying will make it worse
- Limit number of retries
- Use exponential backoff
Useful for transient errors
- deadlock, isolation violation, network interruption, failover
Not useful for permanent errors
- constraint violation
Need to consider if Tx has side-effect like sending an email
- 2PC can help in multiple systems commit or abort together

Race Conditions

Dirty Reads

One Tx reads the writes of another ongoing Tx which is not committed
If other Tx is ongoing, and first one see partially updated (“dirty”) data
If other Tx rollbacks, then first one has read invalid (“dirty”) data

Dirty Writes

One Tx overwrites data that another Tx has written but not committed
If Both Tx update multiple objects concurrently:
- update listing (Second Tx then First Tx)
- update invoice (First Tx then Second Tx)
Due to race condition, This causes inconsistent data on both tables

Read Skew

Skew = inconsistent
Client sees different parts of DB at different points in time
Example of Non-Repeatable read
In case of transferring money from one account to another
- Initially
  - Account 1: ₹ 500
  - Account 2: ₹ 500
  - Sum = ₹ 1,000
- First Tx writes
  - Account 1: balance = balance - 100
  - Account 2: balance = balance + 100
- Meanwhile Second Tx reads:
  - Account 1: ₹ 500 (read during First Tx, will see old value)
  - Account 2: ₹ 600 (read after First Tx completed, will see new value)
  - Sum = ₹ 900 (Wrong conclusion)
Hence reading DB at different time during Tx can cause reading overall inconsistent data
https://stackoverflow.com/questions/73917534/read-skew-vs-non-repeatable-read-transaction

Lost Updates

Read-Modify-Write cycle done by two Tx concurrently on same object
Data is lost since one overwrites the other
- Both Tx see current value as 500
- Both calculate the update to increment to 600
- If Both set the value then the final will only reach 600 (not 700)
Very common problem in real world
Read-Modify-Write cycle examples
- Increment counter
- Add element to list in JSON Document
- Two users editing same wiki page and saving it

Prevention

Use Atomic Write operations

Use Explicit Locking

BEGIN TRANSACTION;
 
-- take a lock on all the rows returned by query
SELECT * from figures
WHERE name = 'robot' and game_id = 222
FOR UPDATE;
 
-- update position of character in game
UPDATE figures SET position = 'c4' WHERE id = 1234;
 
COMMIT;

Detection
- Tx Manager detects if lost update happened, and abort or retry it
- Following aborts Tx
  - PostgreSQL ⇒ Repeatable Read
  - Oracle ⇒ Serializable
  - MSSQL ⇒ Snapshot Isolation
- MySQL (InnoDB) does not detect lost updates
Distributed Context
- If atomic operations are commutative then it is easy to implement
- Last Write Wins (LWW) conflict resolution in Replication causes lost updates

Write Skew

Two Tx reads same object and writes on different object
Both Tx makes decision based on the value read, but at the time of commit, the premise is no longer true
Generalization of lost updates
- Compared to Lost updates, Write skew updates different object
Examples:
- Doctor scheduling system
  - Doctors count(on_call) = 2, both request leave
  - Both Tx wants to update on_call = false for both doctors
  - Both checks if count(on_call) >= 1 which is true
  - Both update the on_call = false, causing count(on_call) = 0
- Meeting Room Booking: can cause conflicting bookings
  - Premise: Check if meeting room empty
  - Cannot use locks
- Multiplayer Game: moving two characters to the same spot
- Claiming same usernames by two users

Phantom Reads

Phantom = ghost
Write in one Tx changes result of query in another Tx
You read a set of rows, then later new rows appear that match your condition.
Types:
- Straightforward Phantom reads
  - Same query returns new/missing rows later because other Tx added/removed rows
- Phantoms due to write skew
  - Similar to write skew but new row appears indirectly because of concurrent updates
  - Doctor scheduling example
Materializing Conflict
- Take a Phantom and turn it into a lock conflict on a concrete set of rows that exist in database
- For room booking, locks can be row in the table that corresponds to the desired room and time period
- It is hard and error-prone to figure out materializing conflict
- It should be considered as last resort
Use Serializable Isolation Level to solve this issue

Isolation Level	Dirty Read	Dirty Write	Read Skew (Non-Repeatable Read)	Lost Update	Write Skew	Phantom Read
Read Uncommitted	Yes	Yes	Yes	Yes	Yes	Yes
Read Committed	No	No	Yes	Yes	Yes	Yes
Snapshot Isolation	No	No	No	Yes	Yes	Yes
Serializable	No	No	No	No	No	No

Isolation Levels

Weak Isolation Level (Non-Serializable)
- Read Uncommitted
- Read Committed
- Snapshot Isolation
Serializable Isolation Level
Ref:

Read Uncommitted

Rarely used
Benefits:
- Prevents Dirty Writes
Problems:
- Dirty Reads can occur
Examples:
- MSSQL: READ UNCOMMITTED
- PostgreSQL: Not supported

Read Committed

Very popular
Default in MSSQL, PostgreSQL, Oracle, etc.
Benefits
- Prevents dirty reads
  - uses Row level locks
- Prevents dirty writes
  - remembers old and new data, returns old data until Tx completes
Problems
- Read Skew (Hence Non-Repeatable Reads)
- Long Running Queries will produce inconsistent results
Examples
- MSSQL: READ COMMITTED
- PostgreSQL: READ COMMITTED

Snapshot Isolation

Each Tx sees consistent snapshot of DB at the start of Tx
Benefits
- Prevent Dirty Writes
- Prevent Dirty Reads
  - Keep several different committed versions of an object
  - Read committed has 2 versions of the object: old and new
  - Implemented by MVCC (Multi version concurrency control)
- Prevents Read Skew (Hence Repeatable Read)
Principle: Writes never block readers, Readers never block writers
Read Committed uses separate snapshot for each query, while Snapshot Isolation uses the same snapshot for entire Tx
MVCC Implementation
- When Tx starts, it is given unique monotonically increasing Tx ID: txid
- When Tx writes, the data is tagged with txid of the writer
Uses
- Consistent Backups which takes long time
- Long running Analytic Queries
Problems
- Phantom Reads
- Lost Updates
Examples
- It was invented after System R, hence some DB vendor still call it Repeatable Read which is very much similar
- MSSQL: SNAPSHOT
- PostgreSQL: REPEATABLE READ
- MySQL: REPEATABLE READ

Serializable

Highest Level of Isolation
Types:
- Actual Serial Execution
  - Most Pessimistic
  - Don’t scale well
- Two Phase Locking (2PL)
  - Pessimistic
  - Don’t perform well
- Serializable Snapshot Isolation (SSI)
  - Optimistic

Actual Serial Execution

Serial Order on single thread
Each Tx must be fast
- Keep entire dataset in RAM (in-memory)
- Avoid long running queries
  - Useful for OLTP since Tx’s are short
- Don’t allow interactive muti-statement Tx’s
- Use Stored Procedures (SP) — submit code ahead of time
  - Traditional DBs (disk based)
    - Procedural SQL: T-SQL, PL/SQL, PL/pgSQL
      - Vendor Specific
      - Lack of libraries
    - Code Harder to debug
    - Code must not be written badly to use lot of resources
  - Modern in-memory DBs
    - VoltDB ⇒ Java/Groovy
    - Redis ⇒ Lua
    - Datomic ⇒ Java/Clojure
Partitioning
- Tx throughput limited by single CPU core/single machine
- Cross partition Tx’s are very slow
  - VoltDB → 1000 writes/s
- Multiple secondary indexes can trigger cross partition coordination
- Try to partition dataset such that each Tx run on single partition
Examples
- VoltDB
- H-Store
- Redis
- Datomic

Two Phase Locking

aka Strong Strict Two Phase Locking (SS2PL)
Principle: Readers/Writers block other Readers/Writers
Lock Modes
- Shared: Other Tx can hold at the same time
- Exclusive: Only one Tx can hold
Implementation
- Read Tx takes shared lock
- Write Tx takes exclusive lock
- If Read then Write, shared lock upgrades to exclusive lock
- Must hold lock till the end of Tx
- 2 Phases
  - Phase 1: While Tx executing, locks acquired
  - Phase 2: At the end of Tx, locks are released
Deadlocks
- DB detects and aborts Tx, it is retried later
- 2PL Serializable deadlocks are more frequent than Read committed
Performance
- Tx throughput & response times significantly worse due to:
  - Overhead of locks acquired/released
  - Reduced concurrency (Major reason)
- Unstable latencies
Handling Phantom Reads due to write skews
- Predicate Locks
  - Read
    - shared mode lock on the condition of query
    - If there exists Tx having locks on objects matching condition then query must wait
  - Write
    - Must check either old or new value of object matches any predicate lock, If yes write must wait
  - It works even for objects not yet in DB
  - Performance: Don’t perform well due to lot of checks
- Index Range Locks
  - aka next-key locking
  - Simplifies predicate locks using indexes
```
select * from bookings
where room_id = 123
and end_time < '2018-01-01 12:00'
and start_time > '2018-01-01 13:00'
```
    - DB either lock all rows on room_id = 123 (if indexed)
    - or lock all time range rows on start_time & end_time (if time indexed)
    - If no index found ⇒ shared lock on entire table
  - Not as precise as Predicate Locks
  - Real world 2PL = implemented by index range locks
Example:
- MySQL (InnoDB): SERIALIZABLE
- MSSQL: SERIALIZABLE
- PostgreSQL does not use this

Serializable Snapshot Isolation

When Tx wants to commit it checks if anything bad happened, If yes Tx is aborted and retried
Developed in 2008
Based on Snapshot Isolation
- Writes never block readers, Readers never block writers
Performance:
- good if spare capacity + low contention b/w Tx
- advantage over 2PL: no waiting for locks held by other Tx
- advantage over Serial Execution: not limited to throughput of single CPU core
- requires short read-write Tx (long reads are okay)
Implementation
- Checks stale MVCC reads
- Checks if write occurs after reads
Examples
- PostgreSQL: SERIALIZABLE since version 9.1

Experiments

Explorer

Database_Transactions

Transactions

Serializability

Atomic Write Operations

Retrying

Race Conditions

Dirty Reads

Dirty Writes

Read Skew

Lost Updates

Write Skew

Phantom Reads

Isolation Levels

Read Uncommitted

Read Committed

Snapshot Isolation

Serializable

Actual Serial Execution

Two Phase Locking

Serializable Snapshot Isolation

Table of Contents