Transactions

  • Group of several reads and writes to form a logical unit
  • Whole transaction is executed as one operation
    • Either the entire transaction succeeds (commit)
    • Or fails (abort, rollback)
  • DB objects include Rows, Documents, Records etc.
  • Atomicity can be implemented using log for crash recovery
  • Isolation can be implemented using locks

Serializability

  • Each Tx can pretend it is the only one running
  • The result is same as if they had run serially

Atomic Write Operations

  • aka Light Weight Transactions
  • Used for single object writes
  • Examples:
    • read-modify-write
      • for example, increment value atomically
      • UPDATE counters set value = value + 1 where foo = 'key'
      • Implementation
        • by exclusive lock on the DB object, aka cursor stability
        • by forcing all atomic operations serially
    • compare-and-set
      • allow update to happen only if the value has not changed since you last read it
      • If it is modified, then it is retried
      • It is optimistic in nature
      UPDATE wiki where content = 'new'
      WHERE id = 123 and content = 'old'
      • DB has different implementations, always verify if it’s safe
  • Some Databases support it
    • MongoDB supports local modification of JSON atomically
    • Redis supports modifying priority queues atomically

Retrying

  • Aborted Tx can be retried
  • If Tx actually succeeded but network failed, then Retrying can cause duplicate data
  • If error is due to overload, retrying will make it worse
    • Limit number of retries
    • Use exponential backoff
  • Useful for transient errors
    • deadlock, isolation violation, network interruption, failover
  • Not useful for permanent errors
    • constraint violation
  • Need to consider if Tx has side-effect like sending an email
    • 2PC can help in multiple systems commit or abort together

Race Conditions

Dirty Reads

  • One Tx reads the writes of another ongoing Tx which is not committed
  • If other Tx is ongoing, and first one see partially updated (“dirty”) data
  • If other Tx rollbacks, then first one has read invalid (“dirty”) data

Dirty Writes

  • One Tx overwrites data that another Tx has written but not committed
  • If Both Tx update multiple objects concurrently:
    • update listing (Second Tx then First Tx)
    • update invoice (First Tx then Second Tx)
  • Due to race condition, This causes inconsistent data on both tables

Read Skew

  • Skew = inconsistent
  • Client sees different parts of DB at different points in time
  • Example of Non-Repeatable read
  • In case of transferring money from one account to another
    • Initially
      • Account 1: ₹ 500
      • Account 2: ₹ 500
      • Sum = ₹ 1,000
    • First Tx writes
      • Account 1: balance = balance - 100
      • Account 2: balance = balance + 100
    • Meanwhile Second Tx reads:
      • Account 1: ₹ 500 (read during First Tx, will see old value)
      • Account 2: ₹ 600 (read after First Tx completed, will see new value)
      • Sum = ₹ 900 (Wrong conclusion)
  • Hence reading DB at different time during Tx can cause reading overall inconsistent data
  • https://stackoverflow.com/questions/73917534/read-skew-vs-non-repeatable-read-transaction

Lost Updates

  • Read-Modify-Write cycle done by two Tx concurrently on same object
  • Data is lost since one overwrites the other
    • Both Tx see current value as 500
    • Both calculate the update to increment to 600
    • If Both set the value then the final will only reach 600 (not 700)
  • Very common problem in real world
  • Read-Modify-Write cycle examples
    • Increment counter
    • Add element to list in JSON Document
    • Two users editing same wiki page and saving it
  • Prevention
    • Use Atomic Write operations
    • Use Explicit Locking
      BEGIN TRANSACTION;
       
      -- take a lock on all the rows returned by query
      SELECT * from figures
      WHERE name = 'robot' and game_id = 222
      FOR UPDATE;
       
      -- update position of character in game
      UPDATE figures SET position = 'c4' WHERE id = 1234;
       
      COMMIT;
  • Detection
    • Tx Manager detects if lost update happened, and abort or retry it
    • Following aborts Tx
      • PostgreSQL Repeatable Read
      • Oracle Serializable
      • MSSQL Snapshot Isolation
    • MySQL (InnoDB) does not detect lost updates
  • Distributed Context
    • If atomic operations are commutative then it is easy to implement
    • Last Write Wins (LWW) conflict resolution in Replication causes lost updates

Write Skew

  • Two Tx reads same object and writes on different object
  • Both Tx makes decision based on the value read, but at the time of commit, the premise is no longer true
  • Generalization of lost updates
    • Compared to Lost updates, Write skew updates different object
  • Examples:
    • Doctor scheduling system
      • Doctors count(on_call) = 2, both request leave
      • Both Tx wants to update on_call = false for both doctors
      • Both checks if count(on_call) >= 1 which is true
      • Both update the on_call = false, causing count(on_call) = 0
    • Meeting Room Booking: can cause conflicting bookings
      • Premise: Check if meeting room empty
      • Cannot use locks
    • Multiplayer Game: moving two characters to the same spot
    • Claiming same usernames by two users

Phantom Reads

  • Phantom = ghost
  • Write in one Tx changes result of query in another Tx
  • You read a set of rows, then later new rows appear that match your condition.
  • Types:
    • Straightforward Phantom reads
      • Same query returns new/missing rows later because other Tx added/removed rows
    • Phantoms due to write skew
      • Similar to write skew but new row appears indirectly because of concurrent updates
      • Doctor scheduling example
  • Materializing Conflict
    • Take a Phantom and turn it into a lock conflict on a concrete set of rows that exist in database
    • For room booking, locks can be row in the table that corresponds to the desired room and time period
    • It is hard and error-prone to figure out materializing conflict
    • It should be considered as last resort
  • Use Serializable Isolation Level to solve this issue
Isolation LevelDirty ReadDirty WriteRead Skew
(Non-Repeatable Read)
Lost UpdateWrite SkewPhantom Read
Read UncommittedYesYesYesYesYesYes
Read CommittedNoNoYesYesYesYes
Snapshot IsolationNoNoNoYesYesYes
SerializableNoNoNoNoNoNo

Isolation Levels

Read Uncommitted

  • Rarely used
  • Benefits:
    • Prevents Dirty Writes
  • Problems:
    • Dirty Reads can occur
  • Examples:
    • MSSQL: READ UNCOMMITTED
    • PostgreSQL: Not supported

Read Committed

  • Very popular
  • Default in MSSQL, PostgreSQL, Oracle, etc.
  • Benefits
    • Prevents dirty reads
      • uses Row level locks
    • Prevents dirty writes
      • remembers old and new data, returns old data until Tx completes
  • Problems
    • Read Skew (Hence Non-Repeatable Reads)
    • Long Running Queries will produce inconsistent results
  • Examples
    • MSSQL: READ COMMITTED
    • PostgreSQL: READ COMMITTED

Snapshot Isolation

  • Each Tx sees consistent snapshot of DB at the start of Tx
  • Benefits
    • Prevent Dirty Writes
    • Prevent Dirty Reads
      • Keep several different committed versions of an object
      • Read committed has 2 versions of the object: old and new
      • Implemented by MVCC (Multi version concurrency control)
    • Prevents Read Skew (Hence Repeatable Read)
  • Principle: Writes never block readers, Readers never block writers
  • Read Committed uses separate snapshot for each query, while Snapshot Isolation uses the same snapshot for entire Tx
  • MVCC Implementation
    • When Tx starts, it is given unique monotonically increasing Tx ID: txid
    • When Tx writes, the data is tagged with txid of the writer
  • Uses
    • Consistent Backups which takes long time
    • Long running Analytic Queries
  • Problems
    • Phantom Reads
    • Lost Updates
  • Examples
    • It was invented after System R, hence some DB vendor still call it Repeatable Read which is very much similar
    • MSSQL: SNAPSHOT
    • PostgreSQL: REPEATABLE READ
    • MySQL: REPEATABLE READ

Serializable

  • Highest Level of Isolation
  • Types:
    • Actual Serial Execution
      • Most Pessimistic
      • Don’t scale well
    • Two Phase Locking (2PL)
      • Pessimistic
      • Don’t perform well
    • Serializable Snapshot Isolation (SSI)
      • Optimistic

Actual Serial Execution

  • Serial Order on single thread
  • Each Tx must be fast
    • Keep entire dataset in RAM (in-memory)
    • Avoid long running queries
      • Useful for OLTP since Tx’s are short
    • Don’t allow interactive muti-statement Tx’s
    • Use Stored Procedures (SP) — submit code ahead of time
      • Traditional DBs (disk based)
        • Procedural SQL: T-SQL, PL/SQL, PL/pgSQL
          • Vendor Specific
          • Lack of libraries
        • Code Harder to debug
        • Code must not be written badly to use lot of resources
      • Modern in-memory DBs
        • VoltDB Java/Groovy
        • Redis Lua
        • Datomic Java/Clojure
  • Partitioning
    • Tx throughput limited by single CPU core/single machine
    • Cross partition Tx’s are very slow
      • VoltDB 1000 writes/s
    • Multiple secondary indexes can trigger cross partition coordination
    • Try to partition dataset such that each Tx run on single partition
  • Examples
    • VoltDB
    • H-Store
    • Redis
    • Datomic

Two Phase Locking

  • aka Strong Strict Two Phase Locking (SS2PL)
  • Principle: Readers/Writers block other Readers/Writers
  • Lock Modes
    • Shared: Other Tx can hold at the same time
    • Exclusive: Only one Tx can hold
  • Implementation
    • Read Tx takes shared lock
    • Write Tx takes exclusive lock
    • If Read then Write, shared lock upgrades to exclusive lock
    • Must hold lock till the end of Tx
    • 2 Phases
      • Phase 1: While Tx executing, locks acquired
      • Phase 2: At the end of Tx, locks are released
  • Deadlocks
    • DB detects and aborts Tx, it is retried later
    • 2PL Serializable deadlocks are more frequent than Read committed
  • Performance
    • Tx throughput & response times significantly worse due to:
      • Overhead of locks acquired/released
      • Reduced concurrency (Major reason)
    • Unstable latencies
  • Handling Phantom Reads due to write skews
    • Predicate Locks
      • Read
        • shared mode lock on the condition of query
        • If there exists Tx having locks on objects matching condition then query must wait
      • Write
        • Must check either old or new value of object matches any predicate lock, If yes write must wait
      • It works even for objects not yet in DB
      • Performance: Don’t perform well due to lot of checks
    • Index Range Locks
      • aka next-key locking
      • Simplifies predicate locks using indexes
        select * from bookings
        where room_id = 123
        and end_time < '2018-01-01 12:00'
        and start_time > '2018-01-01 13:00'
        • DB either lock all rows on room_id = 123 (if indexed)
        • or lock all time range rows on start_time & end_time (if time indexed)
        • If no index found shared lock on entire table
      • Not as precise as Predicate Locks
      • Real world 2PL = implemented by index range locks
  • Example:
    • MySQL (InnoDB): SERIALIZABLE
    • MSSQL: SERIALIZABLE
    • PostgreSQL does not use this

Serializable Snapshot Isolation

  • When Tx wants to commit it checks if anything bad happened, If yes Tx is aborted and retried
  • Developed in 2008
  • Based on Snapshot Isolation
    • Writes never block readers, Readers never block writers
  • Performance:
    • good if spare capacity + low contention b/w Tx
    • advantage over 2PL: no waiting for locks held by other Tx
    • advantage over Serial Execution: not limited to throughput of single CPU core
    • requires short read-write Tx (long reads are okay)
  • Implementation
    • Checks stale MVCC reads
    • Checks if write occurs after reads
  • Examples
    • PostgreSQL: SERIALIZABLE since version 9.1