Distributed Algorithms

Systems model

  • Based on Timing Assumptions
    • Sync model - bounded n/w delay, process pause, clock error
    • Partially Async model
      • Sync model but unbounded n/w delay, process pauses, clock drifts
      • Assumes eventually system will become sync again
    • Async model
      • No Timing assumption
      • cannot use timeouts
  • Based on Failures
    • Crash stop faults - can only crash once
    • Crash recovery faults
      • crash and can start again
      • data in stable storage survives the crash
      • in-memory state is lost
    • Byzantine (arbitrary) faults - anything can happen
  • Real Distributed Systems modeling is based on
    • Partially Async model
    • Crash recovery faults

Distributed Algorithm

  • Define correctness of algorithm
  • describe properties
    • safety properties
      • nothing bad happens
      • once violated cannot be undone
    • liveness properties
      • eventually good happens
      • always hope it maybe satisfied in the future

Example: Fencing Token

  • Lock service generates a number whenever lock is granted
  • Timeout will automatically release the lock if not released explicitly
  • Safety Properties
    • Monotonic
    • Unique
  • Liveness Properties
    • Availability
    • Eventual Consistency