Distributed Algorithms
Systems model
- Based on Timing Assumptions
- Sync model - bounded n/w delay, process pause, clock error
- Partially Async model
- Sync model but unbounded n/w delay, process pauses, clock drifts
- Assumes eventually system will become sync again
- Async model
- No Timing assumption
- cannot use timeouts
- Based on Failures
- Crash stop faults - can only crash once
- Crash recovery faults
- crash and can start again
- data in stable storage survives the crash
- in-memory state is lost
- Byzantine (arbitrary) faults - anything can happen
- Real Distributed Systems modeling is based on
- Partially Async model
- Crash recovery faults
Distributed Algorithm
- Define correctness of algorithm
- describe properties
- safety properties
- nothing bad happens
- once violated cannot be undone
- liveness properties
- eventually good happens
- always hope it maybe satisfied in the future
Example: Fencing Token
- Lock service generates a number whenever lock is granted
- Timeout will automatically release the lock if not released explicitly
- Safety Properties
- Liveness Properties
- Availability
- Eventual Consistency