Change Data Capture
- Process of observing all data changes written to a database and extracting them in a form in which they can be replicated to other systems
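- Ensures that every change made to the system of record is also reflected in the derived data systems
- Effectively makes the source DB the leader and the derived systems its followers
- Usually asynchronous: the source DB does not wait for consumers to apply a change before committing it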
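A minimal sketch of the leader/follower idea above, in Python. Everything here is hypothetical and in-memory (the names `SourceDB`, `DerivedStore`, `ChangeEvent` and `catch_up` are made up for illustration); a real CDC tool would read the database's own replication log instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int             # position of the event in the change log
    key: str
    value: Optional[dict]   # None marks a deletion


class SourceDB:
    """Hypothetical leader: every write is also appended to a change log."""

    def __init__(self):
        self.rows = {}
        self.log = []

    def put(self, key, value):
        self.rows[key] = value
        self.log.append(ChangeEvent(len(self.log), key, value))

    def delete(self, key):
        self.rows.pop(key, None)
        self.log.append(ChangeEvent(len(self.log), key, None))


class DerivedStore:
    """Hypothetical follower (e.g. a search index) that replays the log."""

    def __init__(self):
        self.rows = {}
        self.next_offset = 0  # first log position not yet applied

    def catch_up(self, log):
        # Asynchronous in spirit: the leader never waits for this call.
        for event in log[self.next_offset:]:
            if event.value is None:
                self.rows.pop(event.key, None)
            else:
                self.rows[event.key] = event.value
            self.next_offset = event.offset + 1
```

Until `catch_up` runs, the follower lags behind the leader; afterwards its `rows` match the leader's, which is the replication-lag behaviour you would expect from an async setup.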
Derived Data Systems
- Search Index
- Data Warehouse
- Recommendation Systems
Implementation
- DB triggers can be used, but they are fragile and add significant performance overhead
- LinkedIn’s Databus
- Facebook’s Wormhole
- Yahoo!’s Sherpa
- Bottled Water for PostgreSQL (decodes the write-ahead log)
- Maxwell and Debezium for MySQL (parse the binlog)
- Mongoriver for MongoDB (reads the oplog)
- GoldenGate for Oracle
- Kafka Connect provides connectors for various DBs to integrate with Kafka
Reading Changes
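- If you don’t have a full copy of the log history, you need to start from a consistent DB snapshot
- The snapshot must correspond to a known position (offset) in the change log, so you know where to start applying changes; some CDC tools have this built in, others leave it as a manual step
- Log compaction is a good alternative: the log keeps only the most recent record for each key and discards older duplicates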
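A rough Python sketch of both options above, assuming a change log of `(offset, key, value)` tuples where `value` of `None` is a deletion marker. The function names are hypothetical, and dropping tombstones immediately is a simplification (real brokers typically keep them for a while).

```python
from typing import Dict, List, Optional, Tuple

# An event is (offset, key, value); value None marks a deletion (tombstone).
Event = Tuple[int, str, Optional[str]]


def apply_events(state: Dict[str, str], events: List[Event]) -> Dict[str, str]:
    """Apply events in log order to a key-value state."""
    for _, key, value in events:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


def bootstrap_from_snapshot(snapshot: Dict[str, str],
                            snapshot_offset: int,
                            log: List[Event]) -> Dict[str, str]:
    """Start from a snapshot, then replay only events at or after its offset.

    Assumption: snapshot_offset is the first offset NOT included in the snapshot.
    """
    newer = [event for event in log if event[0] >= snapshot_offset]
    return apply_events(dict(snapshot), newer)


def compact(log: List[Event]) -> List[Event]:
    """Keep only the latest event per key and drop keys whose latest event
    is a tombstone. The original offsets of surviving events are preserved."""
    latest: Dict[str, Event] = {}
    for event in log:
        latest[event[1]] = event  # later events overwrite earlier ones
    survivors = [e for e in latest.values() if e[2] is not None]
    return sorted(survivors, key=lambda e: e[0])
```

The point of compaction is that replaying the compacted log from the start yields the same state as replaying the full log, so a new consumer can rebuild a full copy without a separate snapshot.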
Transporting Changes
- A log-based message broker is well suited to transporting change events from the source DB, since it preserves their order
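To illustrate why a log preserves ordering, here is a toy in-memory broker in Python. `LogBroker`, `append` and `poll` are hypothetical names for this sketch only and are not the API of Kafka or any real broker.

```python
class LogBroker:
    """Toy append-only log; each consumer tracks its own read offset."""

    def __init__(self):
        self._log = []       # events in the order they were appended
        self._offsets = {}   # consumer name -> next offset to read

    def append(self, event):
        self._log.append(event)
        return len(self._log) - 1  # offset assigned to the event

    def poll(self, consumer, max_events=100):
        # Reading never removes events, so every consumer sees the same
        # sequence in the same order, each at its own pace.
        start = self._offsets.get(consumer, 0)
        batch = self._log[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch
```

A slow consumer (say, a data warehouse loader) and a fast one (a search indexer) can poll the same broker independently and still apply the changes in the order the source DB wrote them.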
Questions
- How to take a snapshot or dump of the DB?
- How to do Change Data Capture?