Change Data Capture

  • Process of observing all data changes written to a database and extracting them in a form in which they can be replicated to other systems
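  • Ensures that every change made to the system of record is also reflected in the derived data systems
  • Effectively makes the source DB the leader and the derived data systems its followers
  • Generally asynchronous: the source DB commits a change without waiting for consumers to apply it, so followers can lag behind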
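A minimal sketch of the follower side, assuming change events arrive in log order. The `ChangeEvent` shape (offset, key, value, with `None` meaning a delete) and the `apply_changes` helper are hypothetical and not taken from any particular CDC tool:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int           # position of the change in the source DB's log (assumed)
    key: str
    value: Optional[Any]  # None means the row was deleted


def apply_changes(derived: dict, events: list, last_applied: int = -1) -> int:
    """Apply events in log order to a derived store; return the last applied offset."""
    for event in events:
        if event.offset <= last_applied:
            continue  # already applied, so replaying the same events is harmless
        if event.value is None:
            derived.pop(event.key, None)
        else:
            derived[event.key] = event.value
        last_applied = event.offset
    return last_applied


# A plain dict stands in for a derived system such as a search index.
search_index = {}
log = [
    ChangeEvent(0, "user:1", {"name": "Ann"}),
    ChangeEvent(1, "user:2", {"name": "Bob"}),
    ChangeEvent(2, "user:1", {"name": "Anna"}),
    ChangeEvent(3, "user:2", None),
]
position = apply_changes(search_index, log)
```

Tracking the last applied offset is what lets a follower resume after a restart without applying the same change twice.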

Derived Data Systems

  • Search Index
  • Data Warehouse
  • Recommendation Systems

Implementation

  • DB triggers can be used (each trigger writes the change to a changelog table) but
    • they tend to be fragile
    • they add significant performance overhead
  • LinkedIn’s Databus
  • Facebook’s Wormhole
  • Yahoo!’s Sherpa
  • Bottled Water for PostgreSQL
    • reads write-ahead log
  • Maxwell and Debezium for MySQL
    • reads binlog
  • Mongoriver for MongoDB
    • reads oplog
  • GoldenGate for Oracle
  • Kafka Connect provides connectors for various DBs to integrate with Kafka

Reading Changes

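  • If you don’t have a full copy of the log history, you need to start from a consistent DB snapshot
    • the snapshot must correspond to a known position (offset) in the change log, so you know where to start applying changes after it
    • some CDC tools have this built in
  • Log compaction is also a good alternative: older records with the same key are discarded and only the most recent update for each key is kept
    • a deleted key is marked with a tombstone (null value) and is removed during compaction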
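Both options can be sketched in a few lines, assuming log records are `(offset, key, value)` tuples and a value of `None` is a tombstone. The `compact` and `bootstrap` helpers are hypothetical names for illustration, not the API of any real broker or CDC tool:

```python
def compact(log):
    """Keep only the latest record per key and drop keys whose latest record is a tombstone.

    Dropping tombstones is only safe here because the compacted log is meant
    to rebuild a derived store from empty.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, value)
    records = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    ]
    return sorted(records, key=lambda record: record[0])


def bootstrap(snapshot, snapshot_offset, log):
    """Build a derived store from a snapshot plus the changes logged after it."""
    store = dict(snapshot)
    for offset, key, value in log:
        if offset <= snapshot_offset:
            continue  # already reflected in the snapshot
        if value is None:
            store.pop(key, None)
        else:
            store[key] = value
    return store


full_log = [(0, "a", 1), (1, "b", 2), (2, "a", 3), (3, "b", None), (4, "c", 5)]

# Option 1: snapshot taken at offset 1, then replay the rest of the full log.
from_snapshot = bootstrap({"a": 1, "b": 2}, 1, full_log)

# Option 2: no snapshot, rebuild from an empty store using the compacted log.
compacted = compact(full_log)
from_compacted = bootstrap({}, -1, compacted)
```

Both paths end in the same state, which is why a compacted log can stand in for a snapshot when a new derived system is added.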

Transporting Changes

  • A log-based message broker is well suited to transporting change events from the source DB, since it preserves the order of messages
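The ordering property can be illustrated with a toy in-memory log, assuming a single partition. The `ChangeLog` and `Consumer` classes below are hypothetical stand-ins and not the client API of Kafka or any other broker:

```python
class ChangeLog:
    """Append-only log: every record gets the next sequential offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, from_offset, max_records=100):
        return self._records[from_offset:from_offset + max_records]


class Consumer:
    """Tracks its own offset, so each consumer reads the log independently."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # next offset to read
        self.seen = []

    def poll(self):
        batch = self.log.read(self.offset)
        self.seen.extend(batch)
        self.offset += len(batch)
        return batch


change_log = ChangeLog()
change_log.append(("user:1", "Ann"))
change_log.append(("user:1", "Anna"))

index_consumer = Consumer(change_log)
warehouse_consumer = Consumer(change_log)

index_consumer.poll()                  # reads the first two changes
change_log.append(("user:2", "Bob"))
index_consumer.poll()                  # picks up only the new change
warehouse_consumer.poll()              # slower consumer catches up in one go
```

Because reading does not remove records, a slow or newly added consumer sees exactly the same sequence as a fast one, just later.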

Questions

  • How to take a snapshot (dump) of the DB?
  • How to implement Change Data Capture?