OLAP and OLTP

  • https://stackoverflow.com/questions/21900185/what-are-oltp-and-olap-what-is-the-difference-between-them
  • OL — “ONLINE”
    • completion of data processing in real time and not batch-processing
  • OLTP — Online Transaction Processing
    • large number of short on-line transactions (INSERT, UPDATE, DELETE)
    • fast query processing, maintaining data integrity
    • effectiveness measured by number of transactions per second
  • OLAP — Online Analytical Processing
    • Historical Data
    • Relatively low volume of transactions
    • Queries are often very complex and involve aggregations
    • suited for analytics, data mining, fewer queries but they are usually bigger
    • effectiveness measured by response time
PropertyOLTPOLAP
Main Read PatternSmall number of recordsAggregate over large number of records
Main Write PatternRandom access,
low latency writes from user input
Batch Processing or Event Stream
Primary Used byEnd User/CustomerInternal Analyst for Business Intelligence (BI)
What Data RepresentsLatest state of DataHistory of events that happened over time
Dataset sizeGBs to TBsTBs to PBs

OLAP

  • DB is known as Data Warehouse where Analyst can run queries without interfering with live data
  • Process of getting data into data warehouse is called Extract-Transform-Load (ETL)
  • Schema:
    • Star schema
      • aka dimensional modeling
      • center of schema is fact table where each row = event at particular time
        • can have 100s of columns
      • columns are foreign keys to other tables called dimension tables
      • Simpler to work with
    • Snowflake schema
      • Similar to Star schema
      • But its dimensions are further broken down into sub-dimensions
  • Query often involves aggregation of data
    • For example: COUNT, MAX, MIN, AVG
    • Materialized Views are created to cache the data
      • actual copy of the query results, written to disk
      • Different from virtual views, where the query is executed when the view is read
      • Makes writes expensive
    • Data Cubes
      • aka OLAP cube
      • Special case of Materialized Views
      • grid of aggregates grouped by different dimensions
      • for example we precompute (2-dimensions)
        • sum of price of products on a given date
        • sum of price of products on all dates
      • Only used when performance boost needed for a particular query
  • Uses Database_Column_Oriented
  • Examples
    • Snowflake
    • Apache Spark (for batch processing)
    • Google BigQuery
    • DuckDB (in-memory)

Types of OLAP

  • Not sure if this is useful
  • ROLAP — Relational OLAP
    • complex query possible
    • slower
  • MOLAP — Multi-dimensional OLAP
    • uses dimensions
    • faster
  • HOLAP — Hybrid OLAP (ROLAP + MOLAP)

Data Lake

  • Unstructured Data compared to Data Warehouse which has structured data