Research Papers

  • Google File System
    • Distributed File System
  • Google Bigtable
    • Wide column NoSQL
  • Amazon DynamoDB
    • Document and Key-Value NoSQL
  • Apache Cassandra
    • Wide column NoSQL
  • MapReduce by Google
    • Programming model for batch processing on large clusters
  • Google Chubby
    • Distributed Lock Service
  • Google Zanzibar
    • Distributed Authorization Service
  • Google Dremel
    • Interactive queries over nested, columnar data
  • Google Spanner
    • Globally distributed, strongly consistent database
  • Apache Cassandra
    • developed at Facebook
    • inspired by Google Bigtable and Amazon Dynamo
    • written in Java
  • Apache HBase
    • inspired by Google Bigtable
    • runs on top of HDFS
    • part of Apache Hadoop ecosystem
    • written in Java
  • Apache Hadoop
    • core includes HDFS, MapReduce, and YARN (added in Hadoop 2)
    • inspired by the Google File System and Google MapReduce papers
    • MapReduce has largely been superseded by faster engines such as Apache Spark
  • Apache Kafka
    • developed at LinkedIn
    • named after the author Franz Kafka
    • written in Java/Scala
  • ScyllaDB
    • Cassandra-compatible wide column NoSQL
    • written in C++
  • Apache ZooKeeper
    • developed at Yahoo!
    • written in Java
    • inspired by Google Chubby
  • Apache Spark
    • developed at the University of California, Berkeley (AMPLab)
    • written in Scala
    • improvement over MapReduce
  • Snowflake
    • developed at Snowflake Inc.
    • written in Java/C++
  • Google BigQuery
    • developed at Google
    • serverless data warehouse
    • query engine uses Google Dremel
  • Apache Parquet
    • developed by Twitter and Cloudera
    • inspired by Google Dremel
    • columnar storage format
  • Google Spanner
    • developed at Google
    • uses Paxos for replication
    • TrueTime API (clock with bounded uncertainty) for externally consistent transactions
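
The TrueTime commit-wait idea behind Spanner's consistency can be sketched in Python. This is a minimal single-process simulation, not Google's implementation: it assumes the clock error is a fixed `epsilon` around the local clock, and `SimulatedTrueTime`, `TTInterval`, and `commit_with_wait` are hypothetical names modeled on the `TT.now()`, `TT.after(t)`, and `TT.before(t)` calls described in the Spanner paper.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """Hypothetical interval [earliest, latest] assumed to contain the true time."""
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical TrueTime stand-in: local clock +/- an assumed epsilon (seconds)."""

    def __init__(self, epsilon: float = 0.005) -> None:
        self.epsilon = epsilon

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed (it lies below the earliest bound).
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t has definitely not arrived yet (it lies above the latest bound).
        return self.now().latest < t


def commit_with_wait(tt: SimulatedTrueTime) -> float:
    """Pick a commit timestamp, then block until it is guaranteed to be in the past."""
    s = tt.now().latest  # timestamp no earlier than the true time right now
    while not tt.after(s):  # commit wait
        time.sleep(tt.epsilon / 10)
    return s  # the write can now be made visible with timestamp s
```

Because `after(s)` only holds once the interval's `earliest` bound moves past `s`, the wait lasts at least about `2 * epsilon`; this is why Spanner invests in keeping clock uncertainty small with GPS receivers and atomic clocks.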

Apache Spark

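  • used for large-scale batch processing (also supports streaming, SQL, and machine learning)
  • improves on Hadoop MapReduce by keeping intermediate data in memory instead of writing it to disk between jobs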
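
The classic word count shows the map, shuffle, and reduce stages of the MapReduce model that Spark builds on. This sketch is plain single-machine Python, not Hadoop or Spark code; `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key (in Hadoop this crosses the network).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reduce: combine the values for each key, here by summing the counts.
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


print(word_count(["the quick fox", "The lazy dog the end"]))
```

Spark expresses the same pipeline as chained RDD transformations (in PySpark, roughly `lines.flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`), and `cache()` can keep an intermediate RDD in memory for reuse across later steps.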