NoSQL Databases

Types and use cases

  • Key-Value
  • Document
  • Wide-Column
  • Graph
  • Time Series

Key-Value

  • Simple Data Storage
  • Highly scalable
  • Useful for:
    • Caching: faster responses for web apps
    • Session management: storing and recalling user information quickly
    • Leaderboards: real-time ranking of frequently updated scores
    • Real-time inventory systems
    • Fraud mitigation
    • Claims processing
  • Examples:
    • Redis
    • Amazon DynamoDB
    • Riak
    • Memcached (Redis is often preferred for its richer data types and optional persistence)
    • Couchbase
> MSET firstname "Mel" lastname "McGee"
> MGET firstname lastname
1) "Mel"
2) "McGee"
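
The caching and session-management uses above both come down to storing a value under a key, optionally with an expiry. A minimal in-memory sketch in Python, not Redis itself: the class KeyValueCache, its ttl parameter, and the injectable clock are all hypothetical names chosen for illustration.

```python
import time


class KeyValueCache:
    """Toy in-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}        # key -> (value, expires_at or None)
        self._clock = clock    # injectable clock (an assumption, for testability)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value

    def mset(self, mapping):
        for key, value in mapping.items():
            self.set(key, value)

    def mget(self, *keys):
        return [self.get(key) for key in keys]


cache = KeyValueCache()
cache.mset({"firstname": "Mel", "lastname": "McGee"})
print(cache.mget("firstname", "lastname"))  # ['Mel', 'McGee']
```

Values are returned exactly as stored, and a missing or expired key yields None, which is roughly how a cache miss is signalled to the application.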

Document

  • General purpose
  • Best for workloads that do not require complex queries
  • Useful for:
    • Content Management System (CMS)
    • Large Document Storage: Files, general data
    • Websites: unstructured data
    • User Profiles
    • Real-time analytics
    • Big Data
  • Examples:
    • MongoDB
    • Elasticsearch
    • Apache CouchDB
    • Azure Cosmos DB for NoSQL
    • Amazon DocumentDB
db.bios.find( { "name.last": "McGee" } )
db.bios.find( { "name.first": "Mel" } )
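
The dotted field names in the queries above reach into nested documents. A rough Python sketch of that matching, assuming equality-only queries; get_path, find, and the sample bios list are hypothetical and are not MongoDB's API.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as "name.last" through nested dicts."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, query):
    """Return documents whose fields equal every value in the query."""
    return [
        doc for doc in collection
        if all(get_path(doc, path) == expected for path, expected in query.items())
    ]


# Sample documents (made up for illustration); note the second has an extra field.
bios = [
    {"_id": 1, "name": {"first": "Mel", "last": "McGee"}},
    {"_id": 2, "name": {"first": "Jane", "last": "Smith"}, "title": "Engineer"},
]

print([doc["_id"] for doc in find(bios, {"name.last": "McGee"})])  # [1]
```

Documents in the same collection do not need the same fields, which is why this model suits the unstructured data listed above.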

Wide-Column

  • Often described as a two-dimensional key-value store
  • Tables, rows, and columns, but the columns can vary from row to row
  • Model tables around the app's queries, not the app's data
  • Built for large volumes of data
  • Useful for:
    • Querying specific columns rather than whole rows
    • Big Data: Fast searching
    • Real-time Inventory Management
    • Real-time analytics
  • Examples:
    • Apache Cassandra
    • Apache HBase
    • Google Bigtable
    • ScyllaDB
"Bonuses": {
    "row1": { "ID": 1, "Last": "McGee", "First": "Mel", "Bonus": 8000 },
    "row2": { "ID": 2, "Last": "Smith", "First": "Jane", "Bonus": 4000 }
}

-- Cassandra uses CQL (Cassandra Query Language)
-- UPDATE must identify rows by primary key; custid is assumed here to be the primary key of Customer
UPDATE Customer SET branch = 'main' WHERE custid = 2;
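
To make "columns can vary" and "querying specific columns" concrete, here is a small Python sketch of the layout. It is a toy model, not how Cassandra stores data: WideColumnTable, the "2024" partition key, and the branch column are hypothetical.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy table: partition key -> row key -> {column: value}."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, row_key, **columns):
        # Rows are sparse: each row keeps only the columns it was given.
        row = self._partitions[partition_key].setdefault(row_key, {})
        row.update(columns)

    def select(self, partition_key, *column_names):
        """Read only the requested columns from one partition, in row-key order."""
        rows = self._partitions.get(partition_key, {})
        return [
            {name: rows[key][name] for name in column_names if name in rows[key]}
            for key in sorted(rows)
        ]


bonuses = WideColumnTable()
bonuses.upsert("2024", 1, last="McGee", first="Mel", bonus=8000)
bonuses.upsert("2024", 2, last="Smith", first="Jane", bonus=4000, branch="main")

print(bonuses.select("2024", "last", "bonus"))
```

Every read starts from a partition key, which is the reason tables are designed around the queries the app will run.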

Graph

  • Stores data in nodes, with relationships represented explicitly as edges
  • Best used for relationship analysis
  • Useful for:
    • Social Networks
    • Fraud Detection
    • Recommendation engine
    • Logistics and routing
  • Examples:
    • Neo4j
    • JanusGraph
    • Azure Cosmos DB for Apache Gremlin
    • Amazon Neptune
// Neo4j's query language is called Cypher
 
// creating data
CREATE (p:Person)-[:LIKES]->(t:Technology)
 
// querying data
MATCH (p:Person)-[:LIKES]-(t:Technology)
RETURN p, t
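
The pattern in the query above can be pictured as a walk over stored edges. A minimal Python sketch, assuming directed edges only (the Cypher pattern above matches in either direction); the Graph class and the sample node names are hypothetical and unrelated to Neo4j's internals.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: labeled nodes plus typed, directed edges."""

    def __init__(self):
        self.labels = {}                # node id -> label
        self.edges = defaultdict(list)  # source id -> [(edge type, target id)]

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def add_edge(self, source, edge_type, target):
        self.edges[source].append((edge_type, target))

    def match(self, source_label, edge_type, target_label):
        """Return (source, target) pairs fitting (source)-[type]->(target)."""
        return [
            (source, target)
            for source, outgoing in self.edges.items()
            for kind, target in outgoing
            if kind == edge_type
            and self.labels.get(source) == source_label
            and self.labels.get(target) == target_label
        ]


g = Graph()
g.add_node("mel", "Person")
g.add_node("jane", "Person")
g.add_node("graphs", "Technology")
g.add_edge("mel", "LIKES", "graphs")
g.add_edge("jane", "LIKES", "graphs")
g.add_edge("mel", "KNOWS", "jane")

print(g.match("Person", "LIKES", "Technology"))
```

Because edges are stored with the nodes, following a relationship is a lookup rather than a join, which is what makes relationship analysis cheap.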

Time Series

  • Optimized for time-stamped (time series) data
  • Use LSM trees for fast ingestion, and split each table into many small indexes keyed by ingestion source and timestamp
  • A small index can fit entirely in CPU cache for better performance, and an index that is no longer relevant can be dropped whole (as opposed to the typical tombstone method of marking records deleted one by one)
  • Useful for:
    • IoT data
    • Metrics analysis
    • Application monitoring
    • Financial trend analysis
  • Examples:
    • InfluxDB
    • Prometheus
      • Open-source monitoring tool that includes a time series database of the same name
    • Apache Druid
    • TimescaleDB
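
The partitioning idea from the notes above (many small indexes split by source and timestamp, dropped whole when no longer relevant) can be sketched in Python. This is only an illustration under simplifying assumptions: no LSM tree is modeled, and TimeSeriesStore, window_seconds, and the sensor names are hypothetical.

```python
from collections import defaultdict


class TimeSeriesStore:
    """Toy store that shards points by (source, time window)."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.shards = defaultdict(list)  # (source, window start) -> [(ts, value)]

    def _window_start(self, timestamp):
        return timestamp - (timestamp % self.window_seconds)

    def write(self, source, timestamp, value):
        key = (source, self._window_start(timestamp))
        self.shards[key].append((timestamp, value))

    def query(self, source, start, end):
        """Return points for one source with start <= timestamp < end, by time."""
        points = []
        for (shard_source, window_start), shard in self.shards.items():
            if shard_source != source:
                continue
            if window_start + self.window_seconds <= start or window_start >= end:
                continue  # shard lies entirely outside the requested range
            points.extend(p for p in shard if start <= p[0] < end)
        return sorted(points)

    def drop_older_than(self, cutoff):
        """Delete whole shards whose window ended at or before the cutoff."""
        expired = [k for k in self.shards if k[1] + self.window_seconds <= cutoff]
        for key in expired:
            del self.shards[key]  # one delete per shard, no per-point tombstones
        return len(expired)


store = TimeSeriesStore(window_seconds=3600)
store.write("sensor-1", 10, 20.5)
store.write("sensor-1", 3700, 21.0)
store.write("sensor-2", 20, 19.0)

print(store.query("sensor-1", 0, 7200))
print(store.drop_older_than(3600))
```

Retention becomes a matter of discarding expired shards, and a range query only has to touch the shards whose window overlaps the requested interval.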