Bloom Filters

  • Offers very fast lookup
  • space efficient, probabilistic checks
  • tradeoff between space and accuracy
  • Returns
    • Maybe Yes (Possible False Positive)
    • Firm No (No False Negative)
  • Used along with Cache systems, and act as first line of check
  • False Positives can be reduced by increasing Bit Vector size
  • Examples
    • Check Username
    • Web Crawler to check if URL is already visited
    • Avoid One Hit Wonders
      • CDN like Akamai prevent caching one-hit wonders
        • web pages that are only requested once
        • 75% pages are one-hit wonders
      • Facebook to avoid rarely search data in bloom filers to avoid one-hit wonders
    • Chrome in past used to check URL is malicious
    • Databases
      • Cassandra uses it to avoid unnecessary database lookup in SSTable
      • PostgresQL, Google BigTable for same reason
      • LSM Tree Based NoSQL Databases
    • Spell Checkers to check against dictionary
    • Password validators use it to avoid weak password

Working

  • Create a bit array representing 0s and 1s
  • Initialize with all 0s
  • Have set of hash functions , , , …
  • Given an element, apply all hash functions to the element
  • For each result from each hash function, flip 0 to 1
  • Any new element if it enters system, has to apply all the hash functions
    • If there is 1 present already in the bit array for each function, then it is a Maybe-Yes
    • Else, It is a Firm-No

Example

  • Assume we have following strings in our set:
Hash Functions”Hello""World”
34
57
  • Corresponding Bit Vector will be:
Index01234567
Bit00011101
  • Checking new Words if they are present in the set:
Hash Functions”Coca""Cola”
30
72
Present in SetYes (Wrong)No (Right)
InterpretationFalse Positive
(Maybe-Yes)
False Negative
(Absolute-No)