Relational DB Indexing

  • used to improve the speed of data retrieval operations on the data store.
  • Purpose:
    • Speed up access to data
    • Help enforce constraint
    • Typically smaller than table
    • Reduce the need for full scan
    • Stored in memory or on disk
  • Latency Numbers
    • RAM Memory: 100ns
    • Read 1 MB SSD: 1,000,000ns or 1ms
    • Read 1 MB HDD: 30,000,000ns or 30ms
  • Trade-offs:
    • increased storage overhead (to store index)
    • slower writes (since we not only have to write the data but also have to update the index)

Indexing Types

MSSQL

How it works

  • It is similar to Book indexes at the end of the book
    • We have list of terms sorted alphabetically, so that the reader can find the interested term
    • for each term there is page number written so that reader can directly jump to the page
  • Let’s assume following data in person table
idfirst_namelast_name
1LiamMiller
2FionaWatson
3GaryBeckham
4HarryMiller
5LouiseSmith
  • Suppose we create DB index on a column
-- create clustered index
-- theoretically can be unique or not
-- in practice it is always unique
CREATE CLUSTERED INDEX idx_pname
ON person (last_name, first_name);
 
-- create non clustered index
CREATE INDEX idx_pname
ON person (last_name, first_name);
 
-- To create unique non-clustered index
-- duplicate values are not allowed
CREATE UNIQUE INDEX idx_pname  
ON person (last_name, first_name);
  • What happens is that internally, we will have a data structure, which will store list of rows sorted by last_name, and then first_name
  • Below is example index:
last_namefirst_nameid
BeckhamGary3
MillerHarry4
MillerLiam1
SmithLouise5
WatsonFiona2
  • Now we perform the following query
SELECT id, first_name, last_name
FROM person
WHERE last_name = 'Miller'
AND first_name= 'Liam'
  • The above query causes lookup in index idx_pname, since everything is already sorted, we can perform binary search in index to find last_name as Miller and then perform binary search on first_name as Liam on the rows with Miller as last_name

Non-relevant columns in select query

Is Non Unique Columns bad for indexing?

  • What if the column values are not unique which you are indexing, is it bad idea?

Geo-Spatial Index

  • Uber lyft — Key value store??