Relational DB Indexing

used to improve the speed of data retrieval operations on the data store.
Purpose:
- Speed up access to data
- Help enforce constraint
- Typically smaller than table
- Reduce the need for full scan
- Stored in memory or on disk
Latency Numbers
- RAM Memory: 100ns
- Read 1 MB SSD: 1,000,000ns or 1ms
- Read 1 MB HDD: 30,000,000ns or 30ms
Trade-offs:
- increased storage overhead (to store index)
- slower writes (since we not only have to write the data but also have to update the index)

Indexing Types

Types:
- Dense
- Sparse
Types:
- CLUSTERED
- NON CLUSTERED
Types:
- B-Tree
- Bitmap
- Hash
- Special purpose
https://www.atlassian.com/data/sql/how-indexing-works

MSSQL

There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
- When a table has a clustered index, the table is called a clustered table.
- If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
SQL Server automatically creates indexes when PRIMARY KEY and UNIQUE constraints are defined on table columns.
- PRIMARY
  - create clustered index if it already does not exist
  - else create non-clustered index
- UNIQUE
  - create non-clustered index
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/indexes?view=sql-server-ver17
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver16
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-index-transact-sql?view=sql-server-ver17
https://stackoverflow.com/questions/4332982/do-clustered-indexes-have-to-be-unique

How it works

It is similar to Book indexes at the end of the book
- We have list of terms sorted alphabetically, so that the reader can find the interested term
- for each term there is page number written so that reader can directly jump to the page
Let’s assume following data in person table

id	first_name	last_name
1	Liam	Miller
2	Fiona	Watson
3	Gary	Beckham
4	Harry	Miller
5	Louise	Smith

Suppose we create DB index on a column

-- create clustered index
-- theoretically can be unique or not
-- in practice it is always unique
CREATE CLUSTERED INDEX idx_pname
ON person (last_name, first_name);
 
-- create non clustered index
CREATE INDEX idx_pname
ON person (last_name, first_name);
 
-- To create unique non-clustered index
-- duplicate values are not allowed
CREATE UNIQUE INDEX idx_pname  
ON person (last_name, first_name);

What happens is that internally, we will have a data structure, which will store list of rows sorted by last_name, and then first_name
Below is example index:

last_name	first_name	id
Beckham	Gary	3
Miller	Harry	4
Miller	Liam	1
Smith	Louise	5
Watson	Fiona	2

Now we perform the following query

SELECT id, first_name, last_name
FROM person
WHERE last_name = 'Miller'
AND first_name= 'Liam'

The above query causes lookup in index idx_pname, since everything is already sorted, we can perform binary search in index to find last_name as Miller and then perform binary search on first_name as Liam on the rows with Miller as last_name

Non-relevant columns in select query

https://dba.stackexchange.com/questions/187807/do-non-relevant-columns-affect-query-time-of-select-statements
SELECT * query can’t take advantage of index compared to SELECT COLUMN_NAMES

Is Non Unique Columns bad for indexing?

What if the column values are not unique which you are indexing, is it bad idea?

Geo-Spatial Index

Uber lyft — Key value store??

Experiments

Explorer

Indexing

Relational DB Indexing

Indexing Types

MSSQL

How it works

Non-relevant columns in select query

Is Non Unique Columns bad for indexing?

Geo-Spatial Index

Table of Contents