Stackoverflow doesn’t have very good answers about this topic since it’s too broad.
This article is helpful: http://www.programmerinterview.com/index.php/database-sql/what-is-an-index/
- Basically, without an index, if you’d like to find one certain row(s), you’ll have to do a full table scan, this is literally really inefficient, like human eye check
- An index is a data structure (most commonly a B-tree) that stores the values of a specific column in the table, i.e. in index consists of column values from the table
- Why B-tree?
- This is due to their efficiency: insertion/deletion/lookup can all be done in O(logn) time.
- Another major reason is that the values stored in B-tree can be sorted
- How does Hash table indexes work?
- Imagine that we store one column value as the key in the hashtable, and the row data as the value of the hashtable, the lookup time is very fast. Because basically, Hashtable is like an “associative array”.
- The disadvantages of Hashtable working as index:
- Hashtable keys are not sorted, it only maintains a mapping between key and value, so it’s good for fast lookup, but it’s not good for queries such to find how many employees that are younger than 25 years old in a table.
- Good analogy of database index:
- It’s like the index of a book: if you’d like to find the Chapter describing Python decorators, you could either flip through the pages or just go to the index page where that Chapter is listed, also a page number of that chapter is listed as well. Apparently, using the index page is a lot faster.
- What’s the cost of having a database index?
- It takes up space, the larger your table is, the larger your index is.
- Another performance hit is that whenever you do CRUD to your table, the same operations will have to be done to your index
- As a general rule, an index should only be created, if the data on the indexed column will be queried frequently.