In order to increase speed of searching a record in the database, we usually create indice. Actually, indexing is a kind of data structure.
For finding a specific element in a table, the common way is to search one by one. However, its time complexity is O(n). A wiser way is to use Binary Search Tree whose time complexity is O(log(n)). Sounds great? Here is the thing, index files store in external memory so swithching from one block (node) to another block is time comsuming. The more nodes a tree has, the more time it takes to deal with I/O operation for the disk. Our goal is to reduce the height of the tree(or the number of nodes), so that B-trees and B+ trees are decent choices. It is because multiple keys are able to store in one node, so the height of the tree drop down rapidly.


B-tree and B+ tree

Both B-trees and B+ trees are a kind of balanced multi-way search tree.

b-tree & b+ tree

b-tree & b+ tree
Goodness for B+ trees

  • B-trees store both key and data in each node (interal node and leaves), while B+ trees store only data in leaf node. This allows more number of nodes(/keys) to contain in a given size of block, which means it takes less number of times on I/O operation for disk (less switching between two blocks). In most cases, we have to access data that is on a leaf node so it costs less time on average.

  • The leaf nodes of a B+ treee are linked. Thus, it is easy to access a range of data that store in different nodes (e.g. range (5,12)). Furthermore, doing a full scan is much easier, since it goes through as a linear linked list. On the other hand, a B-tree has to traverse the tree to do so.

Goodness for B-trees

  • Seeking a particular data (near to the root) frequently can obtain the data faster. (You don’t need to reach the leaf node!)

InnoDB and MyISAM

In MySQL, B+ trees are used in both InnoDB (default) and MyISAM storage engine.

MyISAM

MyISAM

  • For MyISAM, the physical address of each record is stored in the data field of each leaf node. It means that the index files and the data file are seperated. Here, col1 is the primary key and col2 is secondary key.
  • table-level locking
    InnoDB

InnoDB

  • Compared with MyISAM, 1) one big difference for InnoDB is that the data field in each leaf node stored the complete records rather than a physical address (clustered index). The key of the index tree(B+ tree) is the primary key of the table. Therefore, InnoDB requires a table must have primary. If there is no primary key, MySQL will chose from existing table or create automatically. 2) Another difference is that each data field of the secondary key stores the relative primary key rather than a physical address.
  • Since maintaing the B+ tree features (reblanceing) is time consuming so that is the reasons why we should use short incremental and numerical field as the primary key.
  • row-level locking

To sum up:
Frequent reading, almost no writing => MyISAM
Full-text search in MySQL <= 5.5 => MyISAM
In all other circumstances, InnoDB is usually the best way to go.


Scheme Optimization

should create index for following columns(fields):

  • primary key
  • foreign key (connect to other table constantly)
  • range searching
  • sort frequently
  • use where clause frequently

should not create index for following columns:

  • use rarely
  • low selectivity (low percentage of distinct data) (e.g. gender)

prefex indexing

  • It uses the prefex replace the whole column value as the index so it ensures the balance between selectivity and size of a field.

Reference:

http://stackoverflow.com/questions/870218/differences-between-b-trees-and-b-trees