Cardinality

The terms “high cardinality” and “low cardinality” are frequently used when discussing table columns. A high cardinality column is one that has many unique values relative to the number of rows. For example, a primary key column is a high cardinality column since each value is, by definition, unique. A low cardinality column is the opposite: it has only a few unique values, each repeated across many rows. For example, a column representing sex typically has only two possible values (Male and Female).
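One way to make the distinction concrete is to count the distinct values in a column and compare that with the total row count. The sketch below is only an illustration: it uses Python's built-in sqlite3 module, and the in-memory employee table, its 1,000 generated rows, and the helper name column_cardinality are all assumptions made for this example.

```python
import sqlite3


def column_cardinality(conn, table, column):
    """Return (distinct_values, total_rows) for one column."""
    # Table and column names cannot be bound as parameters,
    # so only pass trusted identifiers to this helper.
    sql = f"select count(distinct {column}), count(*) from {table}"
    distinct, total = conn.execute(sql).fetchone()
    return distinct, total


# Hypothetical sample data: 1,000 employees, alternating M and F.
conn = sqlite3.connect(":memory:")
conn.execute("create table employee (id integer primary key, sex text)")
conn.executemany(
    "insert into employee (id, sex) values (?, ?)",
    [(i, "F" if i % 2 == 0 else "M") for i in range(1, 1001)],
)

print(column_cardinality(conn, "employee", "id"))   # (1000, 1000)
print(column_cardinality(conn, "employee", "sex"))  # (2, 1000)
```

The primary key has as many distinct values as there are rows, while the sex column has two distinct values no matter how large the table grows.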

 

High cardinality columns are often great candidates to be the first (and sometimes only) column in an index. This is because a lookup on a single value matches only a small number of index entries, so the database fetches only a small number of rows from the table.

 

Low cardinality columns are more difficult to index correctly. For example, consider the following query:

select * from employee where sex = 'F';

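Without an index, this query will perform a full table scan. This seems undesirable. However, while adding an index will reduce the number of rows retrieved from the table, the query may actually take much longer because of the index usage. If about half of the employees match, the database has to do an index lookup followed by a table access for each of those rows, which can cost more than simply reading the whole table once. In this case, a normal b-tree index is probably a bad idea. Other index features offered by database vendors (such as Oracle’s bitmap and key compressed indexes) might be better solutions for low cardinality columns.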
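To see why the index helps so little here, it can be useful to check what fraction of the table a predicate actually matches. The following is a minimal sketch using Python's built-in sqlite3 module. The employee table contents and the helper name matched_fraction are hypothetical, and the even split between M and F is an assumption of the sample data, not a property of any real table.

```python
import sqlite3

# Hypothetical sample data: 1,000 employees, alternating M and F.
conn = sqlite3.connect(":memory:")
conn.execute("create table employee (id integer primary key, sex text)")
conn.executemany(
    "insert into employee (id, sex) values (?, ?)",
    [(i, "F" if i % 2 == 0 else "M") for i in range(1, 1001)],
)


def matched_fraction(conn, where_clause, params=()):
    """Return the fraction of employee rows that satisfy where_clause."""
    total = conn.execute("select count(*) from employee").fetchone()[0]
    # where_clause is trusted SQL text; values are bound via params.
    matched = conn.execute(
        f"select count(*) from employee where {where_clause}", params
    ).fetchone()[0]
    return matched / total


frac_sex = matched_fraction(conn, "sex = ?", ("F",))
frac_id = matched_fraction(conn, "id = ?", (42,))
print(frac_sex)  # 0.5
print(frac_id)   # 0.001
```

A predicate that matches half of the table leaves the database with nearly as much table data to read as a full scan, plus the extra work of going through the index. A predicate that matches one row in a thousand is where an index lookup pays off.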

