In statistics, a contingency table (also referred to as cross tabulation or cross tab) is often used to record and analyze the relation between two or more categorical variables. It displays the (multivariate) frequency distribution of the variables in a matrix format.
The term contingency table was first used by Karl Pearson in "On the Theory of Contingency and Its Relation to Association and Normal Correlation", part of the Drapers' Company Research Memoirs Biometric Series I, published in 1904.
A central problem of multivariate statistics is finding the (direct) dependence structure underlying the variables contained in high-dimensional contingency tables. If some of the conditional independences are revealed, then even the storage of the data can be done more efficiently (see Lauritzen (2002)). To do this, one can use information-theoretic concepts, which depend only on the probability distribution; that distribution is easily estimated from the contingency table by the relative frequencies.
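As a rough illustration of using relative frequencies in this way, the sketch below estimates the mutual information between the row and column variables of a two-way table. Mutual information is only one information-theoretic quantity, chosen here as an example; the text does not say which measure is intended. The function name `mutual_information` and both small tables are hypothetical.

```python
import math


def mutual_information(table):
    """Plug-in estimate of mutual information (in bits) from a two-way table of counts."""
    n = sum(sum(row) for row in table)
    # Marginal relative frequencies of the row and column variables.
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]

    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue  # an empty cell contributes nothing to the sum
            p = count / n  # joint relative frequency of cell (i, j)
            mi += p * math.log2(p / (row_p[i] * col_p[j]))
    return mi


# Hypothetical tables, made up for illustration only.
independent = [[20, 30], [20, 30]]  # rows have identical proportions
dependent = [[50, 0], [0, 50]]      # the row fully determines the column

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

A value near zero is consistent with independence of the two variables, while larger values indicate stronger dependence. With small samples the plug-in estimate is biased upward, so it should be read as a descriptive summary rather than a test.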
Example
Suppose that we have two variables, sex (male or female) and handedness (right- or left-handed). Further suppose that 100 individuals are randomly sampled from a very large population as part of a study of sex differences in handedness. A contingency table can be created to display the numbers of individuals who are male and right-handed, male and left-handed, female and right-handed, and female and left-handed. Such a contingency table is shown...
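A minimal sketch of how such a table could be tabulated from raw observations follows. The cell counts are invented purely for illustration and are not the figures from the study described; the helper name `contingency_table` is likewise hypothetical.

```python
from collections import Counter


def contingency_table(observations, row_levels, col_levels):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(observations)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


# Hypothetical sample of 100 individuals (made-up counts, for illustration only).
observations = (
    [("male", "right")] * 45
    + [("male", "left")] * 7
    + [("female", "right")] * 42
    + [("female", "left")] * 6
)

sexes = ["male", "female"]
hands = ["right", "left"]

table = contingency_table(observations, sexes, hands)

# Marginal totals: sums across each row, down each column, and overall.
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

for sex, row, total in zip(sexes, table, row_totals):
    print(sex, row, total)
print("total", col_totals, grand_total)
```

The row and column sums are the marginal totals, and dividing any cell by the grand total gives the relative frequency of that combination of categories.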