Sample covariance matrix
Definition
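For a vector $z \in \mathbb{R}^m$, the sample variance $\sigma^2(z)$ measures the average squared deviation of its coefficients around the sample average $\hat{z}$:
$$\hat{z} = \frac{1}{m}(z_1 + \cdots + z_m), \qquad \sigma^2(z) = \frac{1}{m}\left((z_1 - \hat{z})^2 + \cdots + (z_m - \hat{z})^2\right).$$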
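The definition above can be computed directly. This is a minimal sketch in plain Python. The helper names `sample_mean` and `sample_variance` are illustrative and do not come from any library. Note the $1/m$ normalization: it matches `statistics.pvariance` from the Python standard library, whereas `statistics.variance` divides by $m-1$.

```python
from statistics import pvariance

def sample_mean(z):
    """Sample average: (z_1 + ... + z_m) / m."""
    return sum(z) / len(z)

def sample_variance(z):
    """Average squared deviation from the sample mean, with 1/m normalization."""
    zhat = sample_mean(z)
    return sum((zk - zhat) ** 2 for zk in z) / len(z)

z = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data, m = 8
print(sample_mean(z))      # 5.0
print(sample_variance(z))  # 4.0

# Cross-check against the standard library's 1/m ("population") variance.
assert abs(sample_variance(z) - pvariance(z)) < 1e-12
```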
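Now consider a matrix $X = [x_1, \ldots, x_m] \in \mathbb{R}^{n \times m}$, where each column $x_k$ represents a data point in $\mathbb{R}^n$. We are interested in describing the amount of variance in this data set. To this end, we look at the numbers obtained by projecting the data onto a line defined by the direction $u \in \mathbb{R}^n$, $\|u\|_2 = 1$. These numbers form the vector $X^T u = (u^T x_1, \ldots, u^T x_m)$ in $\mathbb{R}^m$.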
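The corresponding sample mean and variance are
$$\frac{1}{m}\sum_{k=1}^m u^T x_k = u^T \hat{x}, \qquad \sigma^2(X^T u) = \frac{1}{m}\sum_{k=1}^m \left(u^T x_k - u^T \hat{x}\right)^2,$$
where $\hat{x} = \frac{1}{m}(x_1 + \cdots + x_m)$ is the sample mean of the vectors $x_1, \ldots, x_m$.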
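The projection step can be sketched the same way. This assumes the data points are stored as a Python list of the columns $x_k$, which is an illustrative layout and not one the text prescribes. The data set and the helper names (`dot`, `mean_vector`, `projections`) are hypothetical. The final print reflects the identity that the mean of the projections $u^T x_k$ equals $u^T \hat{x}$.

```python
def dot(a, b):
    """Inner product a^T b of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def mean_vector(points):
    """Sample mean hat x = (x_1 + ... + x_m) / m of a list of points in R^n."""
    m, n = len(points), len(points[0])
    return [sum(p[i] for p in points) / m for i in range(n)]

def projections(points, u):
    """Entries u^T x_k, k = 1..m, i.e. the vector X^T u."""
    return [dot(u, x) for x in points]

# Hypothetical data: m = 4 points in R^2 (the columns of X) and a unit-norm direction u.
points = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0], [6.0, 2.0]]
u = [0.6, 0.8]

z = projections(points, u)
zhat = sum(z) / len(z)
var_u = sum((zk - zhat) ** 2 for zk in z) / len(z)  # 1/m normalization

# Mean of the projections equals the projection of the mean: zhat = u^T hat x.
print(round(zhat, 6), round(dot(u, mean_vector(points)), 6))  # 3.4 3.4
print(round(var_u, 6))  # 2.06
```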
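The sample variance along direction $u$ can be expressed as a quadratic form in $u$:
$$\sigma^2(X^T u) = \frac{1}{m}\sum_{k=1}^m \left[u^T (x_k - \hat{x})\right]^2 = \frac{1}{m}\sum_{k=1}^m u^T (x_k - \hat{x})(x_k - \hat{x})^T u = u^T S u,$$
where $S$ is a symmetric $n \times n$ matrix, called the sample covariance matrix of the data points:
$$S = \frac{1}{m}\sum_{k=1}^m (x_k - \hat{x})(x_k - \hat{x})^T.$$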
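A small numerical sketch can confirm that the quadratic form $u^T S u$ reproduces the variance of the projections computed directly. The data set and the helper names (`covariance`, `quad_form`, `projected_variance`) are hypothetical and chosen for illustration only. Plain lists are used so the sketch has no dependencies.

```python
def covariance(points):
    """Sample covariance S = (1/m) sum_k (x_k - hat x)(x_k - hat x)^T, as a list of rows."""
    m, n = len(points), len(points[0])
    xhat = [sum(p[i] for p in points) / m for i in range(n)]
    centered = [[p[i] - xhat[i] for i in range(n)] for p in points]
    return [[sum(c[i] * c[j] for c in centered) / m for j in range(n)]
            for i in range(n)]

def quad_form(S, u):
    """Quadratic form u^T S u."""
    n = len(u)
    return sum(u[i] * S[i][j] * u[j] for i in range(n) for j in range(n))

def projected_variance(points, u):
    """Variance of the projections u^T x_k, computed directly (1/m normalization)."""
    z = [sum(ui * xi for ui, xi in zip(u, x)) for x in points]
    zhat = sum(z) / len(z)
    return sum((zk - zhat) ** 2 for zk in z) / len(z)

points = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0], [6.0, 2.0]]  # hypothetical data, m = 4, n = 2
u = [0.6, 0.8]

S = covariance(points)
print(S)  # [[3.5, -0.5], [-0.5, 2.0]]
print(round(quad_form(S, u), 6), round(projected_variance(points, u), 6))  # 2.06 2.06
```

In practice a library such as NumPy is the usual choice. Note that `numpy.cov` treats rows as variables by default and divides by $m-1$ unless `bias=True` is passed. It therefore matches $S$ above (for `X` with data points as columns) only with that option.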
Properties
The covariance matrix satisfies the following properties:
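- The sample covariance matrix gives the variance along any direction $u$ in data space, as the quadratic form $u^T S u$.
- The diagonal elements of $S$ give the variance of each coordinate of the data: $S_{ii} = e_i^T S e_i$ is the sample variance of the $i$-th components of $x_1, \ldots, x_m$ (the $i$-th row of $X$), where $e_i$ is the $i$-th standard basis vector.
- The trace of $S$ gives the sum of all the coordinate variances. This equals the average squared distance of the data points to their mean: $\operatorname{tr} S = \frac{1}{m}\sum_{k=1}^m \|x_k - \hat{x}\|_2^2$.
- The matrix $S$ is positive semi-definite, since the associated quadratic form $u \mapsto u^T S u$ is a variance and is therefore non-negative everywhere.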
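These properties can be spot-checked numerically on a hypothetical data set. This sketch first compares each diagonal entry with `statistics.pvariance` of the corresponding coordinate. It then compares the trace with the average squared distance to the mean. Finally, it samples random directions to test that $u^T S u \ge 0$. The last check is only a numerical sanity test, not a proof of positive semi-definiteness. The helper names are illustrative.

```python
import random
from statistics import pvariance

def covariance(points):
    """Sample covariance S = (1/m) sum_k (x_k - hat x)(x_k - hat x)^T, as a list of rows."""
    m, n = len(points), len(points[0])
    xhat = [sum(p[i] for p in points) / m for i in range(n)]
    centered = [[p[i] - xhat[i] for i in range(n)] for p in points]
    return [[sum(c[i] * c[j] for c in centered) / m for j in range(n)]
            for i in range(n)]

def quad_form(S, u):
    """Quadratic form u^T S u."""
    n = len(u)
    return sum(u[i] * S[i][j] * u[j] for i in range(n) for j in range(n))

points = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0], [6.0, 2.0]]  # hypothetical data
S = covariance(points)
n = len(S)

# Diagonal: S_ii equals the 1/m variance of the i-th coordinate across the data points.
diag_ok = all(abs(S[i][i] - pvariance([p[i] for p in points])) < 1e-12 for i in range(n))

# Trace: equals the average squared distance of the points to their mean.
xhat = [sum(p[i] for p in points) / len(points) for i in range(n)]
mean_sq_dist = sum(sum((p[i] - xhat[i]) ** 2 for i in range(n)) for p in points) / len(points)
trace = sum(S[i][i] for i in range(n))

# PSD: spot-check u^T S u >= 0 over random Gaussian directions (seeded for reproducibility).
rng = random.Random(0)
directions = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(1000)]
min_q = min(quad_form(S, u) for u in directions)

print(diag_ok, trace, mean_sq_dist)  # True 5.5 5.5
print(min_q >= 0.0)  # True
```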