"

Sample covariance matrix

Definition

For a vector z \in \mathbb{R}^m, the sample variance \sigma^2 measures the average deviation of its coefficients around the sample average \hat{x}:

    \begin{align*} \hat{z} &:= \frac{1}{m}(z(1)+\ldots+z(m)), \quad \sigma^2 := \frac{1}{m}\left((z(1)-\hat{z})^2+\ldots+(z(m)-\hat{z})^2\right), \end{align*}

Now consider a matrix X = [x_1, \cdots, x_m] \in \mathbb{R}^{n\times m}, where each column x_i represents a data point in \mathbb{R}^n. We are interested in describing the amount of variance in this data set. To this end, we look at the numbers we obtain by projecting the data along a line defined by the direction u \in \mathbb{R}^n. This corresponds to the vector in \mathbb{R}^m.

    \begin{align*} z &= \begin{pmatrix} u^Tx_1 \\ \vdots \\ u^T x_m \end{pmatrix} = X^T u \in \mathbb{R}^m. \end{align*}

The corresponding sample mean and variance are

    \begin{align*} \hat{z} &= u^T \hat{x}, \quad \sigma^2(u) := \frac{1}{m} \sum\limits_{k=1}^m (u^Tx_k - u^T \hat{x})^2, \end{align*}

where  \hat{x} := \displaystyle\frac{1}{m}(x_1 + \cdots + x_m) \in \mathbb{R}^n is the sample mean of the vectors [latex]x_1, \cdots, x_m[/latex].

The sample variance along direction u can be expressed as a quadratic form in u:

     \begin{align*} \sigma^2(u) &= \frac{1}{m} \sum_{k=1}^m [u^T(x_k-\hat{x})]^2 = u^T\Sigma u, \end{align*}

where \Sigma is a n \times n symmetric matrix, called the sample covariance matrix of the data points:

     \begin{align*} \Sigma &= \frac{1}{m} \sum_{k=1}^m (x_k-\hat{x})(x_k - \hat{x})^T. \end{align*}

Properties

The covariance matrix satisfies the following properties:

  • The sample covariance matrix allows finding the variance along any direction in data space.
  • The diagonal elements of \Sigma give the variances of each vector in the data.
  • The trace of \Sigma gives the sum of all the variances.
  • The matrix \Sigma is positive semi-definite, since the associated quadratic form u \rightarrow u^T\Sigma u is non-negative everywhere.

License

Icon for the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Linear Algebra and Applications Copyright © 2023 by VinUiversity is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.