34 THE SVD THEOREM

34.1. The SVD theorem

Basic idea

Recall that any matrix A \in \mathbb{R}^{m \times n} with rank one can be written as

    \[A = \sigma u v^{T},\]

where u \in \mathbb{R}^{m}, v \in \mathbb{R}^{n}, and \sigma > 0.

It turns out that a similar result holds for matrices of arbitrary rank r. That is, we can express any matrix A \in \mathbb{R}^{m \times n} of rank r as a sum of rank-one matrices

    \[A = \sum_{i=1}^{r}\sigma_{i} u_{i} v_{i}^{T},\]

where u_{1},\dots, u_{r} are mutually orthogonal, v_{1},\dots,v_{r} are also mutually orthogonal, and the \sigma_{i}'s are positive numbers called the singular values of A. In the above, r turns out to be the rank of A.
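As a quick numerical illustration, here is a minimal NumPy sketch (NumPy is our choice here; the text does not prescribe a tool) that computes an SVD of a random matrix and checks that the rank-one terms \sigma_{i} u_{i} v_{i}^{T} sum back to A:

```python
import numpy as np

# Illustration: rebuild a matrix from the rank-one terms of its SVD.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))

U, s, Vt = np.linalg.svd(A)              # singular values s in decreasing order
r = int(np.sum(s > 1e-10))               # numerical rank

# Sum of rank-one matrices sigma_i * u_i v_i^T, for i = 1, ..., r
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print(np.allclose(A, A_rebuilt))         # expected: True
```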

Theorem statement

The following important result applies to any matrix A, and allows us to understand the structure of the mapping x \rightarrow Ax.

Theorem: Singular Value Decomposition (SVD)

An arbitrary matrix A \in \mathbb{R}^{m \times n} admits a decomposition of the form

    \[A = \sum_{i=1}^{r}\sigma_{i} u_{i} v_{i}^{T} = U\tilde{S}V^{T}, \quad \tilde{S} := \begin{pmatrix} S & 0\\ 0 & 0 \end{pmatrix},\]

where U \in \mathbb{R}^{m \times m}, V \in \mathbb{R}^{n \times n} are both orthogonal matrices, and the matrix S is diagonal:

    \[S = \mathbf{diag}(\sigma_{1},\dots,\sigma_{r}),\]

where the positive numbers \sigma_{1} \ge \dots \ge \sigma_{r} > 0 are unique and are called the singular values of A. The number r \le \min(m, n) is equal to the rank of A, and the triplet (U, \tilde{S}, V) is called a singular value decomposition (SVD) of A. The first r columns of U: u_{i}, i=1,\dots,r (resp. of V: v_{i}, i=1,\dots,r) are called left (resp. right) singular vectors of A, and satisfy

    \[Av_{i}=\sigma_{i}u_{i}, \quad u_{i}^{T}A=\sigma_{i}v_{i}^{T}, \quad i =1, \dots, r.\]

The proof of the theorem hinges on the spectral theorem for symmetric matrices. Note that in the theorem, the zeros appearing alongside S represent blocks of zeros. They may be empty; for example, if r=n, then there are no zeros to the right of S.
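To make the statement concrete, the following sketch (our own illustration, on a generic random matrix) checks the orthogonality of U and V and the relations Av_{i}=\sigma_{i}u_{i} and u_{i}^{T}A=\sigma_{i}v_{i}^{T}:

```python
import numpy as np

# Check the theorem's relations on a generic random matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)              # U: 5x5, s: (3,), Vt: 3x3
V = Vt.T

print(np.allclose(U @ U.T, np.eye(5)))   # U is orthogonal
print(np.allclose(V @ V.T, np.eye(3)))   # V is orthogonal
for i in range(len(s)):
    print(np.allclose(A @ V[:, i], s[i] * U[:, i]))   # A v_i = sigma_i u_i
    print(np.allclose(U[:, i] @ A, s[i] * V[:, i]))   # u_i^T A = sigma_i v_i^T
```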

Computing the SVD

The SVD of an m \times n matrix A can be computed via a sequence of orthogonal transformations. The computational complexity of the algorithm, expressed in terms of the number of floating-point operations, is given by

    \[ O(nm\min(n,m)). \]

This complexity can become substantial when dealing with large, dense matrices. However, for sparse matrices, one can expedite the computation if only the largest few singular values and their corresponding singular vectors are of interest, as shown in the sketch after the list below. To understand the derivation of this complexity:

  • The outer product u_{i} v_{i}^{T} has a complexity of O(nm): for a vector u_{i} of length m and a vector v_{i} of length n, the outer product is an m \times n matrix, and computing each entry requires one multiplication.
  • The matrix A has at most r non-zero singular values, where r \leq \min(n,m), so there are at most \min(n,m) such rank-one terms, each contributing to the overall cost.
  • Combining the costs from the two previous steps, the total computational complexity becomes O(\min(n,m) \times nm).
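The sparse shortcut mentioned above can be illustrated with SciPy's scipy.sparse.linalg.svds (our choice of tool, not one named in the text), which extracts only the k largest singular triplets instead of running a full dense SVD:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# A dense SVD costs O(nm * min(n, m)); for a large sparse matrix we can
# instead ask for only the k largest singular values and vectors.
A = sp.random(2000, 1500, density=0.001, random_state=2, format="csr")

k = 5
U, s, Vt = svds(A, k=k)                  # k largest singular triplets of A
print(np.sort(s)[::-1])                  # svds returns values in ascending order
```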

Example: a 4 \times 5 example.

34.2. Geometry

The theorem allows us to decompose the action of A on a given input vector as a three-step process. To get Ax, where x \in \mathbb{R}^{n}, we first form \tilde{x} := V^{T}x \in \mathbb{R}^{n}. Since V is an orthogonal matrix, V^{T} is also orthogonal, and \tilde{x} is just a rotated version of x, which still lies in the input space. Then we act on the rotated vector \tilde{x} by scaling its elements. Precisely, the first r elements of \tilde{x} are scaled by the singular values \sigma_{1},\dots,\sigma_{r}; the remaining elements are discarded, and the result is padded with zeros up to length m. This step results in a new vector \tilde{y} which belongs to the output space \mathbb{R}^{m}. The final step consists of rotating the vector \tilde{y} by the orthogonal matrix U, which results in y= U\tilde{y}=Ax.
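Here is a minimal NumPy sketch of this rotate, scale, rotate pipeline (our own illustration, on a random 3 \times 2 matrix):

```python
import numpy as np

# The three-step view of x -> Ax: rotate, scale (changing spaces), rotate.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2))          # maps R^2 to R^3
x = rng.standard_normal(2)

U, s, Vt = np.linalg.svd(A)              # U: 3x3, s: (2,), Vt: 2x2

x_tilde = Vt @ x                         # step 1: rotate within the input space
y_tilde = np.zeros(3)                    # step 2: scale the first r entries ...
y_tilde[:len(s)] = s * x_tilde           # ... and pad with zeros up to length m
y = U @ y_tilde                          # step 3: rotate within the output space

print(np.allclose(y, A @ x))             # expected: True
```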

For example, assume A has the simple form

    \[A = \begin{pmatrix} 1.3 & 0 \\ 0 & 2.1 \\ 0 & 0 \end{pmatrix},\]

then for an input vector x in \mathbb{R}^{2}, Ax is a vector in \mathbb{R}^{3} with first component 1.3x_{1}, second component 2.1x_{2}, and last component being zero.

To summarize, the SVD theorem states that any matrix-vector multiplication can be decomposed as a sequence of three elementary transformations: a rotation in the input space, a scaling that goes from the input space to the output space, and a rotation in the output space. In contrast with the symmetric case, the input and output directions are in general different.

This interpretation allows us to make a few statements about the matrix.

Example: a 4 \times 4 example.

34.3. Link with the SED (Spectral Theorem)

If A admits an SVD, then the matrices AA^{T} and A^{T}A have the following SEDs:

    \[A A^T=U \Lambda_m U^T, \quad A^T A=V \Lambda_n V^T,\]

where

    \[ \Lambda_m := \tilde{S}\tilde{S}^{T} = \mathbf{diag}(\sigma_{1}^2,\dots,\sigma_{r}^2,0,\dots,0) \]

is m \times m (so it has m-r trailing zeros), and

    \[ \Lambda_n := \tilde{S}^{T}\tilde{S} = \mathbf{diag}(\sigma_{1}^2,\dots,\sigma_{r}^2,0,\dots,0) \]

is n \times n (so it has n-r trailing zeros). The nonzero eigenvalues of AA^{T} and A^{T}A are the same, and are equal to the squared singular values of A.

The corresponding eigenvectors are the left and right singular vectors of A.

This yields a method (though not the most computationally efficient one) for finding the SVD of a matrix, based on the SED.
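As a sketch of this route (assuming, for simplicity, that A has full column rank, so that every \sigma_{i} > 0), one can recover an SVD from the SED of A^{T}A computed with NumPy's eigh:

```python
import numpy as np

# Recover an SVD of A from the SED of A^T A (assumes full column rank,
# so every sigma_i > 0; not the recommended numerical method).
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

evals, V = np.linalg.eigh(A.T @ A)       # SED: eigenvalues in ascending order
idx = np.argsort(evals)[::-1]            # reorder so sigma_1 >= sigma_2 >= ...
evals, V = evals[idx], V[:, idx]

sigma = np.sqrt(evals)                   # eigenvalues of A^T A are sigma_i^2
U = (A @ V) / sigma                      # columns u_i = A v_i / sigma_i

print(np.allclose((U * sigma) @ V.T, A)) # expected: True
```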

License


Linear Algebra and Applications Copyright © 2023 by VinUniversity is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.
