LEAST-SQUARES AND SVD

Alicia Y. Tsai; Authors: Laurent EI Ghaoui; Contributors: Phan Hoang, Tony Tin, Ha To Tam, Nguyen Van Kep, Le Dinh Nam, Juliette Decugis, Nguyen Quoc Cuong.; Giuseppe C. Calafiore

37 LEAST-SQUARES AND SVD

37.1. Set of solutions

The following theorem provides all the solutions (optimal set) of a least-squares problem.

Theorem: optimal set of ordinary least-squares

The optimal set of the OLS problem

$p^* = \min_x \|A x - y\|_2$

can be expressed as

$\mathbf{X}^{\text {opt }} = A^{\dagger} y + \mathbf{N}(A),$

where $A^{\dagger}$ is the pseudo-inverse of $A$ , and $A^{\dagger} y$ is the minimum-norm point in the optimal set. If $A$ is full column rank, the solution is unique, and equal to

$x^* = A^{\dagger} y = (A^T A)^{-1} A^T y.$

In general, the particular solution $A^{\dagger} y$ is the minimum-norm solution to the least-squares problem.

Proof.

37.2. Sensitivity analysis

We consider the situation where

$y + \delta y = A x,$

with

$A \in \mathbb{R}^{m \times n}$ the data matrix (known), with $A$ full column rank (hence $m \geq n$ ).
$y \in \mathbb{R}^m$ is the measurement (known).
$x \in \mathbb{R}^n$ is the vector to be estimated (unknown).
$\delta y \in \mathbb{R}^m$ is a measurement noise or error (unknown).

We can use OLS to provide an estimate $\hat{x}_{\text {OLS }}$ of $X$ . The idea is to seek the smallest vector $\delta y$ such that the above equation becomes feasible, that is,

$\min_{x, \delta y} \|\delta y\|_2 : y + \delta y = A x.$

This leads to the OLS problem:

$\min_x \|A x - y\|_2.$

Since $A$ is full column rank, its SVD can be expressed as

$A = U \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T,$

where $\Sigma=\mathbf{diag}\left(\sigma_1, \ldots, \sigma_m\right)$ contains the singular values of $A$ , with $\sigma_1 \geq \ldots \geq \sigma_m>0$ .

Since $A$ is full column rank, the solution $\hat{x}_{\mathrm{LS}}$ to the OLS problem is unique, and can be written as a linear function of the measurement vector $y$ :

$\hat{x}_{\mathrm{LS}} = A^{\dagger} y,$

with $A^{\dagger}$ the pseudo-inverse of $A$ . Again, since $A$ is full column rank,

$A^{\dagger} = (A^T A)^{-1} A^T = V \begin{pmatrix} \Sigma^{-1} & 0 \end{pmatrix} U^T.$

The OLS formulation provides an estimate $\hat{x}$ of the input $x$ such that the residual error vector $Ax-y$ is minimized in norm. We are interested in analyzing the impact of perturbations in the vector $y$ , on the resulting solution $\hat{x}_{LS}$ . We begin by analyzing the absolute errors in the estimate and then turn to the analysis of relative errors.

Set of possible errors

Let us assume a simple model of potential perturbations: we assume that $\delta y$ belongs to a unit ball: $||\delta y||_2 \leq \alpha$ , where $\alpha>0$ is given. We will assume $\alpha=1$ for simplicity; the analysis is easily extended to any $\alpha>0$ .

We have

$\begin{aligned} \delta x &:= x - \hat{x} \\ &= x - A^{\dagger} y \\ &= x - A^{\dagger}(A x - \delta y) \\ &= (I - A^{\dagger} A) x + A^{\dagger} \delta y \\ &= A^{\dagger} \delta y. \end{aligned}$

In the above, we have exploited the fact that $A^{\dagger}$ is a left inverse of $A$ , that is, $A^{\dagger} A=I_n$ .

The set of possible errors on the solution $\delta x$ is then given by

$\mathbf{E} = \left\{ A^{\dagger} \delta y : \|\delta y\|_2 \leq 1 \right\},$

which is an ellipsoid centered at zero, with principal axes given by the singular values of $A^{\dagger}$ . This ellipsoid can be interpreted as an ellipsoid of confidence for the estimate $\hat{x}$ , with size and shape determined by the matrix $A^{\dagger}(A^{\dagger})^T$ .

We can draw several conclusions from this analysis:

The largest absolute error in the solution that can result from a unit-norm, additive perturbation on $y$ is of the order of $1 / \sigma_m$ , where $\sigma_m$ is the smallest singular value of $A$ .
The largest relative error is $\sigma_1 / \sigma_m$ , the condition number of $A$ .

37.3. BLUE property

We now return to the case of an OLS with full column rank matrix $A$ .

Unbiased linear estimators

Consider the family of linear estimators, which are of the form

$\hat{x} = B y,$

where $B \in \mathbb{R}^{n \times m}$ . To this estimator, we associate the error

$\begin{aligned} \delta x &= x - \hat{x} \\ &= x - B y \\ &= x - B(A x - \delta y) \\ &= (I - B A) x + B \delta y. \end{aligned}$

We say that the estimator (as determined by matrix $B$ ) is unbiased if the first term is zero:

$B A = I.$

Unbiased estimators only exist when the above equation is feasible, that is, $A$ has a left inverse. This is equivalent to our condition that $A$ be full column rank. Since $A^{\dagger}$ is a left-inverse of $A$ , the OLS estimator is a particular case of an unbiased linear estimator.

Best unbiased linear estimator

The above analysis leads to the following question: which is the best unbiased linear estimator? One way to formulate this problem is to assume that the perturbation vector $\delta y$ is bounded in some way, and try to minimize the possible impact of such bounded errors on the solution.

Let us assume that $\delta y$ belongs to a unit ball: $\|\delta y\|_2 \leq 1$ . The set of resulting errors on the solution $\delta x$ is then

$\mathbf{E} = \left\{ B \delta y : \|\delta y\|_2 \leq 1 \right\},$

which is an ellipsoid centered at zero, with principal axes given by the singular values of $B$ . This ellipsoid can be interpreted as an ellipsoid of confidence for the estimate $\hat{x}$ , with size and shape determined by the matrix $BB^T$ .

It can be shown that the OLS estimator is optimal in the sense that it provides the ‘‘smallest’’ ellipsoid of confidence among all unbiased linear estimators. Specifically:

$B B^T \succeq A^{\dagger} (A^{\dagger})^T.$

This optimality of the LS estimator is referred to as the BLUE (Best Linear Unbiased Estimator) property.

License

Icon for the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License