"

15 MATRIX NORMS

15.1 Motivating example: effect of noise in a linear system

We saw how a matrix (say, A \in \mathbb{R}^{m \times n}) induces, via the matrix-vector product, a linear map x \rightarrow Ax. Here, x is an input vector and y=Ax is the output. The mapping (that is, A) could represent a linear amplifier with the input of an audio signal x and output another audio signal y.

Now, assume that there is some noise in the vector x: the actual input is x+v, where v \in \mathbb{R}^{n} is an error vector. This implies that there will be noise in the output as well: the noisy output is A(x+v) so the error on the output due to noise is Av. How could we quantify the effect of input noise on the output noise?

One approach is to try to measure the norm of the error vector, ||Av||. Obviously, this norm depends on the noise v, which we do not know. So we will assume that v can take values in a set. We need to come up with a single number that captures in some way the different values of ||Av|| when v spans that set. Since scaling v simply scales the norm ||Av|| accordingly, we will restrict the vectors v to have a certain norm, say ||v||=1.

Clearly, depending on the choice of the set, the norms we use to measure norm lengths, and how we choose to capture many numbers ||Av|| with one, etc, we will obtain different numbers.

15.2 RMS gain: the Frobenius norm

Let us first assume that the noise vector v can take a finite set of directions, specifically the directions represented by the standard basis, e_{1},\dots,e_{n}. Then let us look at the average of the squared error norm:

    \[\dfrac{1}{n}\sum_{i=1}^{n}||Ae_{i}||_{2}^{2}=\dfrac{1}{n} \sum_{i=1}^{n}||a_{i}||_{2}^{2},\]

where a_{i} stands for the i-th column of A. The quantity above can be written as  \dfrac{1}{n} ||A||_{F}^{2}, where

    \[||A||_{F} := \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}A_{ij}^{2}}=\sqrt{Tr(A^{T}A)\]

is the Frobenius norm of A.

The function A \rightarrow ||A||_{F} turns out to satisfy the basic conditions of a norm in the matrix space \mathbb{R}^{m \times n}. In fact, it is the Euclidean norm of the vector of length nm formed with all the coefficients of A. Further, the quantity would remain the same if we had chosen any orthonormal basis other than the standard one.

The Frobenius norm is useful to measure the RMS (root-mean-square) gain of the matrix, and its average response along given mutually orthogonal directions in space. Clearly, this approach does not capture well the variance of the error, only the average effect of noise.

The computation of the Frobenius norm is very easy: it requires about nm flops.

15.3 Peak gain: the largest singular value norm

To try to capture the variance of the output noise, we may take a worst-case approach.

Let us assume that the noise vector is bounded but otherwise unknown. Specifically, all we know about v is that ||v||_{2} \le \alpha, where \alpha is the maximum amount of noise (measured in Euclidean norm). What is then the worst-case (peak) value of the norm of the output noise? This is answered by the optimization problem

    \[\max_{v}||Av||_{2}: ||v||_{2} \le \alpha.\]

The quantity

    \[||A||_{LSV} := \max_{v}||Av||_{2}: ||v||_{2} \le 1\]

measures the peak gain of the mapping A, in the sense that if the noise vector is bounded in norm by \alpha, then the output noise is bounded in norm by \alpha||A||. Any vector v which achieves the maximum above corresponds to a direction in input space that is maximally amplified by the mapping A.

The quantity ||A||_{LSV} is indeed a matrix norm, called the largest singular value (LSV) norm, for reasons seen here. It is perhaps the most popular matrix norm.

The computation of the largest singular value norm of a matrix is not as easy as with the Frobenius norm. However, it can be computed with linear algebra methods seen here, in about \min(n,m)nm flops. While it is more expensive to compute than the Frobenious norm, it is also more useful because it goes beyond capturing the average response to noise.

Other norms

Many other matrix norms are possible, and sometimes useful. In particular, we can generalize the notion of peak norm by using different norms to measure vector size in the input and output spaces. For example, the quantity

    \[||A||_{\infty,1} := \max_{v}||Av||_{1}: ||v||_{\infty} \le 1\]

measures the peak gain with inputs bounded in the maximum norm, and outputs measured with the l_{1}-norm.

The norms we have just introduced, the Frobenius and largest singular value norms, are the most popular ones and are easy to compute. Many other norms are hard to compute.

15.4 Applications

Distance between matrices

Matrix norms are ways to measure the size of a matrix. This allows quantifying the difference between matrices.

Assume for example that we are trying to estimate a matrix A, and came up with an estimate \hat{A}. How can we measure the quality of our estimate? One way is to evaluate by how much they differ when they act on a standard basis. This leads to the Frobenius norm.

Another way is to look at the difference in the output:

    \[||Av - \hat{A}v||_{2}\]

when v runs the whole space. Clearly, we need to scale or limit the size, of v, otherwise, the difference above may be arbitrarily big. Let’s look at the worst-case difference when v satisfies ||v||_{2} \le 1. We obtain

    \[\max_{v}||Av-\hat{A}v||_{2}: ||v||_{2} \le 1,\]

which is the largest singular value norm of the difference A - \hat{A}..

The direction of maximal variance

Consider a data set described as a collection of vectors a_{1},\dots,a_{n}, with a_{i} \in \mathbb{R}^{m}. We can gather this data set in a single matrix A = [a_{1},\dots,a_{n}] \in \mathbb{R}^{m \times n}. For simplicity, let us assume that the average vector is zero:

    \[\hat{a} := \dfrac{1}{n}\sum_{i=1}^{n}a_{i}=0.\]

Let us try to visualize the data set by projecting it on a single line passing through the origin. The line is thus defined by a vector x \in \mathbb{R}^{m}, which we can without loss of generality assume to be of Euclidean norm 1. The data points, when projected on the line, are turned into real numbers x^{T}a_{i}, i=1,\dots,n..

It can be argued that a good line to project data on is one which spreads the numbers x^{T}a_{i} as much as possible. (If all the data points are projected to numbers that are very close, we will not see anything, as all data points will collapse to close locations.)

We can find a direction in space that accomplishes this, as follows. The average of the numbers is

    \[\dfrac{1}{n}\sum_{i=1}^{n}a_{i}^{T}x= \left(\dfrac{1}{n}\sum_{i=1}^{n}a_{i}\right)^{T}x = \hat{a}^{T}x=0,\]

while their variance is

    \[\dfrac{1}{n}\sum_{i=1}^{n}(a_{i}^{T}x-\hat{a}^{T}x)^{2}= \dfrac{1}{n}\sum_{i=1}^{n}(a_{i}^{T}x)^{2} = \dfrac{1}{n}x^{T}AA^{T}x = \dfrac{1}{n}||A^{T}x||_{2}^{2}.\]

The direction of maximal variance is found by computing the LSV norm of A^{T}

    \[\max_{v} \left\{||A^{T}v||_{2} : ||v||_{2} \le 1\right\} = ||A^{T}||_{LSV}.\]

(It turns out that this quantity is the same as the LSV norm of A itself.)

License

Icon for the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Linear Algebra and Applications Copyright © 2023 by VinUiversity is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.