LINEAR FUNCTIONS

Alicia Y. Tsai; Authors: Laurent El Ghaoui; Contributors: Phan Hoang, Tony Tin, Ha To Tam, Nguyen Van Kep, Le Dinh Nam, Juliette Decugis, Nguyen Quoc Cuong.; Giuseppe C. Calafiore

6 LINEAR FUNCTIONS

6.1. Linear and affine functions

Definition

Linear functions are functions which preserve scaling and addition of the input argument. Affine functions are "linear plus constant" functions.

Formal definition, linear and affine functions. A function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is linear if and only if $f$ preserves scaling and addition of its arguments:

for every $x \in \mathbb{R}^n$ , and $\alpha \in \mathbb{R}$ , $f(\alpha x) = \alpha f(x)$ ; and
for every $x_1, x_2 \in \mathbb{R}^n$ , $f(x_1 + x_2) = f(x_1) + f(x_2)$ .

A function $f$ is affine if and only if the function $\tilde{f}: \mathbb{R}^n \rightarrow \mathbb{R}$ with values $\tilde{f}(x) = f(x) - f(0)$ is linear.

An alternative characterization of linear functions:

A function $f : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is linear if and only if any one of the following equivalent conditions holds.

1. $f$ preserves scaling and addition of its arguments:

– For every $x$ in $\mathbb{R}^n$ , and $\alpha$ in $\mathbb{R}$ , $f(\alpha x) = \alpha f(x)$ ; and

– For every $x_1,x_2$ in $\mathbb{R}^n$ , $f(x_1+x_2) = f(x_1)+f(x_2)$ .

2. $f$ vanishes at the origin: $f(0) = 0$ , and transforms any line segment in $\mathbb{R}^n$ into another segment in $\mathbb{R}^m$ :

$\begin{align*}f(\lambda x + (1-\lambda) y) = \lambda f(x) + (1-\lambda) f(y), \quad \quad \forall x, y \in \mathbb{R}^n, \forall \lambda \in [0,1].\end{align*}$

3. $f$ is differentiable, vanishes at the origin, and the matrix of its derivatives is constant: there exists a matrix $A$ in $\mathbb{R}^{m \times n}$ such that

$\begin{align*}f(x) = Ax, \quad \quad \forall x \in \mathbb{R}^n.\end{align*}$
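As a small illustration of how conditions 2 and 3 fit together, the sketch below builds a map $f(x) = Ax$ from a matrix and checks that it vanishes at the origin and maps a point on a segment to the matching point on the image segment. The matrix `A`, the points `x`, `y`, and the helper `matvec` are hypothetical choices made for this example only.

```python
def matvec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Hypothetical 2 x 3 matrix, so f maps R^3 to R^2.
A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, 4.0]]

def f(x):
    return matvec(A, x)

x = [1.0, 2.0, -1.0]
y = [0.5, -3.0, 2.0]
lam = 0.25  # any value in [0, 1]

# Left side: f evaluated at a point on the segment between x and y.
lhs = f([lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)])
# Right side: the same combination of the images f(x) and f(y).
rhs = [lam * fx + (1 - lam) * fy for fx, fy in zip(f(x), f(y))]

print("f(0) =", f([0.0, 0.0, 0.0]))
print("lhs  =", lhs)
print("rhs  =", rhs)
```

A single pair of points is only a spot check; it illustrates the condition but does not prove it for all $x$, $y$ and $\lambda$.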

Example 1: Consider the functions $f_1, f_2, f_3: \mathbb{R}^2 \rightarrow \mathbb{R}$ with values

$\begin{align*} f_1(x) = 3.2 x_1 + 2 x_2 \end{align*}$

$\begin{align*} f_2(x) = 3.2 x_1 + 2x_2 + 0.15 \end{align*}$

$\begin{align*} f_3(x) = 0.001 x_2^2 +2.3 x_1 + 0.3 x_2 \end{align*}$

The function $f_1$ is linear; $f_2$ is affine; and $f_3$ is neither.
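The classification in Example 1 can be spot-checked numerically. The sketch below tests the scaling and addition properties on random inputs; the helper names `is_linear` and `is_affine`, the sampling range, and the tolerance are assumptions made for this illustration. Passing such a randomized test is evidence of linearity, not a proof.

```python
import random

def f1(x):
    return 3.2 * x[0] + 2 * x[1]

def f2(x):
    return 3.2 * x[0] + 2 * x[1] + 0.15

def f3(x):
    return 0.001 * x[1] ** 2 + 2.3 * x[0] + 0.3 * x[1]

def is_linear(f, trials=200, tol=1e-9, seed=0):
    """Randomized check that f preserves scaling and addition on R^2."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        y = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        alpha = rng.uniform(-10, 10)
        scaled = (alpha * x[0], alpha * x[1])
        summed = (x[0] + y[0], x[1] + y[1])
        if abs(f(scaled) - alpha * f(x)) > tol:
            return False
        if abs(f(summed) - (f(x) + f(y))) > tol:
            return False
    return True

def is_affine(f, **kwargs):
    """f is affine exactly when x -> f(x) - f(0) is linear."""
    f0 = f((0.0, 0.0))
    return is_linear(lambda x: f(x) - f0, **kwargs)

for name, f in (("f1", f1), ("f2", f2), ("f3", f3)):
    print(name, "linear:", is_linear(f), "affine:", is_affine(f))
```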

Connection with vectors via the scalar product

The following shows the connection between linear functions and scalar products.

Theorem: Representation of affine function via the scalar product

A function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is affine if and only if it can be expressed via a scalar product:

$\begin{align*} f(x) = a^Tx+b, \end{align*}$

for some unique pair $(a,b)$, with $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, given by $a_i = f(e_i) - f(0)$, with $e_i$ the $i$-th unit vector in $\mathbb{R}^n$, $i = 1, \cdots, n$, and $b = f(0)$. The function is linear if and only if $b = 0$, in which case $a_i = f(e_i)$.

The theorem shows that a vector can be seen as a (linear) function from the "input" space $\mathbb{R}^n$ to the "output" space $\mathbb{R}$. Both points of view (vectors as simple collections of numbers, or as linear functions) are useful.
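The representation can be made concrete by recovering the pair $(a, b)$ from function evaluations alone. Since $f(e_i) = a_i + b$ and $f(0) = b$, subtracting gives $a_i = f(e_i) - f(0)$. The sketch below applies this to the affine function $f_2$ of Example 1; the helper name `affine_coefficients` is an assumption for this illustration, and the procedure is only meaningful when $f$ really is affine.

```python
def affine_coefficients(f, n):
    """Return (a, b) such that f(x) = a^T x + b, assuming f is affine on R^n."""
    b = f([0.0] * n)          # b = f(0)
    a = []
    for i in range(n):
        e_i = [0.0] * n
        e_i[i] = 1.0          # i-th unit vector
        a.append(f(e_i) - b)  # a_i = f(e_i) - f(0)
    return a, b

def f2(x):
    return 3.2 * x[0] + 2 * x[1] + 0.15

a, b = affine_coefficients(f2, 2)

# Compare the scalar-product form with f2 at an arbitrary test point.
x = [1.5, -2.0]
reconstructed = sum(a_i * x_i for a_i, x_i in zip(a, x)) + b
print("a =", a, "b =", b)
print("f2(x) =", f2(x), "a^T x + b =", reconstructed)
```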

Gradient of an affine function

The gradient of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ at a point $x$, denoted $\nabla f(x)$, is the vector of first derivatives with respect to $x_1, \cdots, x_n$. When $n=1$ (there is only one input variable), the gradient is simply the derivative.

An affine function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ , with values $f(x) = a^Tx+b$ has a very simple gradient: the constant vector $a$ . That is, for an affine function $f$ , we have for every $x$ :

$\begin{align*} \nabla f(x) = a \end{align*}$

Example 2: gradient of a linear function:

Consider the function $f : \mathbb{R}^2 \rightarrow \mathbb{R}$ , with values $f(x) = x_1 + 2x_2$ . Its gradient is constant, with values

$\begin{align*} \nabla f(x) = \left(\begin{array}{c} \dfrac{\partial f}{\partial x_1}(x) \\[1em] \dfrac{\partial f}{\partial x_2}(x) \end{array} \right) = \left(\begin{array}{c} 1 \\ 2 \end{array} \right). \end{align*}$

For a given $t$ in $\mathbb{R}$ , the $t$ -level set is the set of points such that $f(x) = t$ :

$\begin{align*} \mathbf{L}_t(f) := \{(x_1, x_2) ~:~ x_1 + 2 x_2 = t \}. \end{align*}$

The level sets are hyperplanes (here, lines in $\mathbb{R}^2$), and each of them is orthogonal to the gradient.
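The two claims of Example 2 can be checked numerically. The sketch below estimates the gradient of $f(x) = x_1 + 2x_2$ by central finite differences (a substitute for the analytic derivative, used here only for illustration) and verifies that the direction joining two points of the same level set is orthogonal to it. The helper `numerical_gradient`, the step size `h`, and the chosen points are assumptions of this example.

```python
def f(x):
    return x[0] + 2 * x[1]

def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

# The gradient should be the same wherever it is evaluated.
grad = numerical_gradient(f, [0.7, -1.3])

# Two points on the level set f(x) = t with t = 3.
t = 3.0
p = [t, 0.0]         # 3 + 2*0 = 3
q = [t - 2.0, 1.0]   # 1 + 2*1 = 3
direction = [q[0] - p[0], q[1] - p[1]]

dot = grad[0] * direction[0] + grad[1] * direction[1]
print("gradient estimate:", grad)
print("dot product with level-set direction:", dot)
```

The dot product should be zero up to finite-difference and rounding error.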

Interpretations

The interpretations of $a$ and $b$ are as follows.

The scalar $b = f(0)$ is the constant term. For this reason, it is sometimes referred to as the bias, or intercept (it is the point where the graph of $f$ crosses the vertical axis).
The terms $a_j$, $j = 1, \cdots, n$, which are the components of the gradient of $f$, give the coefficients of influence of $x_j$ on $f$. For example, if $|a_1| \gg |a_3|$, then the first component of $x$ has much greater influence on the value of $f(x)$ than the third.

6.2. First-order approximation of non-linear functions

Many functions are non-linear. A common engineering practice is to approximate a given non-linear map with a linear (or affine) one, by taking derivatives. This is the main reason why linearity is such a ubiquitous tool in engineering.

One-dimensional case

Consider a function of one variable $f: \mathbb{R} \rightarrow \mathbb{R}$, and assume it is differentiable everywhere. Then we can approximate the values of the function at a point $x$ near a point $x_{0}$ as follows:

$\begin{align*} f(x) \simeq l(x):= f(x_0) + f'(x_0)(x-x_0), \end{align*}$

where $f'(x)$ denotes the derivative of $f$ at $x$ .
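To see how the quality of this approximation depends on the distance to $x_0$, the sketch below linearizes the exponential function at $x_0 = 0$, where $l(x) = 1 + x$. The choice of function, the evaluation points, and the helper name `linearize` are assumptions made for this illustration.

```python
import math

def linearize(f, df, x0):
    """Return the function l with l(x) = f(x0) + f'(x0) * (x - x0)."""
    f0, slope = f(x0), df(x0)
    return lambda x: f0 + slope * (x - x0)

# exp is its own derivative, so it is passed as both f and df.
l = linearize(math.exp, math.exp, 0.0)

for step in (0.1, 0.01):
    error = abs(math.exp(step) - l(step))
    print(f"x = {step}: f(x) = {math.exp(step):.6f}, "
          f"l(x) = {l(step):.6f}, error = {error:.2e}")
```

In this example the error shrinks roughly a hundredfold when the distance to $x_0$ shrinks tenfold, which is why the approximation is only trusted near $x_0$.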

Multi-dimensional case

With more than one variable, we have a similar result. Let us approximate a differentiable function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ by an affine function $l$, so that $f$ and $l$ coincide at $x_0$ up to and including the first derivatives. The corresponding approximation $l$ is called the first-order approximation to $f$ at $x_0$.

The approximate function $l$ must be of the form

$\begin{align*} l(x) = a^Tx+b, \end{align*}$

where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$. Our condition that $l$ coincides with $f$ at $x_0$ up to and including the first derivatives shows that we must have

$\begin{align*} \nabla l(x) = a = \nabla f(x_0), \quad \quad a^Tx_0 + b = f(x_0), \end{align*}$

where $\nabla f(x_0)$ is the gradient of $f$ at $x_0$. Solving for $a,b$ we obtain the following result:

Theorem: First-order expansion of a function.

The first-order approximation of a differentiable function $f$ at a point $x_0$ is of the form

$\begin{align*} f(x) \approx l(x) = f(x_0) + \nabla f(x_0)^T(x-x_0), \end{align*}$

where $\nabla f(x_0) \in \mathbb{R}^n$ is the gradient of $f$ at $x_0$ .

Example 3: a linear approximation to a non-linear function

The log-sum-exp function

$\begin{align*} f(x) = \log(e^{x_1}+e^{x_2}) \end{align*}$

admits the gradient at the point $x^0$ given by

$\begin{align*} \nabla f(x^0) = \frac{1}{e^{x^0_1}+e^{x^0_2}} \left(\begin{array}{c} e^{x_1^0} \\ e^{x_2^0} \end{array}\right) . \end{align*}$

Hence $f$ can be approximated near $x^0$ by the affine function

$\begin{align*} f(x) \approx \log(e^{x^0_1}+e^{x^0_2}) + \frac{1}{e^{x^0_1}+e^{x^0_2}} \left( (x_1-x_1^0) e^{x^0_1} +(x_2-x^0_2) e^{x^0_2}\right) . \end{align*}$
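The first-order expansion of Example 3 can be evaluated directly. The sketch below builds $l(x) = f(x^0) + \nabla f(x^0)^T (x - x^0)$ for the log-sum-exp function and compares it with $f$ at a nearby point and at a distant one. The expansion point `x0`, the test points, and the helper name `first_order` are assumptions chosen for illustration.

```python
import math

def lse(x):
    """Log-sum-exp function of two variables."""
    return math.log(math.exp(x[0]) + math.exp(x[1]))

def lse_gradient(x):
    """Gradient of log-sum-exp: exponentials divided by their sum."""
    z = math.exp(x[0]) + math.exp(x[1])
    return [math.exp(x[0]) / z, math.exp(x[1]) / z]

def first_order(f, grad_f, x0):
    """Return l with l(x) = f(x0) + grad_f(x0)^T (x - x0)."""
    f0, g = f(x0), grad_f(x0)
    def l(x):
        return f0 + sum(g_i * (x_i - x0_i) for g_i, x_i, x0_i in zip(g, x, x0))
    return l

x0 = [1.0, 2.0]
l = first_order(lse, lse_gradient, x0)

near = [1.05, 1.98]
far = [3.0, 0.0]
for label, x in (("near", near), ("far", far)):
    print(f"{label}: f(x) = {lse(x):.6f}, l(x) = {l(x):.6f}, "
          f"error = {abs(lse(x) - l(x)):.2e}")
```

The approximation is exact at $x^0$ itself and degrades as $x$ moves away from it.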

6.3. Other sources of linear models

Linearity can arise from a simple change of variables. This is best illustrated with a specific example.

Example: Power laws.
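As a minimal sketch of such a change of variables, assume a power law of the form $y = \alpha x_1^{a_1} x_2^{a_2}$ with $\alpha > 0$ and positive inputs. Taking logarithms gives $\log y = \log \alpha + a_1 \log x_1 + a_2 \log x_2$, which is affine in the new variables $(\log x_1, \log x_2)$. The specific form, the constants `ALPHA`, `A1`, `A2`, and the function names below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical power-law parameters (not taken from the text).
ALPHA, A1, A2 = 2.0, 0.5, -1.5

def power_law(x1, x2):
    """Non-linear model y = ALPHA * x1**A1 * x2**A2, for x1, x2 > 0."""
    return ALPHA * x1 ** A1 * x2 ** A2

def log_model(u1, u2):
    """Affine function of u = (log x1, log x2) that equals log y."""
    return math.log(ALPHA) + A1 * u1 + A2 * u2

x1, x2 = 3.0, 0.7
lhs = math.log(power_law(x1, x2))
rhs = log_model(math.log(x1), math.log(x2))
print("log y computed directly:     ", lhs)
print("affine model in log variables:", rhs)
```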

License

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License