

6.1. Linear and affine functions


Linear functions are functions that preserve scaling and addition of the input argument. Affine functions are "linear plus constant" functions.

Formal definition, linear and affine functions. A function f: \mathbb{R}^n \rightarrow \mathbb{R} is linear if and only if f preserves scaling and addition of its arguments:

  • for every x \in \mathbb{R}^n, and \alpha \in \mathbb{R}, f(\alpha x) = \alpha f(x); and
  • for every x_1, x_2 \in \mathbb{R}^n, f(x_1 + x_2) = f(x_1) + f(x_2).

A function f is affine if and only if the function \tilde{f}: \mathbb{R}^n \rightarrow \mathbb{R} with values \tilde{f}(x) = f(x) - f(0) is linear.

An alternative characterization of linear functions:


A function f : \mathbb{R}^n \rightarrow \mathbb{R}^m is linear if and only if any one of the following equivalent conditions holds.

1. f preserves scaling and addition of its arguments:

– For every x in \mathbb{R}^n, and \alpha in \mathbb{R}, f(\alpha x) = \alpha f(x); and

– For every x_1,x_2 in \mathbb{R}^n, f(x_1+x_2) = f(x_1)+f(x_2).

2. f vanishes at the origin: f(0) = 0, and transforms any line segment in \mathbb{R}^n into another segment in \mathbb{R}^m:

    \begin{align*}f(\lambda x + (1-\lambda) y) = \lambda f(x) + (1-\lambda) f(y), \quad \quad \forall x, y \in \mathbb{R}^n, \forall \lambda \in [0,1].\end{align*}

3. f is differentiable, vanishes at the origin, and the matrix of its derivatives is constant: there exists a matrix A in \mathbb{R}^{m \times n} such that

    \begin{align*}f(x) = Ax, \quad \quad \forall x \in \mathbb{R}^n.\end{align*}
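The third characterization can be made concrete: if f is linear, the j-th column of A is f(e_j), the image of the j-th unit vector. The sketch below assumes this construction and uses a hypothetical map f from \mathbb{R}^3 to \mathbb{R}^2 chosen only for illustration; the helper names matrix_of and matvec are likewise not from the text.

```python
def matrix_of(f, n):
    """Build A (as a list of rows) with f(x) = A x, assuming f is linear."""
    columns = []
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        columns.append(f(e_j))  # the j-th column of A is f(e_j)
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def matvec(A, x):
    """Plain matrix-vector product A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical linear map from R^3 to R^2, for illustration only.
def f(x):
    return [2.0 * x[0] - x[2], x[1] + 3.0 * x[2]]


A = matrix_of(f, 3)
x = [1.0, -2.0, 0.5]
print(A)                    # rows of the 2 x 3 matrix representing f
print(f(x), matvec(A, x))   # the two results should agree
```

If f were not linear, the matrix built this way would generally fail to reproduce f(x) away from the unit vectors.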

Example 1: Consider the functions f_1, f_2, f_3: \mathbb{R}^2 \rightarrow \mathbb{R} with values

    \begin{align*} f_1(x) = 3.2 x_1 + 2 x_2 \end{align*}

    \begin{align*} f_2(x) = 3.2 x_1 + 2x_2 + 0.15 \end{align*}

    \begin{align*} f_3(x) = 0.001 x_2^2 +2.3 x_1 + 0.3 x_2 \end{align*}

The function f_1 is linear; f_2 is affine; and f_3 is neither.
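The classification in Example 1 can be checked numerically against the definition. The following is a minimal sketch, not a proof: random sampling can only refute linearity, never establish it. The helper names looks_linear and looks_affine, the tolerance, and the sampling range are assumptions made for illustration.

```python
import random


def f1(x):
    return 3.2 * x[0] + 2 * x[1]


def f2(x):
    return 3.2 * x[0] + 2 * x[1] + 0.15


def f3(x):
    return 0.001 * x[1] ** 2 + 2.3 * x[0] + 0.3 * x[1]


def looks_linear(f, n=2, trials=200, tol=1e-9, seed=0):
    """Test scaling and additivity on random inputs (a necessary check only)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        alpha = rng.uniform(-10, 10)
        scaled = [alpha * xi for xi in x]
        summed = [xi + yi for xi, yi in zip(x, y)]
        if abs(f(scaled) - alpha * f(x)) > tol:
            return False  # scaling is violated
        if abs(f(summed) - (f(x) + f(y))) > tol:
            return False  # additivity is violated
    return True


def looks_affine(f, n=2, **kwargs):
    """f is affine exactly when x -> f(x) - f(0) is linear."""
    f0 = f([0.0] * n)
    return looks_linear(lambda x: f(x) - f0, n=n, **kwargs)


for name, f in [("f1", f1), ("f2", f2), ("f3", f3)]:
    print(name, "linear:", looks_linear(f), "affine:", looks_affine(f))
```

The check for affinity reuses the definition above: it subtracts f(0) and tests the remainder for linearity.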

Connection with vectors via the scalar product

The following shows the connection between linear functions and scalar products.

Theorem: Representation of affine function via the scalar product

A function f: \mathbb{R}^n \rightarrow \mathbb{R} is affine if and only if it can be expressed via a scalar product:

    \begin{align*} f(x) = a^Tx+b, \end{align*}

for a unique pair (a,b), with a \in \mathbb{R}^n and b \in \mathbb{R}, given by b = f(0) and a_i = f(e_i) - f(0), with e_i the i-th unit vector in \mathbb{R}^n, i = 1, \cdots, n. The function is linear if and only if b = 0.

The theorem shows that a vector can be seen as a (linear) function from the "input" space \mathbb{R}^n to the "output" space \mathbb{R}. Both points of view (vectors as simple collections of numbers, or as linear functions) are useful.
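The representation theorem also gives a recipe for recovering (a, b) from evaluations of f alone. Since f(e_i) = a_i + b for an affine function, evaluating f at the origin and at each unit vector is enough. The sketch below assumes f really is affine; the helper name affine_coefficients is hypothetical, and f_2 from Example 1 serves as the test function.

```python
def affine_coefficients(f, n):
    """Recover (a, b) with f(x) = a^T x + b, assuming f is affine on R^n."""
    b = f([0.0] * n)                 # b = f(0)
    a = []
    for i in range(n):
        e_i = [1.0 if k == i else 0.0 for k in range(n)]
        a.append(f(e_i) - b)         # f(e_i) = a_i + b, so a_i = f(e_i) - b
    return a, b


def f2(x):
    return 3.2 * x[0] + 2 * x[1] + 0.15


a, b = affine_coefficients(f2, 2)
print(a, b)  # approximately [3.2, 2.0] and 0.15, up to rounding
```

Only n + 1 evaluations of f are needed, which matches the n + 1 unknowns in (a, b).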

Gradient of an affine function

The gradient of a function f: \mathbb{R}^n \rightarrow \mathbb{R} at a point x, denoted \nabla f(x), is the vector of first derivatives with respect to x_1, \cdots, x_n (see here for a formal definition and examples). When n=1 (there is only one input variable), the gradient is simply the derivative.

An affine function f: \mathbb{R}^n \rightarrow \mathbb{R}, with values f(x) = a^Tx+b has a very simple gradient: the constant vector a. That is, for an affine function f, we have for every x:

    \begin{align*} \nabla f(x) = a \end{align*}

Example 2: gradient of a linear function:
Consider the function f : \mathbb{R}^2 \rightarrow \mathbb{R}, with values f(x) = x_1 + 2x_2. Its gradient is constant, with values

    \begin{align*} \nabla f = \left(\begin{array}{c} \dfrac{\partial f}{\partial x_1}(x) \\[1em] \dfrac{\partial f}{\partial x_2}(x) \end{array} \right) = \left(\begin{array}{c} 1 \\ 2 \end{array} \right). \end{align*}

For a given t in \mathbb{R}, the t-level set is the set of points such that f(x) = t:

    \begin{align*} \mathbf{L}_t(f) := \{(x_1, x_2) ~:~ x_1 + 2 x_2 = t \}. \end{align*}

The level sets are hyperplanes (here, lines in \mathbb{R}^2), and are orthogonal to the gradient.
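Both claims in Example 2 can be verified numerically. The sketch below estimates the gradient by central finite differences (an approximation technique assumed here, not one the text prescribes) and checks that a direction lying inside a level set has zero scalar product with the gradient. The step size h, the sample point, and the level t = 3 are illustrative choices.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    return x[0] + 2 * x[1]


# The gradient of an affine function is the same at every point.
g = numerical_gradient(f, [0.7, -1.3])
print(g)  # close to [1, 2]

# Two points on the level set x_1 + 2 x_2 = 3.
p = [3.0, 0.0]
q = [1.0, 1.0]
direction = [q[0] - p[0], q[1] - p[1]]   # a direction within the level set
dot = sum(gi * di for gi, di in zip(g, direction))
print(dot)  # close to 0: the level set is orthogonal to the gradient
```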


The interpretations of a and b are as follows.

  • The scalar b = f(0) is the constant term. For this reason, it is sometimes referred to as the bias, or intercept (as it is the point where f intercepts the vertical axis if we were to plot the graph of the function).
  • The terms a_j, j = 1, \cdots, n, which are the components of the gradient of f, give the coefficients of influence of x_j on f. For example, if |a_1| \gg |a_3|, then the first component of x has much greater influence on the value of f(x) than the third.

See also: Beer-Lambert law in absorption spectrometry.

6.2. First-order approximation of non-linear functions

Many functions are non-linear. A common engineering practice is to approximate a given non-linear map with a linear (or affine) one, by taking derivatives. This is the main reason why linearity is such a ubiquitous tool in engineering.

One-dimensional case

Consider a function of one variable f: \mathbb{R} \rightarrow \mathbb{R}, and assume it is differentiable everywhere. Then we can approximate the values of the function at a point x near a point x_{0} as follows:

    \begin{align*} f(x) \simeq l(x):= f(x_0) + f'(x_0)(x-x_0), \end{align*}

where f'(x_0) denotes the derivative of f at x_0.
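As an illustration of the one-dimensional formula, the sketch below builds l for the sine function at x_0 = 0.5 and prints the approximation error at a few nearby points. The choice of function, the expansion point, and the helper name first_order_1d are assumptions made for the example.

```python
import math


def first_order_1d(f, df, x0):
    """Return l(x) = f(x0) + f'(x0) (x - x0), given f and its derivative df."""
    f0 = f(x0)
    slope = df(x0)
    return lambda x: f0 + slope * (x - x0)


x0 = 0.5
l = first_order_1d(math.sin, math.cos, x0)

# The error should shrink quickly as x approaches x0.
for dx in (0.1, 0.01, 0.001):
    x = x0 + dx
    print(dx, abs(math.sin(x) - l(x)))
```

For a smooth function such as this one, the error behaves roughly like a constant times (x - x_0)^2, so dividing the distance by 10 reduces the error by about a factor of 100.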

Multi-dimensional case

With more than one variable, we have a similar result. Let us approximate a differentiable function f: \mathbb{R}^n \rightarrow \mathbb{R} by an affine function l, so that f and l coincide up to and including the first derivatives at a point x_0. The corresponding approximation l is called the first-order approximation to f at x_0.

The approximate function l must be of the form

    \begin{align*} l(x) = a^Tx+b, \end{align*}

where a \in \mathbb{R}^n and b \in \mathbb{R}. Our condition that l coincides with f up to and including the first derivatives at x_0 shows that we must have

    \begin{align*} \nabla l(x) = a = \nabla f(x_0), \quad \quad a^Tx_0 + b = f(x_0), \end{align*}

where \nabla f(x_0) is the gradient of f at x_0. Solving for a,b we obtain the following result:

Theorem: First-order expansion of a function.

The first-order approximation of a differentiable function f at a point x_0 is of the form

    \begin{align*} f(x) \approx l(x) = f(x_0) + \nabla f(x_0)^T(x-x_0), \end{align*}

where \nabla f(x_0) \in \mathbb{R}^n is the gradient of f at x_0.

Example 3: a linear approximation to a non-linear function
The log-sum-exp function

    \begin{align*} f(x) = \log(e^{x_1}+e^{x_2}) \end{align*}

admits the gradient at the point x^0 given by

    \begin{align*} \nabla f(x^0) = \frac{1}{e^{x^0_1}+e^{x^0_2}} \left(\begin{array}{c} e^{x_1^0} \\ e^{x_2^0} \end{array}\right) . \end{align*}

Hence f can be approximated near x^0 by the affine function

    \begin{align*} f(x) \approx \log(e^{x^0_1}+e^{x^0_2}) + \frac{1}{e^{x^0_1}+e^{x^0_2}} \left( (x_1-x_1^0) e^{x^0_1} +(x_2-x^0_2) e^{x^0_2}\right) . \end{align*}
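Example 3 can be reproduced directly from the first-order expansion theorem, with the constant term f(x^0) and the gradient both evaluated at the expansion point. The sketch below is one way to do so; the expansion point x^0 = (1, -0.5), the nearby test point, and the helper name first_order are illustrative assumptions.

```python
import math


def lse(x):
    """Log-sum-exp of a two-dimensional input."""
    return math.log(math.exp(x[0]) + math.exp(x[1]))


def lse_gradient(x):
    """Gradient of log-sum-exp: exponentials normalized by their sum."""
    total = math.exp(x[0]) + math.exp(x[1])
    return [math.exp(x[0]) / total, math.exp(x[1]) / total]


def first_order(f, grad, x0):
    """Return l(x) = f(x0) + grad f(x0)^T (x - x0)."""
    f0 = f(x0)
    g0 = grad(x0)

    def l(x):
        return f0 + sum(gi * (xi - x0i) for gi, xi, x0i in zip(g0, x, x0))

    return l


x0 = [1.0, -0.5]
l = first_order(lse, lse_gradient, x0)

x = [1.1, -0.6]              # a point near x0
gap = abs(lse(x) - l(x))
print(lse(x), l(x), gap)     # the two values should be close
```

Note that the approximation is exact at x^0 itself, and that the gradient components are positive and sum to one.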

6.3. Other sources of linear models

Linearity can arise from a simple change of variables. This is best illustrated with a specific example.

Example: Power laws.



Linear Algebra and Applications Copyright © 2023 by VinUiversity is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.