Gradient of a function

Alicia Y. Tsai; Authors: Laurent El Ghaoui; Contributors: Phan Hoang, Tony Tin, Ha To Tam, Nguyen Van Kep, Le Dinh Nam, Juliette Decugis, Nguyen Quoc Cuong; Giuseppe C. Calafiore


The gradient of a differentiable function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is the vector of its first-order partial derivatives with respect to each variable. The gradient determines the linear (first-order) approximation of the function near a point.

Definition

The gradient of $f$ at $x_0$ , denoted $\nabla f(x_0)$ , is the vector in $\mathbb{R}^n$ given by

$\nabla f\left(x_0\right) = \left(\begin{array}{c} \dfrac{\partial f}{\partial x_1}(x_0) \\[0.5em] \vdots \\[0.5em] \dfrac{\partial f}{\partial x_n}(x_0) \end{array}\right).$
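
As a rough illustration of the definition, each partial derivative can be approximated numerically by a central difference. The sketch below is not part of the original text: the helper name `numerical_gradient`, the step size `h`, and the test function `f` are all assumptions chosen for the example.

```python
def numerical_gradient(f, x0, h=1e-6):
    """Approximate the gradient of f at x0 by central differences.

    f takes a list of n floats and returns a float.
    h is an assumed step size, not a value from the text.
    """
    grad = []
    for i in range(len(x0)):
        x_plus = list(x0)
        x_minus = list(x0)
        x_plus[i] += h
        x_minus[i] -= h
        # i-th partial derivative: (f(x0 + h e_i) - f(x0 - h e_i)) / (2h)
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Hypothetical test function f(x) = x_1^2 + 3 x_1 x_2,
# whose exact gradient is (2 x_1 + 3 x_2, 3 x_1).
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]


g = numerical_gradient(f, [1.0, 2.0])
print(g)  # close to the exact gradient (8, 3)
```

Such a finite-difference estimate is mainly useful as a sanity check on a gradient computed by hand.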

Examples:

Distance function: The distance function from a point $p \in \mathbb{R}^2$ to another point $x \in \mathbb{R}^2$ is defined as

$\rho(x)=\|x-p\|_2=\sqrt{\left(x_1-p_1\right)^2+\left(x_2-p_2\right)^2} .$

The function is differentiable provided $x \neq p$, which we assume. Then

$\nabla \rho(x)=\frac{1}{\sqrt{\left(x_1-p_1\right)^2+\left(x_2-p_2\right)^2}}\left(\begin{array}{l} x_1-p_1 \\ x_2-p_2 \end{array}\right) .$
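
The formula above says that the gradient of the distance function is the unit vector pointing from $p$ to $x$. A minimal sketch, with the function names `rho` and `grad_rho` and the sample points chosen here purely for illustration:

```python
import math


def rho(x, p):
    """Euclidean distance between two points of R^2 given as lists."""
    return math.hypot(x[0] - p[0], x[1] - p[1])


def grad_rho(x, p):
    """Gradient of rho with respect to x, valid only when x != p."""
    d = rho(x, p)
    if d == 0.0:
        raise ValueError("rho is not differentiable at x = p")
    return [(x[0] - p[0]) / d, (x[1] - p[1]) / d]


# Hypothetical sample points: x - p = (3, 4), so rho(x) = 5.
g = grad_rho([4.0, 6.0], [1.0, 2.0])
print(g, math.hypot(g[0], g[1]))  # a unit vector along x - p
```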

Log-sum-exp function: Consider the "log-sum-exp" function $\operatorname{lse}: \mathbb{R}^2 \rightarrow \mathbb{R}$, with values

$\operatorname{lse}(x):=\log \left(e^{x_1}+e^{x_2}\right) .$

The gradient of $\operatorname{lse}$ at $x$ is

$\nabla \operatorname{lse}(x)=\frac{1}{z_1+z_2}\left(\begin{array}{c} z_1 \\ z_2 \end{array}\right),$

where $z_i:=e^{x_i}, i=1,2$ . More generally, the gradient of the function $\operatorname{lse}: \mathbb{R}^n \rightarrow \mathbb{R}$ with values

$\operatorname{lse}(x)=\log \left(\sum_{i=1}^n e^{x_i}\right)$

is given by

$\nabla \operatorname{lse}(x)=\frac{1}{\sum_{i=1}^n e^{x_i}}\left(\begin{array}{c} e^{x_1} \\ \vdots \\ e^{x_n} \end{array}\right)=\frac{1}{Z} z,$

where $z=\left(\begin{array}{c} e^{x_1} \\ \vdots \\ e^{x_n} \end{array}\right)$ and $Z=\sum_{i=1}^n z_i$.
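
The general formula can be sketched in a few lines of Python. One caveat: the code below subtracts $\max_i x_i$ before exponentiating, a common numerical safeguard that is not discussed in the text. It leaves the ratio $z/Z$ unchanged because the common factor cancels. The function names `lse` and `grad_lse` and the sample input are assumptions for this example.

```python
import math


def lse(x):
    """log(sum_i exp(x_i)), computed with a max-shift to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def grad_lse(x):
    """Gradient of lse at x: the vector z / Z with z_i = exp(x_i)."""
    m = max(x)
    # z is e^{x_i} scaled by e^{-m}; the scaling cancels in z / Z.
    z = [math.exp(xi - m) for xi in x]
    Z = sum(z)
    return [zi / Z for zi in z]


g = grad_lse([1.0, 2.0, 3.0])
print(g, sum(g))  # entries are positive and sum to 1
```

Since every entry of $z$ is positive and $Z$ is their sum, the gradient always has positive entries that add up to one.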

Composition rule with an affine function

If $f: \mathbb{R}^m \rightarrow \mathbb{R}$ is differentiable, $A \in \mathbb{R}^{m \times n}$ is a matrix, and $b \in \mathbb{R}^m$ is a vector, the function $g: \mathbb{R}^n \rightarrow \mathbb{R}$ with values

$g(x)=f(A x+b)$

is called the composition of the affine map $x \mapsto A x+b$ with $f$. Its gradient is given by

$\nabla g(x)=A^T \nabla f(A x+b) .$
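
The rule can be checked on a small example. In the sketch below, $A$ has $m = 2$ rows and $n = 3$ columns, so $x \in \mathbb{R}^3$ and $\nabla g(x) \in \mathbb{R}^3$. The helper names, the choice $f(y) = \|y\|_2^2$ (with gradient $2y$), and the numerical values are all assumptions made for illustration.

```python
def affine(A, x, b):
    """Return A x + b for a matrix A stored as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) + b[i]
            for i in range(len(A))]


def grad_composition(grad_f, A, x, b):
    """Gradient of g(x) = f(A x + b), computed as A^T grad_f(A x + b)."""
    gf = grad_f(affine(A, x, b))  # a vector in R^m
    # Multiply by A^T: entry j is the dot product of column j of A with gf.
    return [sum(A[i][j] * gf[i] for i in range(len(A)))
            for j in range(len(x))]


# Hypothetical choice of f: squared Euclidean norm, with gradient 2 y.
def f(y):
    return sum(yi * yi for yi in y)


def grad_f(y):
    return [2.0 * yi for yi in y]


A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
b = [1.0, -1.0]
x = [1.0, 1.0, 1.0]

grad_g = grad_composition(grad_f, A, x, b)
print(grad_g)  # A x + b = (4, 1), so the result is A^T (8, 2) = (8, 14, 6)
```

Note the dimensions: $\nabla f$ is evaluated in $\mathbb{R}^m$, and multiplication by $A^T$ maps it back to $\mathbb{R}^n$, the space where $x$ lives.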

Geometric interpretation

Geometrically, the gradient can be read off a plot of the level sets of the function. Specifically, at any point $x$, the gradient is perpendicular to the level set passing through $x$ and points outwards from the corresponding sub-level set (that is, it points towards higher values of the function).

Level and sub-level sets of the function $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ with values

$f(x) = \operatorname{lse}(\sin(x_1 + 0.3 x_2), 0.2 x_2).$

The gradient at a point (shown in red) is perpendicular to the level set and points outside the corresponding sub-level set. The length of the gradient indicates how fast the function changes locally. (In the plot, the length of the gradient has been scaled up by a factor of $5$.)
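
The perpendicularity claim can be probed numerically for the function in the figure. The sketch below computes the gradient by the chain rule, then compares the change in $f$ after a small step along the gradient with the change after an equally small step along the tangent direction (the gradient rotated by 90 degrees). The evaluation point, the step size `eps`, and all names are assumptions made for this example.

```python
import math


def f(x):
    """f(x) = lse(sin(x_1 + 0.3 x_2), 0.2 x_2)."""
    u = math.sin(x[0] + 0.3 * x[1])
    v = 0.2 * x[1]
    return math.log(math.exp(u) + math.exp(v))


def grad_f(x):
    """Gradient of f by the chain rule applied to lse(u, v)."""
    t = x[0] + 0.3 * x[1]
    u, v = math.sin(t), 0.2 * x[1]
    s_u = math.exp(u) / (math.exp(u) + math.exp(v))  # d lse / du
    s_v = 1.0 - s_u                                  # d lse / dv
    c = math.cos(t)
    # du/dx_1 = c, du/dx_2 = 0.3 c, dv/dx_1 = 0, dv/dx_2 = 0.2
    return [s_u * c, 0.3 * s_u * c + 0.2 * s_v]


x = [0.5, 1.0]              # hypothetical evaluation point
g = grad_f(x)
tangent = [-g[1], g[0]]     # gradient rotated by 90 degrees
eps = 1e-4                  # assumed small step size

along_gradient = f([x[0] + eps * g[0], x[1] + eps * g[1]]) - f(x)
along_tangent = f([x[0] + eps * tangent[0], x[1] + eps * tangent[1]]) - f(x)
print(along_gradient, along_tangent)
```

To first order, the step along the tangent leaves $f$ unchanged, which is what it means for the gradient to be perpendicular to the level set. The step along the gradient increases $f$ by roughly `eps` times the squared length of the gradient.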

License

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
