Hessian of a function

Definition

The Hessian of a twice-differentiable function f: \mathbb{R}^n \rightarrow \mathbb{R} at a point x\in {\bf dom} f is the n \times n matrix containing the second-order partial derivatives of the function at that point. That is, the Hessian is the matrix with elements given by

    \begin{align*} H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x),\quad 1\leq i,j \leq n. \end{align*}

The Hessian of f at x is often denoted \nabla^2 f(x).

When the second partial derivatives of f are continuous at x, the mixed partial derivatives do not depend on the order in which the derivatives are taken (Schwarz's theorem). Hence, H_{ij} = H_{ji} for every pair (i,j), and the Hessian is a symmetric matrix.
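The definition can be checked numerically. The sketch below approximates each entry H_{ij} by central finite differences; the helper name `hessian_fd`, the step size `h`, and the test function are illustrative assumptions, not part of the text above.

```python
def hessian_fd(f, x, h=1e-4):
    """Approximate the Hessian of f at x by central finite differences.

    f takes a list of n floats and returns a float. The step h is an
    assumed default; a suitable value depends on the scaling of f.
    """
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f at x + si*h*e_i + sj*h*e_j.
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


# Hypothetical test function: f(x) = x_1^2 x_2 + x_2^3, whose exact Hessian
# is [[2 x_2, 2 x_1], [2 x_1, 6 x_2]].
def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3


H = hessian_fd(f, [1.0, 2.0])
```

At x = (1, 2) the exact Hessian of this test function has rows (4, 2) and (2, 12). The approximation should agree with it up to discretization and rounding error, and its two off-diagonal entries should match, reflecting the symmetry property.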

Examples

Hessian of a quadratic function

Consider the quadratic function

    \begin{align*} q(x) = x_1^2 + 2x_1 x_2 + 3x_2^2 + 4x_1 + 5x_2 +6. \end{align*}

The Hessian of q at x is given by

    \begin{align*} \nabla^2 q(x) = \left(\begin{array}{cc} \dfrac{\partial^2 q}{\partial x_1^2}(x) & \dfrac{\partial^2 q}{\partial x_1 \partial x_2}(x) \\[3ex] \dfrac{\partial^2 q}{\partial x_2 \partial x_1}(x) & \dfrac{\partial^2 q}{\partial x_2^2}(x) \end{array}\right) = \left(\begin{array}{cc} 2 & 2 \\ 2 & 6 \end{array}\right). \end{align*}

For quadratic functions, the Hessian is a constant matrix, that is, it does not depend on the point at which it is evaluated.
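Equivalently, q can be written as q(x) = \frac{1}{2} x^T H x + b^T x + c, where H is the constant Hessian above, b = (4, 5) collects the linear coefficients, and c = 6 is the constant term. The following sketch checks this identity at a few points; the function names and the sample points are illustrative choices.

```python
def q(x):
    """The quadratic function from the example, written term by term."""
    x1, x2 = x
    return x1**2 + 2*x1*x2 + 3*x2**2 + 4*x1 + 5*x2 + 6


H = [[2.0, 2.0], [2.0, 6.0]]  # Hessian of q (constant)
b = [4.0, 5.0]                # linear coefficients of q
c = 6.0                       # constant term of q


def q_matrix_form(x):
    """Evaluate (1/2) x^T H x + b^T x + c."""
    quad = 0.5 * sum(x[i] * H[i][j] * x[j] for i in range(2) for j in range(2))
    lin = sum(b[i] * x[i] for i in range(2))
    return quad + lin + c


# Arbitrary sample points (assumed for illustration).
points = [[0.0, 0.0], [1.0, 2.0], [-3.0, 0.5]]
differences = [abs(q(p) - q_matrix_form(p)) for p in points]
```

Each entry of `differences` should be zero up to rounding, since the factor 1/2 in front of x^T H x compensates for the factor 2 produced by differentiating the squared terms.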

Hessian of the log-sum-exp function

Consider the "log-sum-exp" function \mathrm{lse}: \mathbb{R}^2 \rightarrow \mathbb{R}, with values

    \begin{align*} \mathrm{lse}(x):= \log(e^{x_1}+e^{x_2}). \end{align*}

The gradient of \mathrm{lse} at x is

    \begin{align*} \nabla \mathrm{lse}(x) = \frac{1}{z_1 + z_2}\left(\begin{array}{c} z_1 \\ z_2 \end{array}\right), \end{align*}

where z_i := e^{x_i}, i=1,2. The Hessian is given by

    \begin{align*} \nabla^2 \mathrm{lse}(x) = \frac{z_1 z_2}{(z_1 +z_2)^2}\left(\begin{array}{cc} 1 & -1 \\ -1 & 1 \end{array}\right). \end{align*}
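As a check on the formula above, the top-left entry can be obtained by differentiating the first component of the gradient with respect to x_1, using \partial z_1 / \partial x_1 = z_1 and the quotient rule:

    \begin{align*} \frac{\partial}{\partial x_1}\left(\frac{z_1}{z_1+z_2}\right) = \frac{z_1 (z_1+z_2) - z_1 \cdot z_1}{(z_1+z_2)^2} = \frac{z_1 z_2}{(z_1+z_2)^2}. \end{align*}

Differentiating the same component with respect to x_2 gives -z_1 z_2/(z_1+z_2)^2, which is the off-diagonal entry. The remaining entries follow by exchanging the roles of x_1 and x_2.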

More generally, consider the function \mathrm{lse}: \mathbb{R}^n \rightarrow \mathbb{R} with values

    \begin{align*} \mathrm{lse}(x):= \log\left(\sum_{i=1}^{n} e^{x_i}\right). \end{align*}

Its Hessian is obtained as follows.

●  First, the gradient at a point x is

    \begin{align*} \nabla \mathrm{lse}(x) = \frac{1}{\sum_{i=1}^n e^{x_i}}\left(\begin{array}{c} e^{x_1} \\ \vdots\\ e^{x_n} \end{array}\right) = \frac{1}{Z} z, \end{align*}

where z=\left(\begin{array}{c} e^{x_1} \\ \vdots\\ e^{x_n} \end{array}\right) and Z = \sum_{i=1}^n z_i.

●  Next, the Hessian at a point x is obtained by differentiating each component of the gradient. If g_i(x) denotes the i-th component, that is,

    \begin{align*} g_i(x) = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}} = \frac{z_i}{Z}, \end{align*}

then

    \begin{align*} \frac{\partial g_i(x)}{\partial x_i} = \frac{z_i}{Z} - \frac{z_i^2}{Z^2}, \end{align*}

and, for j \neq i:

    \begin{align*} \frac{\partial g_i(x)}{\partial x_j} = -\frac{z_i z_j}{Z^2}. \end{align*}

More compactly:

    \begin{align*} \nabla^2 \mathrm{lse}(x) = \frac{1}{Z^2} (Z {\bf diag}(z) - zz^T). \end{align*}
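The compact formula translates directly into code. The sketch below is one possible implementation using only the Python standard library; the function name `lse_hessian` and the sample point are assumptions made for illustration, and the subtraction of max(x) is an added numerical safeguard that is not part of the derivation.

```python
import math


def lse_hessian(x):
    """Hessian of log-sum-exp at x, using (Z diag(z) - z z^T) / Z^2."""
    # Subtracting max(x) is an added safeguard against overflow (not part of
    # the derivation); it rescales z and Z by the same factor, which leaves
    # the Hessian unchanged.
    m = max(x)
    z = [math.exp(xi - m) for xi in x]
    Z = sum(z)
    n = len(x)
    return [[((Z * z[i] if i == j else 0.0) - z[i] * z[j]) / (Z * Z)
             for j in range(n)]
            for i in range(n)]


# For n = 2 this should reproduce the 2 x 2 formula given earlier.
x = [0.5, -1.0]
H = lse_hessian(x)
z1, z2 = math.exp(x[0]), math.exp(x[1])
c = z1 * z2 / (z1 + z2) ** 2
H_2d = [[c, -c], [-c, c]]
```

Here `H` and `H_2d` should agree up to rounding, which confirms that the general expression reduces to the two-dimensional one when n = 2.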

License

Linear Algebra and Applications Copyright © 2023 by VinUniversity is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.