The Maximum Eigenvalue is a Convex Function

The maximum eigenvalue of a real symmetric matrix, that is.

Suppose we have a real square matrix $\bm{A}\in\mathbb{R}^{n\times n}$, the eigenvalues of which determine the amount of scaling $\bm{A}$ performs along each dimension. Recall that $\lambda$ is an eigenvalue of $\bm{A}$ if it satisfies the equation

\begin{equation}\label{1} \bm{A}\bm{v} = \lambda\bm{v} \end{equation}

for some eigenvector $\bm{v}$. A general real square matrix may have complex eigenvalues and eigenvectors, so we will limit ourselves to real symmetric matrices (that is, those satisfying $\bm{A}=\bm{A}^T$), which are guaranteed to have real eigenvalues and eigenvectors (note the converse is not true: a matrix does not necessarily have to be symmetric for all of its eigenvalues to be real). We denote the set of real symmetric $n\times n$ matrices as $\mathbb{S}^n$.

The eigenvector $\bm{v}$ in \eqref{1} is free to take any non-zero magnitude, so we will assume $\|\bm{v}\|_2=1$. We can then left-multiply both sides of \eqref{1} by $\bm{v}^T$ to obtain the relationship $\bm{v}^T\bm{A}\bm{v}=\lambda$. Thus the maximum eigenvalue $\lambda_{\max}(\bm{A})$ can be expressed as the optimization problem

\begin{equation}\label{2} \begin{aligned} \lambda_{\max}(\bm{A}) \coloneqq \max_{\bm{v}} &\quad \bm{v}^T\bm{A}\bm{v} \\ \text{subject to} &\quad \|\bm{v}\|_2=1. \end{aligned} \end{equation}
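As a quick numerical sanity check on this characterization, here is a NumPy sketch (the test matrix and sample count are arbitrary choices) verifying that every unit vector gives a value $\bm{v}^T\bm{A}\bm{v}$ no larger than the maximum eigenvalue, and that the top eigenvector attains it:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary real symmetric test matrix.
n = 5
G = rng.standard_normal((n, n))
A = (G + G.T) / 2

w, V = np.linalg.eigh(A)  # eigenvalues in ascending order
lam_max = w[-1]

# Every unit vector gives a lower bound on the maximum eigenvalue...
for _ in range(1000):
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    assert v @ A @ v <= lam_max + 1e-9

# ...and the top eigenvector attains it.
v_top = V[:, -1]
assert np.isclose(v_top @ A @ v_top, lam_max)
```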

It turns out that the maximum eigenvalue function $\lambda_{\max}:\mathbb{S}^n\to\mathbb{R}$ is convex, even though \eqref{2} looks like a non-convex problem. Indeed, \eqref{2} is a quadratically-constrained quadratic program (QCQP), but neither the objective function nor the constraint is convex (since we have made no definiteness assumptions about $\bm{A}$ and the constraint is a quadratic equality; we will come back to this).

We will explore the convexity of $\lambda_{\max}(\bm{A})$ in the rest of the post, which will allow us to have some fun with convex optimization and Lagrangian duality. Similar arguments also reveal that the minimum eigenvalue is a concave function, but we leave that as an exercise to the reader. If you’ve read Convex Optimization by Boyd and Vandenberghe (it’s freely available at the link provided), there shouldn’t be much here that’s totally new to you, but I think we can still enjoy ourselves.

Of course, if your goal is just to find the maximum eigenvalue of a (symmetric) matrix, there are much faster algorithms than the convex optimization problems we’ll explore in this post, but reasoning about eigenvalues is quite fundamental to convex programs arising in control theory, structural analysis, and portfolio optimization, among others. An interactive Python notebook that accompanies this post can be accessed directly in the browser here.

Convexity

Recall that a function is convex if a line segment drawn between any two points on the function lies on or above the function itself. In our case, we have a function of the form $g:\mathbb{S}^n\to\mathbb{R}$; by definition, it is convex if and only if

\begin{equation*} g(t\bm{A} + (1-t)\bm{B}) \leq tg(\bm{A}) + (1-t)g(\bm{B}) \end{equation*}

for any $\bm{A},\bm{B}\in\mathbb{S}^n$ and $t\in[0,1]$. Using \eqref{2}, we have

\begin{equation*} \begin{aligned} \lambda_{\max}(t\bm{A} + (1-t)\bm{B}) &= \max_{\|\bm{v}\|_2=1} \bm{v}^T(t\bm{A} + (1-t)\bm{B})\bm{v} \\ &= \max_{\|\bm{v}\|_2=1} t\bm{v}^T\bm{A}\bm{v} + (1-t)\bm{v}^T\bm{B}\bm{v} \\ &\leq \max_{\|\bm{v}\|_2=1} t\bm{v}^T\bm{A}\bm{v} + \max_{\|\bm{v}\|_2=1}(1-t)\bm{v}^T\bm{B}\bm{v} \\ &= t\lambda_{\max}(\bm{A}) + (1-t)\lambda_{\max}(\bm{B}), \end{aligned} \end{equation*}

where we have used the fact that the sum of maxima is at least as large as the maximum of a sum, revealing that $\lambda_{\max}$ is indeed convex.
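The inequality in this derivation is easy to spot-check numerically. Here is a small NumPy sketch (random test matrices and the grid of $t$ values are arbitrary choices) that tests the convexity inequality directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def random_symmetric(n):
    """An arbitrary real symmetric matrix."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / 2

def lam_max(M):
    """Largest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(M)[-1]

A, B = random_symmetric(n), random_symmetric(n)
for t in np.linspace(0.0, 1.0, 21):
    lhs = lam_max(t * A + (1 - t) * B)
    rhs = t * lam_max(A) + (1 - t) * lam_max(B)
    assert lhs <= rhs + 1e-9  # the convexity inequality holds
```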

In general, the pointwise maximum over a family of convex functions is itself convex (see Figure 1 below for a visual example). That is, if a function $f(\bm{A},\bm{v})$ is convex in $\bm{A}$ for every $\bm{v}\in\mathcal{V}$, then the function

\begin{equation*} g(\bm{A}) \coloneqq \max_{\bm{v}\in\mathcal{V}}f(\bm{A},\bm{v}) \end{equation*}

is also convex (see Section 3.2.3 of Convex Optimization). In our case, $f(\bm{A},\bm{v})\coloneqq\bm{v}^T\bm{A}\bm{v}$ is linear in $\bm{A}$, which means it is both convex and concave in $\bm{A}$, for every $\bm{v}$.

The pointwise maximum of convex functions is also convex.

Figure 1: The function $g(x)\coloneqq\max_{i=1,2}\ f_i(x)$ (highlighted in red) is the pointwise maximum of two convex functions $f_1$ and $f_2$, and is therefore itself convex.

(Strong) Duality

Earlier we stated that \eqref{2} is a QCQP, but that it is not a convex problem. For it to be convex, the constraint would need to define a convex set (for example, $\|\bm{v}\|_2\leq1$), and, since \eqref{2} is a maximization problem, the objective function would need to be concave (with respect to the optimization variable $\bm{v}$). The function $\bm{v}^T\bm{A}\bm{v}$ is only concave if $\bm{A}$ is negative semidefinite, which means $\bm{v}^T\bm{A}\bm{v}\leq0$ for all $\bm{v}\in\mathbb{R}^n$, and which we denote as $\bm{A}\preccurlyeq\bm{0}$ (analogously, $\bm{A}$ is positive semidefinite if $\bm{v}^T\bm{A}\bm{v}\geq0$ for all $\bm{v}\in\mathbb{R}^n$, denoted $\bm{A}\succcurlyeq\bm{0}$). The eigenvalues of a negative semidefinite matrix are all non-positive (and those of a positive semidefinite matrix are all non-negative).

Luckily, despite its non-convexity, \eqref{2} enjoys strong duality. This means that its (convex) dual problem has the same optimal value. To see this, let’s derive the dual problem. The Lagrangian of \eqref{2} is

\begin{equation*} \begin{aligned} \mathcal{L}(\bm{v},\lambda) &= \bm{v}^T\bm{A}\bm{v} - \lambda(\bm{v}^T\bm{v} - 1) \\ &= \bm{v}^T(\bm{A} - \lambda\bm{1}_n)\bm{v} + \lambda, \end{aligned} \end{equation*}

where $\bm{1}_n$ is the $n\times n$ identity matrix and $\lambda\in\mathbb{R}$ is the dual variable. The dual function is

\begin{equation*} \begin{aligned} d(\lambda) &= \sup_{\bm{v}}\mathcal{L}(\bm{v},\lambda) \\ &= \begin{cases} \lambda &\quad \text{if } \bm{A}-\lambda\bm{1}_n\preccurlyeq\bm{0}, \\ \infty &\quad \text{otherwise}, \end{cases} \end{aligned} \end{equation*}

where $\sup$ denotes the supremum, or least upper bound (that is, the smallest value that is no smaller than every value of $\mathcal{L}(\bm{v},\lambda)$). When $\bm{A}-\lambda\bm{1}_n$ is negative semidefinite, the value $\bm{v}^T(\bm{A}-\lambda\bm{1}_n)\bm{v}$ is at most zero for any $\bm{v}\in\mathbb{R}^n$, and so $\mathcal{L}(\bm{v},\lambda)$ is at most $\lambda$; in all other cases it is unbounded above. Since we don’t care about the unbounded case, we restrict the domain to keep the dual function bounded. This yields the dual problem

\begin{equation}\label{3} \begin{aligned} \min_{\lambda} &\quad \lambda \\ \text{subject to} &\quad \bm{A}\preccurlyeq\lambda\bm{1}_n, \end{aligned} \end{equation}

which is a semidefinite program (SDP), because it includes a semidefinite constraint (that is, a linear matrix inequality). The dual problem is always convex, and SDPs in particular are a class of convex problems that can be solved efficiently with off-the-shelf software, as long as the number of optimization variables is not too large.
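We can check the feasible set of this dual problem numerically without an SDP solver: $\lambda$ is feasible exactly when $\bm{A}-\lambda\bm{1}_n$ has no positive eigenvalues, that is, when $\lambda\geq\lambda_{\max}(\bm{A})$. A NumPy sketch (the test matrix, margins, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
G = rng.standard_normal((n, n))
A = (G + G.T) / 2
lam_max = np.linalg.eigvalsh(A)[-1]

def feasible(lam, tol=1e-9):
    # The constraint A - lam*I <= 0 holds iff the largest
    # eigenvalue of A - lam*I is non-positive.
    return np.linalg.eigvalsh(A - lam * np.eye(n))[-1] <= tol

assert feasible(lam_max)            # the optimum is feasible,
assert feasible(lam_max + 0.5)      # as is any larger lambda,
assert not feasible(lam_max - 0.5)  # but nothing smaller.
```

So minimizing $\lambda$ over the feasible set lands exactly on $\lambda_{\max}(\bm{A})$, as the interpretations below make precise.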

Diagonalization Interpretation

One way to interpret \eqref{3} is to consider the eigendecomposition $\bm{A}=\bm{V}\bm{\Lambda}\bm{V}^T$, where $\bm{V}$ is the orthogonal matrix of eigenvectors and $\bm{\Lambda}$ is the diagonal matrix of eigenvalues, which we can use to rearrange the constraint $\bm{A}\preccurlyeq\lambda\bm{1}_n$ to the diagonalized form $\bm{\Lambda}\preccurlyeq\lambda\bm{1}_n$. Since both $\bm{\Lambda}$ and $\lambda\bm{1}_n$ are diagonal, the constraint simply requires $\lambda$ to be no smaller than the largest diagonal element (eigenvalue) of $\bm{\Lambda}$. Thus the minimum possible value of $\lambda$ is the maximum eigenvalue of $\bm{A}$, revealing that \eqref{2} and \eqref{3} have the same optimal value.

Extremal Interpretation

Another (related) way to interpret \eqref{3} is as follows. We can rewrite the semidefinite constraint $\bm{A}\preccurlyeq\lambda\bm{1}_n$ as

\begin{equation*} \bm{v}^T(\bm{A}-\lambda\bm{1}_n)\bm{v} \leq 0 \quad \text{for all } \bm{v}\in\mathbb{R}^n. \end{equation*}

Without loss of generality, take $\|\bm{v}\|_2=1$. Then the constraint becomes

\begin{equation*} \bm{v}^T\bm{A}\bm{v} \leq \lambda \quad \text{for all } \|\bm{v}\|_2=1, \end{equation*}

which is equivalent to

\begin{equation*} \left(\max_{\|\bm{v}\|_2=1}\ \bm{v}^T\bm{A}\bm{v}\right) \leq \lambda. \end{equation*}

Obviously, the minimum value of $\lambda$ is obtained at equality:

\begin{equation*} \lambda^{\star} = \max_{\|\bm{v}\|_2=1}\ \bm{v}^T\bm{A}\bm{v}, \end{equation*}

which is the same problem as \eqref{2}.

Geometric Interpretation

Finally, in the special case when $\bm{A}$ is positive definite (that is, all of its eigenvalues are strictly positive), it can be interpreted as defining an ellipsoid $\mathcal{E}$ in $n$-dimensional space centered at the origin:

\begin{equation*} \mathcal{E} = \{ \bm{x}\in\mathbb{R}^n \mid \bm{x}^T\bm{A}^{-1}\bm{x} \leq 1 \}, \end{equation*}

where the eigenvectors of $\bm{A}$ define the directions of the principal semi-axes of $\mathcal{E}$ and the eigenvalues of $\bm{A}$ are the squares of the lengths of the semi-axes. The problem \eqref{3} can then be interpreted as finding the (square of the) radius of the minimum-volume bounding sphere of $\mathcal{E}$. A two-dimensional example is shown in Figure 2.

Minimum-volume bounding sphere of an ellipse.

Figure 2: The ellipse $\mathcal{E}$ can be defined by a $2\times2$ matrix $\bm{A}$ with eigenvalues $\lambda_1\geq\lambda_2>0$. The radius of the minimum-volume bounding sphere of $\mathcal{E}$ is $\sqrt{\lambda_1}$.
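This bounding-sphere picture is also easy to verify numerically. Below is a NumPy sketch (the eigenvalues and sample count are arbitrary choices) that parametrizes boundary points of the ellipse as $\bm{x}=\bm{A}^{1/2}\bm{u}$ with $\|\bm{u}\|_2=1$, so that $\bm{x}^T\bm{A}^{-1}\bm{x}=1$, and checks that no boundary point lies outside radius $\sqrt{\lambda_1}$:

```python
import numpy as np

rng = np.random.default_rng(3)

# A 2x2 positive definite matrix with known eigenvalues.
eigs = np.array([4.0, 1.0])  # lambda_1 >= lambda_2 > 0
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))  # random orthogonal eigenvectors
A = Q @ np.diag(eigs) @ Q.T
A_half = Q @ np.diag(np.sqrt(eigs)) @ Q.T  # symmetric square root of A

# Boundary points of the ellipse: x = A^{1/2} u with ||u|| = 1,
# since then x^T A^{-1} x = u^T u = 1.
U = rng.standard_normal((1000, 2))
U /= np.linalg.norm(U, axis=1, keepdims=True)
X = U @ A_half  # rows are boundary points
radii = np.linalg.norm(X, axis=1)

# No boundary point lies outside the sphere of radius sqrt(lambda_1)...
assert radii.max() <= np.sqrt(eigs[0]) + 1e-9

# ...and the point along the top eigenvector touches it exactly.
x_star = A_half @ Q[:, 0]
assert np.isclose(np.linalg.norm(x_star), np.sqrt(eigs[0]))
```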

If you know of a good geometric interpretation of \eqref{3} when $\bm{A}$ is a general real symmetric matrix with no definiteness assumptions, I’d like to hear about it!

General Semidefinite Constraints

Any semidefinite constraint is really just an eigenvalue constraint. For example, the constraint $\bm{Y}(\bm{x})\preccurlyeq\bm{0}$, where $\bm{x}$ is our optimization variable, just constrains the maximum eigenvalue of $\bm{Y}(\bm{x})$ to be at most zero. As long as $\bm{Y}$ is affine in $\bm{x}$, the constraint $\bm{Y}(\bm{x})\preccurlyeq\bm{0}$ is convex. We can easily write the constraint from \eqref{3} in this form by defining $\bm{Y}(\lambda)\coloneqq\bm{A}-\lambda\bm{1}_n$, which is affine in $\lambda$.

Trust Region Subproblem

In general, the problem of optimizing a quadratic objective function subject to a single quadratic constraint (equality or inequality) enjoys strong duality regardless of whether the objective and constraint are convex (see this paper as well as these slides for more information). This class of problem is often known as the trust region subproblem, because it arises as a step in trust region optimization methods. This result is the consequence of a theorem of alternatives known as the S-lemma.

Dual of the Dual

We can also take the dual of \eqref{3}. The Lagrangian of \eqref{3} is

\begin{equation*} \begin{aligned} \mathcal{L}(\lambda,\bm{Z}) &= \lambda + \mathrm{tr}(\bm{Z}(\bm{A} - \lambda\bm{1}_n)) \\ &= \lambda(1 - \mathrm{tr}(\bm{Z})) + \mathrm{tr}(\bm{A}\bm{Z}), \end{aligned} \end{equation*}

where $\mathrm{tr}(\cdot)$ denotes the matrix trace and $\bm{Z}\succcurlyeq\bm{0}$ is the dual variable. The dual function is

\begin{equation*} \begin{aligned} d(\bm{Z}) &= \inf_{\lambda}\mathcal{L}(\lambda,\bm{Z}) \\ &= \begin{cases} \mathrm{tr}(\bm{A}\bm{Z}) &\quad \text{if } \mathrm{tr}(\bm{Z})=1, \\ -\infty &\quad \text{otherwise}, \end{cases} \end{aligned} \end{equation*}

where $\inf$ is the infimum, or greatest lower bound, so the dual problem is

\begin{equation}\label{4} \begin{aligned} \max_{\bm{Z}} &\quad \mathrm{tr}(\bm{A}\bm{Z}) \\ \text{subject to} &\quad \mathrm{tr}(\bm{Z})=1, \\ &\quad \bm{Z}\succcurlyeq\bm{0}, \end{aligned} \end{equation}

which is also an SDP.
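Even without running an SDP solver, we can exhibit an optimal point of this problem: the rank-one matrix $\bm{Z}=\bm{v}\bm{v}^T$ built from the top eigenvector is feasible and attains the objective value $\lambda_{\max}(\bm{A})$. A NumPy sketch (the test matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
G = rng.standard_normal((n, n))
A = (G + G.T) / 2

w, V = np.linalg.eigh(A)       # eigenvalues in ascending order
lam_max, v = w[-1], V[:, -1]

Z = np.outer(v, v)             # rank-one candidate solution

assert np.isclose(np.trace(Z), 1.0)           # tr(Z) = 1
assert np.linalg.eigvalsh(Z)[0] >= -1e-9      # Z is positive semidefinite
assert np.isclose(np.trace(A @ Z), lam_max)   # objective attains lambda_max
```

Since any feasible $\bm{Z}$ satisfies $\mathrm{tr}(\bm{A}\bm{Z})\leq\lambda_{\max}(\bm{A})\,\mathrm{tr}(\bm{Z})=\lambda_{\max}(\bm{A})$, this candidate is in fact optimal.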

Relaxation Interpretation

It turns out that \eqref{4} is just a semidefinite relaxation of \eqref{2}. To see this, observe that we can express \eqref{2} in the equivalent form

\begin{equation*} \begin{aligned} \max_{\bm{v}} &\quad \mathrm{tr}(\bm{A}\bm{v}\bm{v}^T) \\ \text{subject to} &\quad \mathrm{tr}(\bm{v}\bm{v}^T)=1, \end{aligned} \end{equation*}

which we obtained using the cyclic property of the trace. We can replace $\bm{v}\bm{v}^T$ with a matrix variable $\bm{Z}\succcurlyeq\bm{0}$ constrained to be rank one (because then $\bm{Z}$ can always be decomposed into the outer product $\bm{Z}=\bm{v}\bm{v}^T$), yielding

\begin{equation*} \begin{aligned} \max_{\bm{Z}} &\quad \mathrm{tr}(\bm{A}\bm{Z}) \\ \text{subject to} &\quad \mathrm{tr}(\bm{Z})=1, \\ &\quad \bm{Z}\succcurlyeq\bm{0}, \\ &\quad \mathrm{rank}(\bm{Z}) = 1. \end{aligned} \end{equation*}

Finally, relaxing the problem by dropping the non-convex rank constraint gives us \eqref{4}. We say that this relaxation is tight because it retains the same optimal value as the original problem \eqref{2}.

Sum of the kk Largest Eigenvalues

Not only is the maximum eigenvalue of a real symmetric matrix a convex function, but so is the sum of the $k$ largest eigenvalues. If we arrange the eigenvalues of $\bm{A}$ in decreasing order $\lambda_1\geq\lambda_2\geq\dots\geq\lambda_n$, the sum of the $k$ largest is

\begin{equation*} \begin{aligned} \sum_{i=1}^k\lambda_i = \max_{\{\bm{v}_i\}_{i=1}^k} &\quad \sum_{i=1}^k \bm{v}_i^T\bm{A}\bm{v}_i \\ \text{subject to} &\quad \bm{v}_i^T\bm{v}_i = 1 \quad \text{for all } i=1,\dots,k \\ &\quad \bm{v}_i^T\bm{v}_j = 0 \quad \text{for all } i,j=1,\dots,k,\ i\neq j, \end{aligned} \end{equation*}

where we have constrained the vectors $\{\bm{v}_i\}$ to be orthonormal (unit length and mutually orthogonal). We can also rewrite this problem in the nicer matrix form

\begin{equation*} \begin{aligned} \sum_{i=1}^k\lambda_i = \max_{\bm{V}} &\quad \mathrm{tr}(\bm{V}^T\bm{A}\bm{V}) \\ \text{subject to} &\quad \bm{V}^T\bm{V} = \bm{1}_k, \end{aligned} \end{equation*}

where the columns of $\bm{V}\in\mathbb{R}^{n\times k}$ are the eigenvectors. This function is again the pointwise maximum over a family of convex functions, so it is itself convex.
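As a final check, here is a NumPy sketch (dimensions and sample counts are arbitrary choices) verifying that the top-$k$ eigenvectors attain the maximum in the matrix form above, while randomly drawn orthonormal frames do no better:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 6, 3
G = rng.standard_normal((n, n))
A = (G + G.T) / 2

w, vecs = np.linalg.eigh(A)    # eigenvalues in ascending order
top_k_sum = w[-k:].sum()       # sum of the k largest eigenvalues

# The top-k eigenvectors attain the maximum...
V = vecs[:, -k:]
assert np.isclose(np.trace(V.T @ A @ V), top_k_sum)

# ...and random orthonormal frames (from reduced QR) do no better.
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    assert np.trace(Q.T @ A @ Q) <= top_k_sum + 1e-9
```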

Thanks to Connor Holmes for reviewing a draft of this post.