The Maximum Eigenvalue is a Convex Function

The maximum eigenvalue of a real symmetric matrix, that is.

Suppose we have a real square matrix $\bm{A}\in\mathbb{R}^{n\times n}$, the eigenvalues of which determine the amount of scaling $\bm{A}$ performs along each dimension. Recall that $\lambda$ is an eigenvalue of $\bm{A}$ if it satisfies the equation

\begin{equation}\label{1} \bm{A}\bm{v} = \lambda\bm{v} \end{equation}

for some eigenvector $\bm{v}$. A general real square matrix may have complex eigenvalues and eigenvectors, so we will limit ourselves to real symmetric matrices (that is, those satisfying $\bm{A}=\bm{A}^T$), which are guaranteed to have real eigenvalues and eigenvectors (note the converse is not true: a matrix does not necessarily have to be symmetric for all of its eigenvalues to be real). We denote the set of real symmetric $n\times n$ matrices as $\mathbb{S}^n$.

The eigenvector $\bm{v}$ in \eqref{1} is free to take any non-zero magnitude, so we will assume $\|\bm{v}\|_2=1$. We can then left-multiply both sides of \eqref{1} by $\bm{v}^T$ to obtain the relationship $\bm{v}^T\bm{A}\bm{v}=\lambda$. Thus the maximum eigenvalue $\lambda_{\max}(\bm{A})$ can be expressed as the optimization problem

\begin{equation}\label{2} \begin{aligned} \lambda_{\max}(\bm{A}) \coloneqq \max_{\bm{v}} &\quad \bm{v}^T\bm{A}\bm{v} \\ \text{subject to} &\quad \|\bm{v}\|_2=1. \end{aligned} \end{equation}
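As a quick numerical sanity check on this characterization, here is a NumPy sketch (the test matrix and sample count are arbitrary choices) verifying that every unit vector gives a value $\bm{v}^T\bm{A}\bm{v}$ no larger than the maximum eigenvalue, and that the top eigenvector attains it:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary real symmetric test matrix.
n = 5
G = rng.standard_normal((n, n))
A = (G + G.T) / 2

w, V = np.linalg.eigh(A)  # eigenvalues in ascending order
lam_max = w[-1]

# Every unit vector gives a lower bound on the maximum eigenvalue...
for _ in range(1000):
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    assert v @ A @ v <= lam_max + 1e-9

# ...and the top eigenvector attains it.
v_top = V[:, -1]
assert np.isclose(v_top @ A @ v_top, lam_max)
```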

It turns out that the maximum eigenvalue function $\lambda_{\max}:\mathbb{S}^n\to\mathbb{R}$ is convex, even though \eqref{2} looks like a non-convex problem. Indeed, \eqref{2} is a quadratically-constrained quadratic program (QCQP), but neither the objective function nor the constraint is convex (since we have made no definiteness assumptions about $\bm{A}$ and the constraint is a quadratic equality; we will come back to this).

We will explore the convexity of $\lambda_{\max}(\bm{A})$ in the rest of the post, which will allow us to have some fun with convex optimization and Lagrangian duality. Similar arguments also reveal that the minimum eigenvalue is a concave function, but we leave that as an exercise to the reader. If you’ve read Convex Optimization by Boyd and Vandenberghe (it’s freely available at the link provided), there shouldn’t be much here that’s totally new to you, but I think we can still enjoy ourselves.

Of course, if your goal is just to find the maximum eigenvalue of a (symmetric) matrix, there are much faster algorithms than the convex optimization problems we’ll explore in this post, but reasoning about eigenvalues is quite fundamental to convex programs arising in control theory, structural analysis, and portfolio optimization, among others. An interactive Python notebook that accompanies this post can be accessed directly in the browser here.

Convexity

Recall that a function is convex if a line segment drawn between any two points on the function lies on or above the function itself. In our case, we have a function of the form $g:\mathbb{S}^n\to\mathbb{R}$; by definition, it is convex if and only if

\begin{equation*} g(t\bm{A} + (1-t)\bm{B}) \leq tg(\bm{A}) + (1-t)g(\bm{B}) \end{equation*}

for any $\bm{A},\bm{B}\in\mathbb{S}^n$ and $t\in[0,1]$. Using \eqref{2}, we have

\begin{equation*} \begin{aligned} \lambda_{\max}(t\bm{A} + (1-t)\bm{B}) &= \max_{\|\bm{v}\|_2=1} \bm{v}^T(t\bm{A} + (1-t)\bm{B})\bm{v} \\ &= \max_{\|\bm{v}\|_2=1} t\bm{v}^T\bm{A}\bm{v} + (1-t)\bm{v}^T\bm{B}\bm{v} \\ &\leq \max_{\|\bm{v}\|_2=1} t\bm{v}^T\bm{A}\bm{v} + \max_{\|\bm{v}\|_2=1}(1-t)\bm{v}^T\bm{B}\bm{v} \\ &= t\lambda_{\max}(\bm{A}) + (1-t)\lambda_{\max}(\bm{B}), \end{aligned} \end{equation*}

where we have used the fact that the sum of maxima is at least as large as the maximum of a sum, revealing that $\lambda_{\max}$ is indeed convex.
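The inequality in this derivation is easy to spot-check numerically. Here is a small NumPy sketch (random test matrices and the grid of $t$ values are arbitrary choices) that tests the convexity inequality directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def random_symmetric(n):
    """An arbitrary real symmetric matrix."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / 2

def lam_max(M):
    """Largest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(M)[-1]

A, B = random_symmetric(n), random_symmetric(n)
for t in np.linspace(0.0, 1.0, 21):
    lhs = lam_max(t * A + (1 - t) * B)
    rhs = t * lam_max(A) + (1 - t) * lam_max(B)
    assert lhs <= rhs + 1e-9  # the convexity inequality holds
```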

In general, the pointwise maximum over a family of convex functions is itself convex (see Figure 1 below for a visual example). That is, if a function $f(\bm{A},\bm{v})$ is convex in $\bm{A}$ for every $\bm{v}\in\mathcal{V}$, then the function

\begin{equation*} g(\bm{A}) \coloneqq \max_{\bm{v}\in\mathcal{V}}f(\bm{A},\bm{v}) \end{equation*}

is also convex (see Section 3.2.3 of Convex Optimization). In our case, $f(\bm{A},\bm{v})\coloneqq\bm{v}^T\bm{A}\bm{v}$ is linear in $\bm{A}$, which means it is both convex and concave in $\bm{A}$, for every $\bm{v}$.

The pointwise maximum of convex functions is also convex.

Figure 1: The function $g(x)\coloneqq\max_{i=1,2}\ f_i(x)$ (highlighted in red) is the pointwise maximum of two convex functions $f_1$ and $f_2$, and is therefore itself convex.

(Strong) Duality

Earlier we stated that \eqref{2} is a QCQP, but that it is not a convex problem. For it to be convex, the constraint would need to define a convex set (for example, $\|\bm{v}\|_2\leq1$), and, since \eqref{2} is a maximization problem, the objective function would need to be concave (with respect to the optimization variable $\bm{v}$). The function $\bm{v}^T\bm{A}\bm{v}$ is only concave if $\bm{A}$ is negative semidefinite, which means $\bm{v}^T\bm{A}\bm{v}\leq0$ for all $\bm{v}\in\mathbb{R}^n$, and which we denote as $\bm{A}\preccurlyeq\bm{0}$ (analogously, $\bm{A}$ is positive semidefinite if $\bm{v}^T\bm{A}\bm{v}\geq0$ for all $\bm{v}\in\mathbb{R}^n$, denoted $\bm{A}\succcurlyeq\bm{0}$). The eigenvalues of a negative semidefinite matrix are all non-positive (and those of a positive semidefinite matrix are all non-negative).

Luckily, despite its non-convexity, \eqref{2} enjoys strong duality. This means that its (convex) dual problem has the same optimal value. To see this, let’s derive the dual problem. The Lagrangian of \eqref{2} is

\begin{equation*} \begin{aligned} \mathcal{L}(\bm{v},\lambda) &= \bm{v}^T\bm{A}\bm{v} - \lambda(\bm{v}^T\bm{v} - 1) \\ &= \bm{v}^T(\bm{A} - \lambda\bm{1}_n)\bm{v} + \lambda, \end{aligned} \end{equation*}

where $\bm{1}_n$ is the $n\times n$ identity matrix and $\lambda\in\mathbb{R}$ is the dual variable. The dual function is

\begin{equation*} \begin{aligned} d(\lambda) &= \sup_{\bm{v}}\mathcal{L}(\bm{v},\lambda) \\ &= \begin{cases} \lambda &\quad \text{if } \bm{A}-\lambda\bm{1}_n\preccurlyeq\bm{0}, \\ \infty &\quad \text{otherwise}, \end{cases} \end{aligned} \end{equation*}

where $\sup$ denotes the supremum, or least upper bound (that is, the smallest value that is no smaller than every value of $\mathcal{L}(\bm{v},\lambda)$). When $\bm{A}-\lambda\bm{1}_n$ is negative semidefinite, the value $\bm{v}^T(\bm{A}-\lambda\bm{1}_n)\bm{v}$ is at most zero for any $\bm{v}\in\mathbb{R}^n$, and so $\mathcal{L}(\bm{v},\lambda)$ is at most $\lambda$; in all other cases it is unbounded above. Since we don’t care about the unbounded case, we restrict the domain to keep the dual function bounded. This yields the dual problem

\begin{equation}\label{3} \begin{aligned} \min_{\lambda} &\quad \lambda \\ \text{subject to} &\quad \bm{A}\preccurlyeq\lambda\bm{1}_n, \end{aligned} \end{equation}

which is a semidefinite program (SDP), because it includes a semidefinite constraint (that is, a linear matrix inequality). The dual problem is always convex, and SDPs in particular are a class of convex problems that can be solved efficiently with off-the-shelf software, as long as the number of optimization variables is not too large.
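We can check the feasible set of this dual problem numerically without an SDP solver: $\lambda$ is feasible exactly when $\bm{A}-\lambda\bm{1}_n$ has no positive eigenvalues, that is, when $\lambda\geq\lambda_{\max}(\bm{A})$. A NumPy sketch (the test matrix, margins, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
G = rng.standard_normal((n, n))
A = (G + G.T) / 2
lam_max = np.linalg.eigvalsh(A)[-1]

def feasible(lam, tol=1e-9):
    # The constraint A - lam*I <= 0 holds iff the largest
    # eigenvalue of A - lam*I is non-positive.
    return np.linalg.eigvalsh(A - lam * np.eye(n))[-1] <= tol

assert feasible(lam_max)            # the optimum is feasible,
assert feasible(lam_max + 0.5)      # as is any larger lambda,
assert not feasible(lam_max - 0.5)  # but nothing smaller.
```

So minimizing $\lambda$ over the feasible set lands exactly on $\lambda_{\max}(\bm{A})$, as the interpretations below make precise.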

Diagonalization Interpretation

One way to interpret \eqref{3} is to consider the eigendecomposition $\bm{A}=\bm{V}\bm{\Lambda}\bm{V}^T$, where $\bm{V}$ is the orthogonal matrix of eigenvectors and $\bm{\Lambda}$ is the diagonal matrix of eigenvalues, which we can use to rearrange the constraint $\bm{A}\preccurlyeq\lambda\bm{1}_n$ to the diagonalized form $\bm{\Lambda}\preccurlyeq\lambda\bm{1}_n$. Since both $\bm{\Lambda}$ and $\lambda\bm{1}_n$ are diagonal, the constraint simply requires $\lambda$ to be no smaller than the largest diagonal element (eigenvalue) of $\bm{\Lambda}$. Thus the minimum possible value of $\lambda$ is the maximum eigenvalue of $\bm{A}$, revealing that \eqref{2} and \eqref{3} have the same optimal value.

Extremal Interpretation

Another (related) way to interpret \eqref{3} is as follows. We can rewrite the semidefinite constraint $\bm{A}\preccurlyeq\lambda\bm{1}_n$ as

\begin{equation*} \bm{v}^T(\bm{A}-\lambda\bm{1}_n)\bm{v} \leq 0 \quad \text{for all } \bm{v}\in\mathbb{R}^n. \end{equation*}

Without loss of generality, take $\|\bm{v}\|_2=1$. Then the constraint becomes

\begin{equation*} \bm{v}^T\bm{A}\bm{v} \leq \lambda \quad \text{for all } \|\bm{v}\|_2=1, \end{equation*}

which is equivalent to

\begin{equation*} \left(\max_{\|\bm{v}\|_2=1}\ \bm{v}^T\bm{A}\bm{v}\right) \leq \lambda. \end{equation*}

Obviously, the minimum value of $\lambda$ is obtained at equality:

\begin{equation*} \lambda^{\star} = \max_{\|\bm{v}\|_2=1}\ \bm{v}^T\bm{A}\bm{v}, \end{equation*}

which is the same problem as \eqref{2}.

Geometric Interpretation

Finally, in the special case when $\bm{A}$ is positive definite (that is, all of its eigenvalues are strictly positive), it can be interpreted as defining an ellipsoid $\mathcal{E}$ in $n$-dimensional space centered at the origin:

\begin{equation*} \mathcal{E} = \{ \bm{x}\in\mathbb{R}^n \mid \bm{x}^T\bm{A}^{-1}\bm{x} \leq 1 \}, \end{equation*}

where the eigenvectors of $\bm{A}$ define the directions of the principal semi-axes of $\mathcal{E}$ and the eigenvalues of $\bm{A}$ are the squares of the lengths of the semi-axes. The problem \eqref{3} can then be interpreted as finding the (square of the) radius of the minimum-volume bounding sphere of $\mathcal{E}$. A two-dimensional example is shown in Figure 2.

Minimum-volume bounding sphere of an ellipse.

Figure 2: The ellipse $\mathcal{E}$ can be defined by a $2\times2$ matrix $\bm{A}$ with eigenvalues $\lambda_1\geq\lambda_2>0$. The radius of the minimum-volume bounding sphere of $\mathcal{E}$ is $\sqrt{\lambda_1}$.
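This bounding-sphere picture is also easy to verify numerically. Below is a NumPy sketch (the eigenvalues and sample count are arbitrary choices) that parametrizes boundary points of the ellipse as $\bm{x}=\bm{A}^{1/2}\bm{u}$ with $\|\bm{u}\|_2=1$, so that $\bm{x}^T\bm{A}^{-1}\bm{x}=1$, and checks that no boundary point lies outside radius $\sqrt{\lambda_1}$:

```python
import numpy as np

rng = np.random.default_rng(3)

# A 2x2 positive definite matrix with known eigenvalues.
eigs = np.array([4.0, 1.0])  # lambda_1 >= lambda_2 > 0
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))  # random orthogonal eigenvectors
A = Q @ np.diag(eigs) @ Q.T
A_half = Q @ np.diag(np.sqrt(eigs)) @ Q.T  # symmetric square root of A

# Boundary points of the ellipse: x = A^{1/2} u with ||u|| = 1,
# since then x^T A^{-1} x = u^T u = 1.
U = rng.standard_normal((1000, 2))
U /= np.linalg.norm(U, axis=1, keepdims=True)
X = U @ A_half  # rows are boundary points
radii = np.linalg.norm(X, axis=1)

# No boundary point lies outside the sphere of radius sqrt(lambda_1)...
assert radii.max() <= np.sqrt(eigs[0]) + 1e-9

# ...and the point along the top eigenvector touches it exactly.
x_star = A_half @ Q[:, 0]
assert np.isclose(np.linalg.norm(x_star), np.sqrt(eigs[0]))
```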

If you know of a good geometric interpretation of \eqref{3} when $\bm{A}$ is a general real symmetric matrix with no definiteness assumptions, I’d like to hear about it!

General Semidefinite Constraints

Any semidefinite constraint is really just an eigenvalue constraint. For example, the constraint $\bm{Y}(\bm{x})\preccurlyeq\bm{0}$, where $\bm{x}$ is our optimization variable, just constrains the maximum eigenvalue of $\bm{Y}(\bm{x})$ to be at most zero. As long as $\bm{Y}$ is affine in $\bm{x}$, the constraint $\bm{Y}(\bm{x})\preccurlyeq\bm{0}$ is convex. We can easily write the constraint from \eqref{3} in this form by defining $\bm{Y}(\lambda)\coloneqq\bm{A}-\lambda\bm{1}_n$, which is affine in $\lambda$.

Trust Region Subproblem

In general, the problem of optimizing a quadratic objective function subject to a single quadratic constraint (equality or inequality) enjoys strong duality regardless of whether the objective and constraint are convex (see this paper as well as these slides for more information). This class of problem is often known as the trust region subproblem, because it arises as a step in trust region optimization methods. This result is the consequence of a theorem of alternatives known as the S-lemma.

Dual of the Dual

We can also take the dual of \eqref{3}. The Lagrangian of \eqref{3} is

\begin{equation*} \begin{aligned} \mathcal{L}(\lambda,\bm{Z}) &= \lambda + \mathrm{tr}(\bm{Z}(\bm{A} - \lambda\bm{1}_n)) \\ &= \lambda(1 - \mathrm{tr}(\bm{Z})) + \mathrm{tr}(\bm{A}\bm{Z}), \end{aligned} \end{equation*}

where $\mathrm{tr}(\cdot)$ denotes the matrix trace and $\bm{Z}\succcurlyeq\bm{0}$ is the dual variable. The dual function is

\begin{equation*} \begin{aligned} d(\bm{Z}) &= \inf_{\lambda}\mathcal{L}(\lambda,\bm{Z}) \\ &= \begin{cases} \mathrm{tr}(\bm{A}\bm{Z}) &\quad \text{if } \mathrm{tr}(\bm{Z})=1, \\ -\infty &\quad \text{otherwise}, \end{cases} \end{aligned} \end{equation*}

where $\inf$ is the infimum, or greatest lower bound, so the dual problem is

\begin{equation}\label{4} \begin{aligned} \max_{\bm{Z}} &\quad \mathrm{tr}(\bm{A}\bm{Z}) \\ \text{subject to} &\quad \mathrm{tr}(\bm{Z})=1, \\ &\quad \bm{Z}\succcurlyeq\bm{0}, \end{aligned} \end{equation}

which is also an SDP.
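Even without running an SDP solver, we can exhibit an optimal point of this problem: the rank-one matrix $\bm{Z}=\bm{v}\bm{v}^T$ built from the top eigenvector is feasible and attains the objective value $\lambda_{\max}(\bm{A})$. A NumPy sketch (the test matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
G = rng.standard_normal((n, n))
A = (G + G.T) / 2

w, V = np.linalg.eigh(A)       # eigenvalues in ascending order
lam_max, v = w[-1], V[:, -1]

Z = np.outer(v, v)             # rank-one candidate solution

assert np.isclose(np.trace(Z), 1.0)           # tr(Z) = 1
assert np.linalg.eigvalsh(Z)[0] >= -1e-9      # Z is positive semidefinite
assert np.isclose(np.trace(A @ Z), lam_max)   # objective attains lambda_max
```

Since any feasible $\bm{Z}$ satisfies $\mathrm{tr}(\bm{A}\bm{Z})\leq\lambda_{\max}(\bm{A})\,\mathrm{tr}(\bm{Z})=\lambda_{\max}(\bm{A})$, this candidate is in fact optimal.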

Relaxation Interpretation

It turns out that \eqref{4} is just a semidefinite relaxation of \eqref{2}. To see this, observe that we can express \eqref{2} in the equivalent form

\begin{equation*} \begin{aligned} \max_{\bm{v}} &\quad \mathrm{tr}(\bm{A}\bm{v}\bm{v}^T) \\ \text{subject to} &\quad \mathrm{tr}(\bm{v}\bm{v}^T)=1, \end{aligned} \end{equation*}

which we obtained using the cyclic property of the trace. We can replace $\bm{v}\bm{v}^T$ with a matrix variable $\bm{Z}\succcurlyeq\bm{0}$ constrained to be rank one (because then $\bm{Z}$ can always be decomposed into the outer product $\bm{Z}=\bm{v}\bm{v}^T$), yielding

\begin{equation*} \begin{aligned} \max_{\bm{Z}} &\quad \mathrm{tr}(\bm{A}\bm{Z}) \\ \text{subject to} &\quad \mathrm{tr}(\bm{Z})=1, \\ &\quad \bm{Z}\succcurlyeq\bm{0}, \\ &\quad \mathrm{rank}(\bm{Z}) = 1. \end{aligned} \end{equation*}

Finally, relaxing the problem by dropping the non-convex rank constraint gives us \eqref{4}. We say that this relaxation is tight because it retains the same optimal value as the original problem \eqref{2}.

Sum of the kk Largest Eigenvalues

Not only is the maximum eigenvalue of a real symmetric matrix a convex function, but so is the sum of the $k$ largest eigenvalues. If we arrange the eigenvalues of $\bm{A}$ in decreasing order $\lambda_1\geq\lambda_2\geq\dots\geq\lambda_n$, the sum of the $k$ largest is

\begin{equation*} \begin{aligned} \sum_{i=1}^k\lambda_i = \max_{\{\bm{v}_i\}_{i=1}^k} &\quad \sum_{i=1}^k \bm{v}_i^T\bm{A}\bm{v}_i \\ \text{subject to} &\quad \bm{v}_i^T\bm{v}_i = 1 \quad \text{for all } i=1,\dots,k \\ &\quad \bm{v}_i^T\bm{v}_j = 0 \quad \text{for all } i,j=1,\dots,k,\ i\neq j, \end{aligned} \end{equation*}

where we have constrained the vectors $\{\bm{v}_i\}$ to be orthonormal (unit length and mutually orthogonal). We can also rewrite this problem in the nicer matrix form

\begin{equation*} \begin{aligned} \sum_{i=1}^k\lambda_i = \max_{\bm{V}} &\quad \mathrm{tr}(\bm{V}^T\bm{A}\bm{V}) \\ \text{subject to} &\quad \bm{V}^T\bm{V} = \bm{1}_k, \end{aligned} \end{equation*}

where the columns of $\bm{V}\in\mathbb{R}^{n\times k}$ are the eigenvectors. This function is again the pointwise maximum over a family of convex functions, so it is itself convex.
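As a final check, here is a NumPy sketch (dimensions and sample counts are arbitrary choices) verifying that the top-$k$ eigenvectors attain the maximum in the matrix form above, while randomly drawn orthonormal frames do no better:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 6, 3
G = rng.standard_normal((n, n))
A = (G + G.T) / 2

w, vecs = np.linalg.eigh(A)    # eigenvalues in ascending order
top_k_sum = w[-k:].sum()       # sum of the k largest eigenvalues

# The top-k eigenvectors attain the maximum...
V = vecs[:, -k:]
assert np.isclose(np.trace(V.T @ A @ V), top_k_sum)

# ...and random orthonormal frames (from reduced QR) do no better.
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    assert np.trace(Q.T @ A @ Q) <= top_k_sum + 1e-9
```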

Thanks to Connor Holmes for reviewing a draft of this post.