The Maximum Eigenvalue is a Convex Function
Suppose we have a real square matrix $A \in \mathbb{R}^{n \times n}$, the eigenvalues of which determine the amount of scaling $A$ performs along each dimension. Recall that $\lambda$ is an eigenvalue of $A$ if it satisfies the equation

$$A x = \lambda x$$

for some eigenvector $x \neq 0$. A general real square matrix may have complex eigenvalues and eigenvectors, so we will limit ourselves to real symmetric matrices (that is, those satisfying $A = A^T$), which are guaranteed to have real eigenvalues and eigenvectors (note the converse is not true: a matrix does not necessarily have to be symmetric for all of its eigenvalues to be real). We denote the set of $n \times n$ real symmetric matrices as $\mathbb{S}^n$.
The eigenvector $x$ in $A x = \lambda x$ is free to take any non-zero magnitude, so we will assume $x^T x = 1$. We can then left-multiply both sides of $A x = \lambda x$ by $x^T$ to obtain the relationship $x^T A x = \lambda x^T x = \lambda$. Thus the maximum eigenvalue can be expressed as the optimization problem

$$\lambda_{\max}(A) = \max_{x} \left\{ x^T A x \;:\; x^T x = 1 \right\}. \tag{P}$$
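This variational characterization is easy to sanity-check numerically. The sketch below (a minimal NumPy example with a random symmetric matrix as a stand-in) verifies that $x^T A x$ never exceeds the maximum eigenvalue on the unit sphere, and that the top eigenvector attains it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random real symmetric matrix: symmetrize a random square matrix.
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# Maximum eigenvalue via a dense symmetric eigensolver.
lam_max = np.linalg.eigvalsh(A)[-1]  # eigvalsh returns ascending order

# x^T A x over many random unit vectors never exceeds lam_max...
xs = rng.standard_normal((1000, 5))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
quad = np.einsum("ij,jk,ik->i", xs, A, xs)  # quad[i] = xs[i]^T A xs[i]
assert np.all(quad <= lam_max + 1e-9)

# ...and the unit eigenvector for lam_max attains it.
v = np.linalg.eigh(A)[1][:, -1]
assert np.isclose(v @ A @ v, lam_max)
```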
It turns out that the maximum eigenvalue function $\lambda_{\max} : \mathbb{S}^n \to \mathbb{R}$ is convex, even though $(\mathrm{P})$ looks like a non-convex problem. Indeed, $(\mathrm{P})$ is a quadratically-constrained quadratic program (QCQP), but neither the objective function nor the constraint is convex (since we have made no definiteness assumptions about $A$, and the constraint is a quadratic equality; we will come back to this).
We will explore the convexity of $\lambda_{\max}$ in the rest of the post, which will allow us to have some fun with convex optimization and Lagrangian duality. Similar arguments also reveal that the minimum eigenvalue is a concave function, but we leave that as an exercise to the reader. If you’ve read Convex Optimization by Boyd and Vandenberghe (it’s freely available at the link provided) there shouldn’t be much here that’s totally new to you, but I think we can still enjoy ourselves.
Of course, if your goal is just to find the maximum eigenvalue of a (symmetric) matrix, there are much faster algorithms than the convex optimization problems we’ll explore in this post, but reasoning about eigenvalues is quite fundamental to convex programs arising in control theory, structural analysis, and portfolio optimization, among others. An interactive Python notebook that accompanies this post can be accessed directly in the browser here.
Convexity
Recall that a function is convex if a line segment drawn between any two points on its graph lies on or above the graph itself. In our case, we have a function of the form $f(A) = \lambda_{\max}(A)$; by definition, it is convex if and only if

$$\lambda_{\max}(\theta A + (1 - \theta) B) \le \theta \lambda_{\max}(A) + (1 - \theta) \lambda_{\max}(B)$$

for any $A, B \in \mathbb{S}^n$ and $0 \le \theta \le 1$. Using $\lambda_{\max}(A) = \max_{x^T x = 1} x^T A x$, we have

$$\begin{aligned}
\lambda_{\max}(\theta A + (1 - \theta) B) &= \max_{x^T x = 1} \left( \theta\, x^T A x + (1 - \theta)\, x^T B x \right) \\
&\le \theta \max_{x^T x = 1} x^T A x + (1 - \theta) \max_{x^T x = 1} x^T B x \\
&= \theta \lambda_{\max}(A) + (1 - \theta) \lambda_{\max}(B),
\end{aligned}$$
where we have used the fact that the sum of maxima is at least as large as the maximum of a sum, revealing that $\lambda_{\max}$ is indeed convex.
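The convexity inequality can be spot-checked numerically for random symmetric matrices and random interpolation weights:

```python
import numpy as np

rng = np.random.default_rng(1)

def lam_max(M):
    # Largest eigenvalue of a symmetric matrix (eigvalsh is ascending).
    return np.linalg.eigvalsh(M)[-1]

def random_symmetric(n):
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2

# Check lam_max(theta*A + (1-theta)*B) <= theta*lam_max(A) + (1-theta)*lam_max(B).
for _ in range(100):
    A, B = random_symmetric(6), random_symmetric(6)
    theta = rng.uniform()
    lhs = lam_max(theta * A + (1 - theta) * B)
    rhs = theta * lam_max(A) + (1 - theta) * lam_max(B)
    assert lhs <= rhs + 1e-9
```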
In general, the pointwise maximum over a family of convex functions is itself convex (see Figure 1 below for a visual example). That is, if a function $g(A, x)$ is convex in $A$ for every $x$, then the function

$$f(A) = \max_{x} g(A, x)$$

is also convex (see Section 3.2.3 of Convex Optimization). In our case, $g(A, x) = x^T A x$ is linear in $A$, which means it is both convex and concave in $A$, for every $x$.
Figure 1: The function $f = \max\{f_1, f_2\}$ (highlighted in red) is the pointwise maximum of two convex functions $f_1$ and $f_2$, and is therefore itself convex.
(Strong) Duality
Earlier we stated that $(\mathrm{P})$ is a QCQP, but that it is not a convex problem. For it to be convex, the constraint would need to define a convex set (for example, $x^T x \le 1$), and, since $(\mathrm{P})$ is a maximization problem, the objective function would need to be concave (with respect to the optimization variable $x$). The function $x^T A x$ is only concave if $A$ is negative semidefinite, which means $x^T A x \le 0$ for all $x$, and which we denote as $A \preceq 0$ (analogously, $A$ is positive semidefinite if $x^T A x \ge 0$ for all $x$, and is denoted $A \succeq 0$). The eigenvalues of a negative semidefinite matrix are all non-positive (and those of a positive semidefinite matrix are all non-negative).
Luckily, despite its non-convexity, $(\mathrm{P})$ enjoys strong duality. This means that its (convex) dual problem has the same optimal value. To see this, let’s derive the dual problem. The Lagrangian of $(\mathrm{P})$ is

$$L(x, \nu) = x^T A x + \nu (1 - x^T x) = x^T (A - \nu I) x + \nu,$$

where $I$ is the identity matrix and $\nu \in \mathbb{R}$ is the dual variable. The dual function is

$$g(\nu) = \sup_{x} L(x, \nu) = \begin{cases} \nu & \text{if } A - \nu I \preceq 0, \\ \infty & \text{otherwise}, \end{cases}$$

where the $\sup$ is the supremum, or least upper bound (that is, it is the smallest value that is greater than or equal to every value of $L(x, \nu)$ over $x$). When $A - \nu I$ is negative semidefinite, the value $x^T (A - \nu I) x$ is at most zero for any $x$, and so $L(x, \nu)$ is at most $\nu$; in all other cases it is unbounded above. Since we don’t care about the unbounded case, we restrict the domain to $\{\nu \mid \nu I - A \succeq 0\}$ to keep the dual function bounded. This yields the dual problem

$$\min_{\nu} \left\{ \nu \;:\; \nu I - A \succeq 0 \right\}, \tag{D}$$
which is a semidefinite program (SDP), because it includes a semidefinite constraint (that is, a linear matrix inequality). The dual problem is always convex, and SDPs in particular are a class of convex problems that can be solved efficiently with off-the-shelf software, as long as the number of optimization variables is not too large.
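Rather than call an off-the-shelf SDP solver, a minimal sketch can solve this particular dual problem by bisection on the scalar variable $\nu$, since feasibility of the matrix inequality reduces to checking the smallest eigenvalue of $\nu I - A$ (the bracketing bounds below are loose Gershgorin-style bounds, chosen only so the initial interval straddles the feasibility threshold):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

def feasible(nu, A, tol=0.0):
    # nu*I - A >= 0 iff its smallest eigenvalue is non-negative.
    return np.linalg.eigvalsh(nu * np.eye(len(A)) - A)[0] >= -tol

# Bisection: hi is feasible (it exceeds any eigenvalue), lo is not.
lo, hi = -np.abs(A).sum(), np.abs(A).sum()
for _ in range(100):
    mid = (lo + hi) / 2
    if feasible(mid, A):
        hi = mid
    else:
        lo = mid

nu_star = hi  # smallest feasible nu found
assert np.isclose(nu_star, np.linalg.eigvalsh(A)[-1], atol=1e-6)
```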
Diagonalization Interpretation
One way to interpret $(\mathrm{D})$ is to consider the eigendecomposition $A = Q \Lambda Q^T$, where $Q$ is the orthogonal matrix of eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues, which we can use to rearrange the constraint $\nu I - A \succeq 0$ to the diagonalized form $\nu I - \Lambda \succeq 0$ (congruence by $Q$ gives $Q^T (\nu I - A) Q = \nu I - \Lambda$, and congruence preserves semidefiniteness). Since both $\nu I$ and $\Lambda$ are diagonal, the constraint simply requires $\nu$ to be no smaller than the largest diagonal element (eigenvalue) of $\Lambda$. Thus the minimum possible value of $\nu$ is the maximum eigenvalue of $A$, revealing that $(\mathrm{P})$ and $(\mathrm{D})$ have the same optimal values.
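The diagonalization step can be verified directly: congruence by the eigenvector matrix turns the constraint into a diagonal one.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

lam, Q = np.linalg.eigh(A)  # A = Q @ diag(lam) @ Q.T, lam ascending
nu = lam[-1]                # candidate multiplier: the maximum eigenvalue

# Congruence by Q diagonalizes the constraint: Q^T (nu I - A) Q = nu I - Lambda.
M = Q.T @ (nu * np.eye(4) - A) @ Q
assert np.allclose(M, np.diag(nu - lam), atol=1e-8)

# nu I - Lambda >= 0 iff nu is at least the largest diagonal entry of Lambda;
# at nu = lam_max the matrix is singular but still positive semidefinite.
assert np.all(nu - lam >= -1e-9)
```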
Extremal Interpretation
Another (related) way to interpret $(\mathrm{D})$ is as follows. We can rewrite the semidefinite constraint as

$$x^T (\nu I - A) x \ge 0 \quad \text{for all } x.$$

Without loss of generality, take $x^T x = 1$. Then the constraint becomes

$$\nu \ge x^T A x \quad \text{for all } x \text{ with } x^T x = 1,$$

which is equivalent to

$$\nu \ge \max_{x^T x = 1} x^T A x.$$

Obviously, the minimum value of $\nu$ is obtained at equality:

$$\min_{\nu} \nu = \max_{x^T x = 1} x^T A x,$$

which is the same problem as $(\mathrm{P})$.
Geometric Interpretation
Finally, in the special case when $A$ is positive definite (that is, all of its eigenvalues are strictly positive), it can be interpreted as defining an ellipsoid $\mathcal{E}$ in $n$-dimensional space centered at the origin:

$$\mathcal{E} = \left\{ x \in \mathbb{R}^n \;\middle|\; x^T A^{-1} x \le 1 \right\},$$

where the eigenvectors of $A$ define the directions of the principal semi-axes of $\mathcal{E}$ and the eigenvalues of $A$ are the squares of the lengths of the semi-axes. The problem $(\mathrm{D})$ can then be interpreted as finding the (square of the) radius of the minimum-volume bounding sphere of $\mathcal{E}$. A two-dimensional example is shown in Figure 2.
Figure 2: The ellipse $\mathcal{E}$ can be defined by a matrix $A \in \mathbb{S}^2$ with eigenvalues $\lambda_1 \ge \lambda_2 > 0$. The radius of the minimum-volume bounding sphere of $\mathcal{E}$ is $\sqrt{\lambda_1}$.
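This picture is easy to check numerically. The sketch below uses a hypothetical diagonal positive definite matrix with eigenvalues 4 and 1 (an illustrative choice, not necessarily the matrix in the figure): every boundary point of the ellipse fits inside the sphere of radius $\sqrt{\lambda_{\max}}$, and at least one point touches it.

```python
import numpy as np

# Hypothetical positive definite matrix defining E = { x : x^T A^{-1} x <= 1 }.
A = np.array([[4.0, 0.0],
              [0.0, 1.0]])

lam = np.linalg.eigvalsh(A)
r = np.sqrt(lam[-1])  # radius of the minimum-volume bounding sphere

# Boundary of E: semi-axis lengths are sqrt(4) = 2 and sqrt(1) = 1.
t = np.linspace(0, 2 * np.pi, 1000)
pts = np.stack([2 * np.cos(t), 1 * np.sin(t)])

norms = np.linalg.norm(pts, axis=0)
assert np.all(norms <= r + 1e-9)   # ellipse fits inside the sphere
assert np.isclose(norms.max(), r)  # and touches it along the major axis
```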
If you know of a good geometric interpretation of $(\mathrm{D})$ when $A$ is a general real symmetric matrix with no definiteness assumptions, I’d like to hear about it!
General Semidefinite Constraints
Any semidefinite constraint is really just an eigenvalue constraint. For example, the constraint $G(x) \preceq 0$, where $x$ is our optimization variable, just constrains the maximum eigenvalue of $G(x)$ to be at most zero. As long as $G(x)$ is affine in $x$, the constraint is convex. We can easily write the constraint from $(\mathrm{D})$ in this form by defining $G(\nu) = A - \nu I$, which is affine in $\nu$.
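A small numerical sketch of this equivalence: the affine matrix function $A - \nu I$ is negative semidefinite exactly when $\nu$ is at least the maximum eigenvalue of $A$.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
lam_max = np.linalg.eigvalsh(A)[-1]

def G(nu):
    # Affine in the scalar variable nu.
    return A - nu * np.eye(5)

def is_nsd(M, tol=1e-9):
    # Negative semidefinite iff the largest eigenvalue is (numerically) <= 0.
    return np.linalg.eigvalsh(M)[-1] <= tol

# G(nu) <= 0 exactly when nu >= lam_max(A).
assert is_nsd(G(lam_max + 0.1))
assert not is_nsd(G(lam_max - 0.1))
```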
Trust Region Subproblem
In general, the problem of optimizing a quadratic objective function subject to a single quadratic constraint (equality or inequality) enjoys strong duality regardless of whether the objective and constraint are convex (see this paper as well as these slides for more information). This class of problems is often known as the trust region subproblem, because it arises as a step in trust region optimization methods. This result is a consequence of a theorem of alternatives known as the S-lemma.
Dual of the Dual
We can also take the dual of $(\mathrm{D})$. The Lagrangian of $(\mathrm{D})$ is

$$L(\nu, X) = \nu + \mathrm{tr}\!\left( X (A - \nu I) \right) = \nu \left( 1 - \mathrm{tr}(X) \right) + \mathrm{tr}(A X),$$

where $\mathrm{tr}(\cdot)$ denotes the matrix trace and $X \succeq 0$ is the dual variable. The dual function is

$$g(X) = \inf_{\nu} L(\nu, X) = \begin{cases} \mathrm{tr}(A X) & \text{if } \mathrm{tr}(X) = 1, \\ -\infty & \text{otherwise}, \end{cases}$$

where $\inf$ is the infimum, or greatest lower bound, so the dual problem is

$$\max_{X} \left\{ \mathrm{tr}(A X) \;:\; \mathrm{tr}(X) = 1, \; X \succeq 0 \right\}, \tag{D*}$$
which is also an SDP.
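This characterization of the maximum eigenvalue as a maximization of $\mathrm{tr}(AX)$ over trace-one positive semidefinite matrices can be verified numerically: any feasible $X$ yields a lower bound, and the rank-one matrix built from the top eigenvector attains the optimum.

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
lam, Q = np.linalg.eigh(A)  # ascending eigenvalues, orthonormal eigenvectors

# A random feasible X: positive semidefinite with unit trace.
C = rng.standard_normal((5, 5))
X = C @ C.T
X /= np.trace(X)
assert np.trace(A @ X) <= lam[-1] + 1e-9  # tr(AX) never beats lam_max

# The rank-one candidate X* = v v^T from the top eigenvector is optimal.
v = Q[:, -1]
X_star = np.outer(v, v)
assert np.isclose(np.trace(X_star), 1.0)
assert np.isclose(np.trace(A @ X_star), lam[-1])
```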
Relaxation Interpretation
It turns out that $(\mathrm{D}^*)$ is just a semidefinite relaxation of $(\mathrm{P})$. To see this, observe that we can express $(\mathrm{P})$ in the equivalent form

$$\max_{x} \left\{ \mathrm{tr}(A x x^T) \;:\; \mathrm{tr}(x x^T) = 1 \right\},$$

which we obtained using the cyclic property of the trace (that is, $x^T A x = \mathrm{tr}(x^T A x) = \mathrm{tr}(A x x^T)$). We can replace $x x^T$ with a matrix variable $X \succeq 0$ constrained to be rank one (because then $X$ can always be decomposed into the outer product $X = x x^T$), yielding

$$\max_{X} \left\{ \mathrm{tr}(A X) \;:\; \mathrm{tr}(X) = 1, \; X \succeq 0, \; \mathrm{rank}(X) = 1 \right\}.$$

Finally, relaxing the problem by dropping the non-convex rank constraint gives us $(\mathrm{D}^*)$. We say that this relaxation is tight because it retains the same optimal value as the original problem $(\mathrm{P})$.
Sum of the Largest Eigenvalues
Not only is the maximum eigenvalue of a real symmetric matrix a convex function, but so is the sum of the $k$ largest eigenvalues. If we arrange the eigenvalues of $A \in \mathbb{S}^n$ in decreasing order $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, the sum of the $k$ largest is

$$\sum_{i=1}^{k} \lambda_i(A) = \max_{x_1, \dots, x_k} \left\{ \sum_{i=1}^{k} x_i^T A x_i \;:\; x_i^T x_j = \delta_{ij} \right\},$$

where the Kronecker delta constraint $x_i^T x_j = \delta_{ij}$ requires each eigenvector to be orthonormal. We can also rewrite this problem in the nicer matrix form

$$\sum_{i=1}^{k} \lambda_i(A) = \max_{V \in \mathbb{R}^{n \times k}} \left\{ \mathrm{tr}(V^T A V) \;:\; V^T V = I \right\},$$

where the columns of $V$ are the eigenvectors. This function is again the pointwise maximum over a family of convex (here, linear in $A$) functions, so it is itself convex.
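The matrix form can be checked numerically: stacking the top-$k$ eigenvectors into $V$ attains the sum of the $k$ largest eigenvalues, and no other matrix with orthonormal columns (here generated via QR factorization) does better.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 6, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
lam, Q = np.linalg.eigh(A)  # ascending order

top_k_sum = lam[-k:].sum()  # sum of the k largest eigenvalues

# V holding the top-k eigenvectors attains the maximum of tr(V^T A V).
V = Q[:, -k:]
assert np.isclose(np.trace(V.T @ A @ V), top_k_sum)

# Random orthonormal frames (QR of a random matrix) never do better.
for _ in range(100):
    W, _ = np.linalg.qr(rng.standard_normal((n, k)))
    assert np.trace(W.T @ A @ W) <= top_k_sum + 1e-9
```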
Thanks to Connor Holmes for reviewing a draft of this post.