Linear Algebra and Robot Control

In this post we take a look at some linear algebra through the lens of optimization and then see how it can be applied to robot manipulator control. In particular, we will spend a fair bit of time looking at pseudoinverses and nullspace projection.

Linear Systems of Equations

As engineers we often encounter linear systems of equations of the form

\begin{equation}\label{1} \bm{A}\bm{x} = \bm{b}, \end{equation}

where we are given a matrix $\bm{A}\in\mathbb{R}^{m\times n}$ and a vector $\bm{b}\in\mathbb{R}^m$, and we want to find a vector $\bm{x}\in\mathbb{R}^n$ satisfying the equation. In this post, we are interested in the solutions of \eqref{1}. We will begin with some background review to make this post fairly self-contained.

Background

Column Space and Nullspace

The column space $\mathcal{C}(\bm{A})$ of a matrix $\bm{A}$ is the span of its columns; that is, it is the set of all linear combinations of the columns of $\bm{A}$:

\begin{equation*} \mathcal{C}(\bm{A}) = \{ \bm{A}\bm{x} \mid \bm{x}\in\mathbb{R}^n \}. \end{equation*}

The nullspace (also called the kernel) $\mathcal{N}(\bm{A})$ of $\bm{A}$ is the set of all vectors $\bm{x}$ for which $\bm{A}\bm{x}=\bm{0}$:

\begin{equation*} \mathcal{N}(\bm{A}) = \{ \bm{x}\in\mathbb{R}^n \mid \bm{A}\bm{x}=\bm{0} \}. \end{equation*}

Singular Values

Any matrix $\bm{A}\in\mathbb{R}^{m\times n}$ can be expressed using the singular value decomposition (SVD) as

\begin{equation*} \bm{A} = \textstyle\sum_{i=1}^r \sigma_i\bm{u}_i\bm{v}_i^T, \end{equation*}

where $r=\min(m,n)$, $\sigma_i$ is the $i$th singular value of $\bm{A}$, and $\bm{u}_i\in\mathbb{R}^m$ and $\bm{v}_i\in\mathbb{R}^n$ are the $i$th left and right singular vectors of $\bm{A}$. The singular values are all non-negative and each set of singular vectors forms an orthonormal basis. Expressed in matrix form, we have

\begin{equation*} \bm{A} = \bm{U}\bm{\Sigma}\bm{V}^T, \end{equation*}

where $\bm{U}\in\mathbb{R}^{m\times m}$ and $\bm{V}\in\mathbb{R}^{n\times n}$ are orthogonal and $\bm{\Sigma}\in\mathbb{R}^{m\times n}$ is a rectangular diagonal matrix with the singular values on the diagonal.
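
As a quick numpy sketch (the matrix below is made up purely for illustration), we can compute the SVD and reconstruct the original matrix from it:

```python
import numpy as np

# a made-up example matrix for illustration
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # shape (m, n) = (2, 3)

# full SVD: U is (m, m), S holds the singular values, Vt is V^T with shape (n, n)
U, S, Vt = np.linalg.svd(A)

# rebuild the rectangular diagonal matrix Sigma and check that A = U @ Sigma @ V^T
Sigma = np.zeros(A.shape)
Sigma[:len(S), :len(S)] = np.diag(S)
assert np.allclose(A, U @ Sigma @ Vt)
```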

Rank

The rank of $\bm{A}$ is equal to the number of non-zero singular values. We say that a matrix $\bm{A}$ satisfying $\mathrm{rank}(\bm{A})=r$ has full rank (i.e., the maximum rank it can possibly have given its shape). The rank of $\bm{A}$ is equal to the number of linearly independent columns (or rows) of $\bm{A}$, which is also the dimension of the column space $\mathcal{C}(\bm{A})$.

Inverse

When $\bm{A}$ has full rank and is also square, such that $\mathrm{rank}(\bm{A})=m=n$, then its inverse $\bm{A}^{-1}$ exists and we say that $\bm{A}$ is invertible. Expressed in terms of the SVD of $\bm{A}$, the inverse is given by

\begin{equation}\label{2} \bm{A}^{-1} = \textstyle\sum_{i=1}^n \sigma_i^{-1}\bm{v}_i\bm{u}_i^T = \bm{V}\bm{\Sigma}^{-1}\bm{U}^T. \end{equation}

(When $\bm{A}$ is symmetric positive semidefinite, the SVD is actually just an eigendecomposition: in that case $\bm{u}_i=\bm{v}_i$ are the eigenvectors of $\bm{A}$ and the $\sigma_i$ are its eigenvalues.) Notice that if one (or more) of the singular values $\sigma_i=0$, then we cannot compute the inverse because $\sigma_i^{-1}$ does not exist, and we say that $\bm{A}$ is singular. Note that we only use the term “singular” in reference to square matrices.

Condition Number

The “closeness” to singularity for a square matrix $\bm{A}$ is quantified by the matrix’s condition number $\kappa(\bm{A})$, which describes how much the solution $\bm{x}$ of \eqref{1} can change when the value of $\bm{b}$ changes. A common definition is

\begin{equation*} \kappa(\bm{A}) = \frac{\sigma_{\max}(\bm{A})}{\sigma_{\min}(\bm{A})}, \end{equation*}

which is the ratio between the largest and smallest singular values. When $\kappa(\bm{A})$ is low, small changes in $\bm{b}$ do not cause large changes in $\bm{x}$, and we say that $\bm{A}$ is well-conditioned. When $\kappa(\bm{A})$ is high, small changes in $\bm{b}$ can cause large changes in $\bm{x}$, and we say that $\bm{A}$ is ill-conditioned. A larger condition number means $\bm{A}$ is closer to singularity, with $\kappa(\bm{A})=\infty$ when $\bm{A}$ is singular.
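
In numpy this ratio can be computed directly; a minimal sketch with two made-up matrices:

```python
import numpy as np

A_good = np.eye(3)                  # all singular values equal to 1
A_bad = np.diag([1.0, 1.0, 1e-12])  # one singular value very close to zero

# np.linalg.cond defaults to the 2-norm condition number, i.e. sigma_max / sigma_min
print(np.linalg.cond(A_good))  # 1.0: perfectly conditioned
print(np.linalg.cond(A_bad))   # ~1e12: nearly singular
```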

We are now ready to return to our examination of \eqref{1}.

Solutions of Linear Systems

To determine the number of possible solutions of \eqref{1}, we use the augmented matrix

\begin{equation*} \tilde{\bm{A}} = \begin{bmatrix} \bm{A} & \bm{b} \end{bmatrix}\in\mathbb{R}^{m\times(n+1)}, \end{equation*}

which is simply $\bm{A}$ with $\bm{b}$ added as an additional column. The Rouché-Capelli theorem tells us that:

  • if $\mathrm{rank}(\tilde{\bm{A}}) > \mathrm{rank}(\bm{A})$, then there is no solution;
  • if $\mathrm{rank}(\tilde{\bm{A}}) = \mathrm{rank}(\bm{A}) = n$, then there is a single unique solution;
  • otherwise, there are infinitely many solutions.

When the system has a solution we say it is consistent; if it has no solution then it is inconsistent. Consistent systems are exactly those satisfying $\bm{b}\in\mathcal{C}(\bm{A})$. When the system has infinitely many solutions, we say that it is under-determined.

The case of a single unique solution is easy because the inverse $\bm{A}^{-1}$ exists, so the solution is simply $\bm{x}=\bm{A}^{-1}\bm{b}$. Let’s take a look at the other (more interesting) cases.
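
Here is a small sketch of the Rouché-Capelli test using numpy's `matrix_rank` (the function name and the example system are made up for illustration):

```python
import numpy as np

def classify(A, b):
    """Classify the linear system Ax = b by comparing ranks (Rouché-Capelli)."""
    A_aug = np.hstack((A, b.reshape(-1, 1)))  # augmented matrix [A | b]
    r = np.linalg.matrix_rank(A)
    r_aug = np.linalg.matrix_rank(A_aug)
    n = A.shape[1]
    if r_aug > r:
        return "no solution"
    return "unique solution" if r == n else "infinitely many solutions"

A = np.array([[1.0, 2.0], [2.0, 4.0]])        # rank-1 matrix
print(classify(A, np.array([1.0, 3.0])))      # no solution (b is not in C(A))
print(classify(A, np.array([1.0, 2.0])))      # infinitely many solutions
```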

No Solutions

When the system is inconsistent, we cannot solve it exactly. A reasonable alternative is to choose the value of $\bm{x}$ that comes the closest to satisfying \eqref{1} (in the sense of least-squares). That is, we want to find the $\bm{x}$ that solves the least-squares problem

\begin{equation}\label{3} \min_{\bm{x}} \quad (1/2)\|\bm{A}\bm{x}-\bm{b}\|^2_2, \end{equation}

which can be interpreted as finding the vector in $\mathcal{C}(\bm{A})$ that has the smallest Euclidean distance to $\bm{b}$ (note that the coefficient $(1/2)$ does not change the solution, but avoids ugly coefficients in the derivative below). To solve \eqref{3}, we just take the derivative and set it equal to zero, revealing that the solution satisfies

\begin{equation*} \frac{\partial}{\partial\bm{x}} (1/2)\|\bm{A}\bm{x}-\bm{b}\|^2_2 = \bm{A}^T\bm{A}\bm{x} - \bm{A}^T\bm{b} = \bm{0}, \end{equation*}

which we rearrange (assuming $\bm{A}$ has full column rank, so that $\bm{A}^T\bm{A}$ is invertible) to obtain the solution

\begin{equation*} \bm{x} = (\bm{A}^T\bm{A})^{-1}\bm{A}^T\bm{b}. \end{equation*}

The matrix

\begin{equation*} \bm{A}^+ := (\bm{A}^T\bm{A})^{-1}\bm{A}^T \end{equation*}

is known as the Moore-Penrose pseudoinverse of $\bm{A}$. In this case, it is a left inverse because it satisfies the identity $\bm{A}^+\bm{A}=\bm{1}_n$ (i.e., it multiplies $\bm{A}$ from the left; we will see a right inverse below), where $\bm{1}_n$ is the $n\times n$ identity matrix.
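
A minimal numpy sketch of the least-squares solution, using a made-up overdetermined system; in practice `np.linalg.lstsq` (or `np.linalg.pinv`) is preferable to explicitly forming $(\bm{A}^T\bm{A})^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # tall matrix: more equations than unknowns
b = rng.standard_normal(5)        # generic b, almost surely not in C(A)

# explicit normal-equations solution (fine here because A has full column rank)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# the numerically preferred routes give the same answer
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
x_pinv = np.linalg.pinv(A) @ b
assert np.allclose(x_normal, x_lstsq) and np.allclose(x_normal, x_pinv)

# check the left-inverse identity A^+ A = I_n
assert np.allclose(np.linalg.pinv(A) @ A, np.eye(3))
```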

Conditioning and Singularities

As the condition number of $\bm{A}^T\bm{A}$ increases, small changes in $\bm{b}$ (due to noisy measurements, for example) can cause large changes in the solution of \eqref{3}. Indeed, if $\bm{A}$ is not full-rank, then $\bm{A}^T\bm{A}$ is singular and cannot be inverted at all. To see this, we can use the SVD of $\bm{A}$ to obtain

\begin{equation*} \begin{aligned} (\bm{A}^T\bm{A})^{-1} &= (\bm{V}\bm{\Sigma}^T\bm{U}^T\bm{U}\bm{\Sigma}\bm{V}^T)^{-1} \\ &= (\bm{V}\bm{\Sigma}^T\bm{\Sigma}\bm{V}^T)^{-1} \\ &= \bm{V}(\bm{\Sigma}^T\bm{\Sigma})^{-1}\bm{V}^T \\ &= \textstyle\sum_{i=1}^n\sigma_i^{-2}\bm{v}_i\bm{v}_i^T, \end{aligned} \end{equation*}

which shows that we cannot compute $(\bm{A}^T\bm{A})^{-1}$ if any $\sigma_i=0$, because then $\sigma_i^{-2}$ does not exist.

To deal with poor conditioning and singularities, we can regularize the solution by adding a damping term to \eqref{3} so that it becomes

\begin{equation}\label{4} \min_{\bm{x}} \quad (1/2)\|\bm{A}\bm{x}-\bm{b}\|^2_2 + (\alpha/2)\|\bm{x}\|^2_2, \end{equation}

where $\alpha\geq0$ is a small constant damping factor that serves to bias the solution toward a smaller norm (this regularization approach is commonly known as ridge regression). The solution of \eqref{4} is

\begin{equation*} \bm{x} = (\bm{A}^T\bm{A} + \alpha\bm{1}_n)^{-1}\bm{A}^T\bm{b}, \end{equation*}

where

\begin{equation}\label{5} \bm{A}^+(\alpha) := (\bm{A}^T\bm{A} + \alpha\bm{1}_n)^{-1}\bm{A}^T \end{equation}

is called the damped pseudoinverse of $\bm{A}$. Again using the SVD of $\bm{A}$, we have

\begin{equation*} \begin{aligned} (\bm{A}^T\bm{A} + \alpha\bm{1}_n)^{-1} &= (\bm{V}\bm{\Sigma}^T\bm{\Sigma}\bm{V}^T + \alpha\bm{1}_n)^{-1} \\ &= \bm{V}(\bm{\Sigma}^T\bm{\Sigma} + \alpha\bm{1}_n)^{-1}\bm{V}^T \\ &= \textstyle\sum_{i=1}^n(\sigma_i^{2}+\alpha)^{-1}\bm{v}_i\bm{v}_i^T, \end{aligned} \end{equation*}

which shows that the matrix inverse in \eqref{5} always exists for $\alpha>0$, since $\alpha>0$ implies $(\sigma_i^2+\alpha)^{-1}$ exists even if $\sigma_i=0$.
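
A sketch of the damped pseudoinverse in numpy; the function name `damped_pinv` and the rank-deficient test matrix are my own choices for illustration:

```python
import numpy as np

def damped_pinv(A, alpha):
    """Damped (Tikhonov-regularized) pseudoinverse (A^T A + alpha * I)^{-1} A^T."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T)

# rank-deficient matrix: the plain normal equations would fail here
A = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

x = damped_pinv(A, alpha=1e-6) @ b
print(x)                      # close to the minimum-norm least-squares solution...
print(np.linalg.pinv(A) @ b)  # ...which is what the undamped pseudoinverse gives
```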

Infinite Solutions

When the system is under-determined, there are infinitely many values of $\bm{x}$ that satisfy \eqref{1}, so we need to pick one. If we imagine that $\bm{x}$ represents some sort of effort or resource we want to save, a reasonable choice is the solution with the smallest Euclidean norm; that is, the solution satisfying

\begin{equation}\label{6} \begin{aligned} \min_{\bm{x}} &\quad (1/2)\|\bm{x}\|^2_2 \\ \text{subject to} &\quad \bm{A}\bm{x} = \bm{b}. \end{aligned} \end{equation}

To solve this optimization problem, we construct the Lagrangian function

\begin{equation*} \mathcal{L}(\bm{x},\bm{\lambda}) = (1/2)\|\bm{x}\|^2_2 + \bm{\lambda}^T(\bm{A}\bm{x}-\bm{b}), \end{equation*}

where $\bm{\lambda}\in\mathbb{R}^m$ is the vector of Lagrange multipliers. At the optimum we have

\begin{equation}\label{7} \frac{\partial\mathcal{L}}{\partial\bm{x}} = \bm{x} + \bm{A}^T\bm{\lambda} = \bm{0}, \end{equation}

which we can left-multiply by $\bm{A}$ to obtain $\bm{A}\bm{A}^T\bm{\lambda}=-\bm{A}\bm{x}=-\bm{b}$ (using the constraint $\bm{A}\bm{x}=\bm{b}$), and thus $\bm{\lambda}=-(\bm{A}\bm{A}^T)^{-1}\bm{b}$, assuming $\bm{A}$ has full row rank so that $\bm{A}\bm{A}^T$ is invertible. Substituting back into \eqref{7} and rearranging yields the solution

\begin{equation}\label{8} \bm{x} = \bm{A}^T(\bm{A}\bm{A}^T)^{-1}\bm{b}. \end{equation}

The matrix $\bm{A}^+=\bm{A}^T(\bm{A}\bm{A}^T)^{-1}$ is again the Moore-Penrose pseudoinverse of $\bm{A}$, but this time it is a right inverse because it satisfies $\bm{A}\bm{A}^+=\bm{1}_m$.
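
A sketch of the minimum-norm solution for an under-determined system in numpy (the wide matrix below is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))   # wide matrix: fewer equations than unknowns
b = rng.standard_normal(2)

# right pseudoinverse A^T (A A^T)^{-1} (valid because A has full row rank)
x_min = A.T @ np.linalg.solve(A @ A.T, b)
assert np.allclose(A @ x_min, b)                  # satisfies Ax = b exactly
assert np.allclose(x_min, np.linalg.pinv(A) @ b)  # matches numpy's pseudoinverse
print(np.linalg.norm(x_min))  # no other solution of Ax = b has a smaller norm
```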

The Moore-Penrose Pseudoinverse

Wait a minute: we said previously that the Moore-Penrose pseudoinverse of $\bm{A}$ is $(\bm{A}^T\bm{A})^{-1}\bm{A}^T$, but now we are saying it is $\bm{A}^T(\bm{A}\bm{A}^T)^{-1}$. What’s going on?

The Moore-Penrose pseudoinverse always exists and is unique, with the general solution given in terms of the SVD of $\bm{A}$:

\begin{equation}\label{9} \bm{A}^+ = \textstyle\sum_{i=1}^r\sigma_i^+\bm{v}_i\bm{u}_i^T \in \mathbb{R}^{n\times m}, \end{equation}

where

\begin{equation*} \sigma_i^+ = \begin{cases} \sigma_i^{-1} &\quad \text{if } \sigma_i>0, \\ 0 &\quad \text{otherwise.} \end{cases} \end{equation*}

(Notice the similarity to \eqref{2}.)

However, if $\bm{A}$ is full-rank, then the pseudoinverse also has a nice algebraic expression (which is equivalent to \eqref{9}):

  • if $m=n$, $\bm{A}^+=\bm{A}^{-1}$;
  • if $m<n$, $\bm{A}^+=\bm{A}^T(\bm{A}\bm{A}^T)^{-1}$;
  • if $m>n$, $\bm{A}^+=(\bm{A}^T\bm{A})^{-1}\bm{A}^T$.

The pseudoinverse $\bm{A}^+$ always has the same rank as $\bm{A}$.
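
Here is a sketch of \eqref{9} in numpy, thresholding small singular values to zero (the function name and tolerance are my own rough choices, similar in spirit to what `np.linalg.pinv` does internally):

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse built directly from the SVD, as in equation (9)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    # invert singular values above the (relative) tolerance, zero out the rest
    S_plus = np.array([1.0 / s if s > tol * S.max() else 0.0 for s in S])
    return Vt.T @ np.diag(S_plus) @ U.T

# works for any shape and any rank
for A in (np.array([[1.0, 2.0], [2.0, 4.0]]),                   # square, rank-deficient
          np.random.default_rng(2).standard_normal((3, 5))):    # wide, full rank
    assert np.allclose(pinv_svd(A), np.linalg.pinv(A))
```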

The Damped Pseudoinverse

Recall that we introduced damping in \eqref{4} to handle ill-conditioning when $\bm{A}^T\bm{A}$ approached singularity, resulting in the damped pseudoinverse $\bm{A}^+(\alpha)$. We can also obtain the damped pseudoinverse by modifying \eqref{6} to defend ourselves against poor conditioning in $\bm{A}\bm{A}^T$, resulting in the problem

\begin{equation}\label{10} \begin{aligned} \min_{\bm{x},\bm{s}} &\quad (\alpha/2)\|\bm{x}\|^2_2 + (1/2)\|\bm{s}\|^2_2 \\ \text{subject to} &\quad \bm{A}\bm{x} = \bm{b} + \bm{s}, \end{aligned} \end{equation}

where we have introduced the slack variable $\bm{s}\in\mathbb{R}^m$ and $\alpha>0$ is again the damping factor. This problem says that we are okay with a bit of error (“slack”) in the solution to \eqref{1} if it means we can reduce the norm of $\bm{x}$ in exchange, with $\alpha$ controlling the trade-off. The optimal solution is

\begin{equation*} \bm{x} = \bm{A}^T(\bm{A}\bm{A}^T + \alpha\bm{1}_m)^{-1}\bm{b}. \end{equation*}

The matrix $\bm{A}^T(\bm{A}\bm{A}^T + \alpha\bm{1}_m)^{-1}$ has a similar structure to the damped pseudoinverse from \eqref{5} but does not look exactly the same; however, the two are indeed equal for any $\alpha>0$. To see this, first observe that

\begin{equation*} (\bm{A}^T\bm{A} + \alpha\bm{1}_n)\bm{A}^T = \bm{A}^T(\bm{A}\bm{A}^T+\alpha\bm{1}_m), \end{equation*}

where we have just grouped the terms differently on each side. Left-multiplying both sides by $(\bm{A}^T\bm{A} + \alpha\bm{1}_n)^{-1}$ and right-multiplying both sides by $(\bm{A}\bm{A}^T+\alpha\bm{1}_m)^{-1}$, we obtain

\begin{equation*} \bm{A}^T(\bm{A}\bm{A}^T+\alpha\bm{1}_m)^{-1} = (\bm{A}^T\bm{A} + \alpha\bm{1}_n)^{-1}\bm{A}^T = \bm{A}^+(\alpha), \end{equation*}

which shows that both expressions for the damped pseudoinverse are equivalent. Indeed, the Moore-Penrose pseudoinverse can be expressed as the following equivalent limits:

\begin{equation*} \begin{aligned} \bm{A}^+ &= \lim_{\alpha\to0}\ \bm{A}^+(\alpha) \\ &= \lim_{\alpha\to0}\ \bm{A}^T(\bm{A}\bm{A}^T+\alpha\bm{1}_m)^{-1} \\ &= \lim_{\alpha\to0}\ (\bm{A}^T\bm{A} + \alpha\bm{1}_n)^{-1}\bm{A}^T, \end{aligned} \end{equation*}

which always exist but may not be full-rank. In contrast, the damped matrices being inverted, $\bm{A}^T\bm{A}+\alpha\bm{1}_n$ and $\bm{A}\bm{A}^T+\alpha\bm{1}_m$, are always full-rank for $\alpha>0$ and their conditioning improves as $\alpha$ increases. However, the trade-off is that increasing $\alpha$ reduces the accuracy of $\bm{x}=\bm{A}^+(\alpha)\bm{b}$ as a solution to \eqref{1}.
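
A quick numerical sanity check of the equivalence and of the limit, with a made-up matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
m, n = A.shape

for alpha in (1e-2, 1e-4, 1e-6):
    left = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T)   # (A^T A + alpha I)^{-1} A^T
    right = A.T @ np.linalg.inv(A @ A.T + alpha * np.eye(m))   # A^T (A A^T + alpha I)^{-1}
    assert np.allclose(left, right)                            # the two forms agree
    print(alpha, np.abs(left - np.linalg.pinv(A)).max())       # approaches A^+ as alpha -> 0
```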

Secondary Objectives

Instead of solving \eqref{6} for the minimum-norm solution of \eqref{1}, we may want to find the $\bm{x}$ satisfying \eqref{1} that is closest to some other vector $\bm{y}\in\mathbb{R}^n$. That is, we want to choose the $\bm{x}$ satisfying

\begin{equation}\label{11} \begin{aligned} \min_{\bm{x}} &\quad (1/2)\|\bm{x}-\bm{y}\|^2_2 \\ \text{subject to} &\quad \bm{A}\bm{x} = \bm{b}. \end{aligned} \end{equation}

Going through the same process with the Lagrangian as above, we obtain the solution

\begin{equation}\label{12} \bm{x} = \bm{A}^+\bm{b} + (\bm{1}_n-\bm{A}^+\bm{A})\bm{y}, \end{equation}

which of course reduces to \eqref{8} when $\bm{y}=\bm{0}$. So what’s the deal with the matrix

\begin{equation*} \bm{P}(\bm{A}) := \bm{1}_n-\bm{A}^+\bm{A}? \end{equation*}

Nullspace Projection

The matrix $\bm{P}(\bm{A})$ is called a nullspace projector because it projects vectors onto the nullspace $\mathcal{N}(\bm{A})$. To see this, let $\bm{x}=\bm{P}(\bm{A})\bm{y}$. Then we have

\begin{equation*} \bm{A}\bm{x} = \bm{A}\bm{P}(\bm{A})\bm{y} = \bm{A}(\bm{1}_n-\bm{A}^+\bm{A})\bm{y} = (\bm{A}-\bm{A})\bm{y} = \bm{0}, \end{equation*}

where we have used the fact that the pseudoinverse obeys the identity $\bm{A}\bm{A}^+\bm{A}=\bm{A}$, and so we see that $\bm{P}(\bm{A})\bm{y}$ belongs to $\mathcal{N}(\bm{A})$ for any vector $\bm{y}$. This means that no matter what value of $\bm{y}$ we choose, \eqref{12} will always be a solution to \eqref{1}.

It turns out that $\bm{P}(\bm{A})\bm{y}$ is in fact the closest vector in $\mathcal{N}(\bm{A})$ to $\bm{y}$ in the sense of least-squares. To see this, consider the least-squares optimization problem

\begin{equation*} \begin{aligned} \min_{\bm{x}} &\quad (1/2)\|\bm{x}-\bm{y}\|^2_2 \\ \text{subject to} &\quad \bm{A}\bm{x} = \bm{0}. \end{aligned} \end{equation*}

This is the same problem as \eqref{11} with $\bm{b}=\bm{0}$, so the solution is simply $\bm{x}=\bm{P}(\bm{A})\bm{y}$: the projection of $\bm{y}$ onto the nullspace.
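
A sketch of \eqref{12} in numpy, using the projector to bias an under-determined solution toward a preferred vector $\bm{y}$ (the matrix, $\bm{b}$, and $\bm{y}$ below are all made up):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 4))        # under-determined: 2 equations, 4 unknowns
b = rng.standard_normal(2)
y = np.array([1.0, 0.0, -1.0, 2.0])    # some preferred value for x

A_pinv = np.linalg.pinv(A)
P = np.eye(4) - A_pinv @ A             # nullspace projector of A

x = A_pinv @ b + P @ y                 # equation (12)

assert np.allclose(A @ x, b)           # still solves Ax = b exactly
assert np.allclose(A @ (P @ y), 0)     # the projected term lives in N(A)
# x is at least as close to y as the plain minimum-norm solution is
print(np.linalg.norm(x - y), np.linalg.norm(A_pinv @ b - y))
```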

Secondary Objectives with Damping

If we modify \eqref{11} with a slack variable and damping factor similar to \eqref{10}, we obtain the problem

\begin{equation*} \begin{aligned} \min_{\bm{x},\bm{s}} &\quad (\alpha/2)\|\bm{x}-\bm{y}\|^2_2 + (1/2)\|\bm{s}\|^2_2 \\ \text{subject to} &\quad \bm{A}\bm{x} = \bm{b} + \bm{s}, \end{aligned} \end{equation*}

which has the optimal solution

\begin{equation*} \bm{x} = \bm{A}^+(\alpha)\bm{b} + \bm{P}(\bm{A},\alpha)\bm{y}, \end{equation*}

where

\begin{equation*} \bm{P}(\bm{A},\alpha) := \bm{1}_n-\bm{A}^+(\alpha)\bm{A} \end{equation*}

is the damped (and therefore approximate) nullspace projector.

Robot Manipulator Control

Let’s see how we can apply some of these ideas to robotics. Suppose we have a robot arm with $n$ joints. The relationship between the velocity of the joints (i.e., the arm’s motor speeds) and the velocity of the robot’s hand in Cartesian space is given by the linear equation

\begin{equation}\label{13} \bm{J}\bm{v} = \bm{\xi}, \end{equation}

where $\bm{J}\in\mathbb{R}^{m\times n}$ is known as the Jacobian matrix (and in general depends on the robot’s current configuration), $\bm{v}\in\mathbb{R}^n$ is the vector of joint velocities, and $\bm{\xi}\in\mathbb{R}^m$ is the Cartesian velocity vector.

Many modern robot arms are redundant, which means that $m<n$ and \eqref{13} has many solutions (typically $m=6$, representing the three-dimensional linear and angular velocities). Thus we can choose from a set of joint velocities that all achieve (approximately) the same Cartesian velocity:

\begin{equation}\label{14} \bm{v} = \bm{J}^+(\alpha)\bm{\xi} + \bm{P}(\bm{J},\alpha)\bm{v}_0, \end{equation}

where we use a damping factor $\alpha>0$ to avoid singularities and we can design $\bm{v}_0$ to accomplish some secondary objective (see this paper for more details). Common choices for the secondary objective include steering toward a given “home” configuration, avoiding configurations that cause the Jacobian to lose rank, or avoiding collisions with obstacles.
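
Here is a minimal sketch of one control step of \eqref{14}; the function name `control_step` is mine, and in a real controller `J` would come from the robot's kinematics at the current configuration and `v0` from the secondary objective (the numbers below are made up):

```python
import numpy as np

def control_step(J, xi, v0, alpha=1e-3):
    """One step of damped redundancy resolution, equation (14)."""
    m, n = J.shape
    J_damped = J.T @ np.linalg.inv(J @ J.T + alpha * np.eye(m))  # damped pseudoinverse
    P = np.eye(n) - J_damped @ J            # damped (approximate) nullspace projector
    return J_damped @ xi + P @ v0

# made-up numbers: a 6-dimensional task for a 7-joint arm
rng = np.random.default_rng(5)
J = rng.standard_normal((6, 7))                  # Jacobian at the current configuration
xi = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])    # desired Cartesian velocity
v0 = rng.standard_normal(7)                      # secondary-objective velocity

v = control_step(J, xi, v0)
print(np.linalg.norm(J @ v - xi))  # small tracking error introduced by the damping
```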

Quadratic Programming

These days it is common to solve an optimization problem online in the control loop rather than use the analytic solution \eqref{14}, so that we can include inequality constraints like joint limits. That is, at each control step we obtain $\bm{v}$ by solving a problem like

\begin{equation}\label{15} \begin{aligned} \min_{\bm{v}} &\quad (1/2)\|\bm{v}-\bm{v}_0\|^2_2 \\ \text{subject to} &\quad \bm{J}\bm{v} = \bm{\xi} \\ &\quad \bm{v}_{\min} \leq \bm{v} \leq \bm{v}_{\max}, \end{aligned} \end{equation}

which is a quadratic program because it has a quadratic objective and affine constraints. (Convex) quadratic programs can be solved efficiently with a variety of mature off-the-shelf solvers (see qpsolvers for a list of solvers and to easily try them out in Python).
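
As a rough sketch, \eqref{15} can be set up for an off-the-shelf solver; here I assume qpsolvers' `solve_qp(P, q, A=..., b=..., lb=..., ub=..., solver=...)` interface, which minimizes $(1/2)\bm{v}^T\bm{P}\bm{v}+\bm{q}^T\bm{v}$ subject to the given equality and bound constraints, and the Jacobian, target velocity, and limits below are made up:

```python
import numpy as np
from qpsolvers import solve_qp  # assumed interface; check the qpsolvers documentation

n = 7
rng = np.random.default_rng(6)
J = rng.standard_normal((6, n))                  # made-up Jacobian
xi = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])    # desired Cartesian velocity
v0 = np.zeros(n)                                 # secondary objective: small velocities
v_max = np.ones(n)                               # made-up symmetric joint velocity limits

# (1/2)||v - v0||^2 expands to (1/2) v^T I v - v0^T v plus a constant
v = solve_qp(P=np.eye(n), q=-v0, A=J, b=xi, lb=-v_max, ub=v_max,
             solver="osqp")  # or any other installed solver
print(v)
```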

We could of course modify \eqref{15} to add slack and damping similar to the problems earlier in the post. A slightly different approach is to instead modify the problem to

\begin{equation*} \begin{aligned} \min_{\bm{v},s} &\quad (\alpha/2)\|\bm{v}-\bm{v}_0\|^2_2 + s \\ \text{subject to} &\quad \bm{J}\bm{v} = (1-s)\bm{\xi} \\ &\quad 0 \leq s \leq 1 \\ &\quad \bm{v}_{\min} \leq \bm{v} \leq \bm{v}_{\max}, \end{aligned} \end{equation*}

which ensures the resulting Cartesian velocity is always in the desired direction but can scale it down to make $\bm{v}$ closer to $\bm{v}_0$, where $(1-s)$ is the scaling factor and $\alpha>0$ controls the trade-off. When $s=0$ we recover \eqref{15}, but unlike \eqref{15}, this problem is always feasible as long as the velocity bounds admit $\bm{v}=\bm{0}$, since then $(\bm{v},s)=(\bm{0},1)$ is a feasible point.

Thanks to Abhishek Goudar for reading an early draft of this.