[1]:
%run ../initscript.py
toggle()
[1]:

Conic Programming

Formulation

Let \(A \in \Re^{m \times n}\), \(b \in \Re^m\) and \(c \in \Re^n\). A standard conic programming problem takes the form

\begin{align} p^\ast = \min_{x \in \Re^n} c^\intercal x : Ax = b, ~ x \in K, \nonumber \end{align}

where \(K \subseteq \Re^n\) is a convex cone. That is, \(K\) is a convex set with the property that for all \(x \in K\), \(\lambda x \in K\) for all \(\lambda > 0\).

Semi-definite Programming (SDP)

Given symmetric matrices \(C, A^\ell \in \Re^{n \times n}\), \(\ell = 1, \ldots, L\), and scalars \(b_\ell \in \Re\). Consider the SDP in standard form

\begin{align} p^* = \min_{X} \langle C, X\rangle : \langle A^\ell, ~X\rangle = b_\ell, ~\ell=1, \ldots, L, ~X \succeq 0 \nonumber \end{align}

where \(\langle A, B\rangle = \text{Tr} (A^\intercal B)\).

SDP is a special case of conic programming because

\begin{align*} \langle C, X\rangle &= \sum_{i=1}^{n} \sum_{j=1}^{n} C_{ij} X_{ji} \\ \langle A^\ell, X\rangle &= \sum_{i=1}^{n} \sum_{j=1}^{n} A^\ell_{ij} X_{ji} \\ X \in \text{ cone } K &= \{ X \in \Re^{n \times n}: v^\intercal X v \ge 0,\ \forall v \in \Re^n \} \end{align*}
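
As a quick illustration (not part of the original notes), the cell below solves a tiny SDP in this standard form with cvxpy, assuming the package is installed. All data are made up: \(C\) is chosen PSD so the objective is bounded below on the cone, and \(b\) is generated from a feasible point so the problem is feasible.

[ ]:
import numpy as np
import cvxpy as cp

n, L = 3, 2
rng = np.random.default_rng(0)
G = rng.standard_normal((n, n))
C = G @ G.T                                   # PSD cost matrix
A = []
for _ in range(L):
    M = rng.standard_normal((n, n))
    A.append((M + M.T) / 2)                   # symmetric A^ell
b = np.array([np.trace(Al) for Al in A])      # b from the feasible point X0 = I

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0] + [cp.trace(A[l] @ X) == b[l] for l in range(L)]
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print("p* =", prob.value)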

Second-Order Cone Programming (SOCP)

A standard form for the SOCP model is

\begin{align*} p^\ast = \min_{x \in \mathbb{R}^n} c^\intercal x : Ax = b, \lVert C_\ell x + d_\ell \rVert_2 \le e^\intercal_\ell x + f_\ell,\ \ell = 1, \ldots, L. \end{align*}

SOCP is a special case of conic programming with cone

\begin{align*} \mathcal{L}^{n+1} = \left\{ (y,t) | \lVert y \rVert_2 \le t \right\} \end{align*}

and \((C_\ell x + d_\ell, e^\intercal_\ell x + f_\ell) \in \mathcal{L}^{n+1}\).
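
A minimal cvxpy sketch of this form (illustrative data only, assuming cvxpy is installed); the constraint `cp.SOC(t, z)` encodes exactly the cone membership \(\lVert z \rVert_2 \le t\) above.

[ ]:
import numpy as np
import cvxpy as cp

n = 3
A = np.ones((1, n)); b = np.array([1.0])      # one linear equality
c = np.array([1.0, 2.0, 0.5])
C1, d1 = np.eye(n), np.zeros(n)               # SOC data: ||x||_2 <= 2
e1, f1 = np.zeros(n), 2.0

x = cp.Variable(n)
constraints = [A @ x == b, cp.SOC(e1 @ x + f1, C1 @ x + d1)]
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print("p* =", prob.value, " x =", x.value)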

SOCP is a special case of SDP.

[2]:
hide_comment()

Let us use Schur complements to show that SOCP is a special case of SDP.

Schur complement: Given a symmetric block matrix \begin{align*} X = \left( \begin{array}{ll} A_{p\times p} & B \\ B^\intercal & C_{q\times q} \end{array} \right) \end{align*}

with \(\det(A) \neq 0\), the matrix \(S = C - B^\intercal A^{-1} B\) is the Schur complement of \(A\) in \(X\).

Lemma: If \(A \succ 0\), then \(X \succeq 0 \Leftrightarrow S \succeq 0\).

Proof: Let \(f_v = \min_u f(u, v)\) where

\begin{align*} f(u, v) = \left( u^\intercal, v^\intercal \right) \left( \begin{array}{ll} A_{p\times p} & B \\ B^\intercal & C_{q\times q} \end{array} \right) \left( \begin{array}{l} u \\ v \end{array} \right) = u^\intercal A u + 2 v^\intercal B^\intercal u + v^\intercal C v. \end{align*}

Since \(A \succ 0\), \(f\) is strictly convex in \(u\) and admits a unique global minimizer, obtained as follows:

\begin{align*} \frac{\partial f}{\partial u} & = 2 A u + 2 B v = 0 \Rightarrow u^* = - A^{-1} B v \\ f_v &= v^\intercal B^\intercal A^{-1} B v - 2 v^\intercal B^\intercal A^{-1} B v + v^\intercal C v \\ &= v^\intercal (C - B^\intercal A^{-1} B) v = v^\intercal S v \end{align*}

(\(\Rightarrow\)) If \(S \not\succeq 0\), then \(\exists v_0\) such that \(v_0^\intercal S v_0 < 0 \Rightarrow f_{v_0} < 0\). This contradicts \(X \succeq 0\) because \(f_{v_0} = z^\intercal X z < 0\) with

\begin{align*} z = \left( \begin{array}{l} u \\ v \end{array} \right) = \left( \begin{array}{c} - A^{-1} B v_0 \\ v_0 \end{array} \right) \end{align*}

(\(\Leftarrow\)) If \(S \succeq 0\), then any \((u, v)\) gives

\begin{align*} \left( \begin{array}{l} u \\ v \end{array} \right)^\intercal X \left( \begin{array}{l} u \\ v \end{array} \right) \ge f_v = v^\intercal S v \ge 0 \Rightarrow X \succeq 0 \end{align*}
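
A quick numeric sanity check of the lemma with numpy (illustrative random data): with \(A \succ 0\), the signs of \(\lambda_{\min}(X)\) and \(\lambda_{\min}(S)\) agree.

[ ]:
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 2
G = rng.standard_normal((p, p))
A = G @ G.T + p * np.eye(p)                  # A > 0 (positive definite)
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, q))
C = (C + C.T) / 2                            # arbitrary symmetric block

X = np.block([[A, B], [B.T, C]])
S = C - B.T @ np.linalg.solve(A, B)          # Schur complement of A in X

# X is PSD exactly when S is PSD (given A > 0):
print((np.linalg.eigvalsh(X).min() >= -1e-9) == (np.linalg.eigvalsh(S).min() >= -1e-9))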

Now, we assume

\begin{align} e^\intercal_\ell x + f_\ell > 0 \nonumber \end{align}

otherwise the constraint forces \(C_\ell x + d_\ell = 0\), and the SOC constraint degenerates to a linear (LP) constraint.

We can write the constraint

\begin{align*} \lVert C_\ell x + d_\ell \rVert_2 \le e^\intercal_\ell x + f_\ell \end{align*}

as

\begin{align*} \left( \begin{array}{cc} (e^\intercal_\ell x + f_\ell) \texttt{I} & C_\ell x + d_\ell\\ (C_\ell x + d_\ell)^\intercal & e^\intercal_\ell x + f_\ell \end{array} \right) \succeq 0 \end{align*}

Because

\begin{align*} \left( \begin{array}{cc} (e^\intercal_\ell x + f_\ell) \texttt{I} & C_\ell x + d_\ell\\ (C_\ell x + d_\ell)^\intercal & e^\intercal_\ell x + f_\ell \end{array} \right) \succeq 0 &\Leftrightarrow (e^\intercal_\ell x + f_\ell) - (C_\ell x + d_\ell)^\intercal \frac{1}{e^\intercal_\ell x + f_\ell} (C_\ell x + d_\ell) \ge 0 \\ & \Leftrightarrow \lVert C_\ell x + d_\ell \rVert_2^2 \le (e^\intercal_\ell x + f_\ell)^2 \\ & \Leftrightarrow \lVert C_\ell x + d_\ell \rVert_2 \le e^\intercal_\ell x + f_\ell, \end{align*}

where the last \(\Leftrightarrow\) holds because both sides are nonnegative. Therefore, SOCP is a special case of SDP.

\(\blacksquare\)
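
The LMI representation above can be spot-checked numerically with numpy (toy data; `arrow_psd` is just a helper introduced here): with \(y = C_\ell x + d_\ell\) and \(t = e^\intercal_\ell x + f_\ell\), the arrow matrix is PSD exactly when \(\lVert y \rVert_2 \le t\).

[ ]:
import numpy as np

def arrow_psd(y, t, tol=1e-9):
    # Arrow matrix [[t I, y], [y^T, t]] from the SOC-as-SDP reformulation.
    n = len(y)
    M = np.block([[t * np.eye(n), y.reshape(-1, 1)],
                  [y.reshape(1, -1), np.array([[t]])]])
    return np.linalg.eigvalsh(M).min() >= -tol

y = np.array([3.0, 4.0])      # ||y||_2 = 5
print(arrow_psd(y, 5.0))      # True: on the boundary of the cone
print(arrow_psd(y, 6.0))      # True: strictly inside
print(arrow_psd(y, 4.0))      # False: ||y||_2 > t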

Convex quadratically constrained quadratic programming (QCQP) is a special case of SOCP (take \(e_\ell = 0\)).

[3]:
hide_comment()

We show that QCQP is a special case of SOCP.

Proof: First, we show that hyperbolic constraints can be reformulated as second-order cone constraints:

\begin{align*} u^\intercal u \le y z, y \ge 0, z \ge 0 \Leftrightarrow \left\lVert \left(\begin{array}{c} 2u \\ y - z \end{array} \right) \right\rVert_2 \le y+z, y \ge 0, z \ge 0 \end{align*}

The \(\Leftrightarrow\) holds because

\begin{align*} 4 u^\intercal u \le 4 y z = (y+z)^2 - (y-z)^2 & \Leftrightarrow 4 u^\intercal u + (y-z)^2 \le (y+z)^2 \\ & \Leftrightarrow \sqrt{4 u^\intercal u + (y-z)^2} \le y+z \\ & \Leftrightarrow \left\lVert \left(\begin{array}{c} 2u \\ y - z \end{array} \right) \right\rVert_2 \le y+z \end{align*}

Then, observe that the QCQP

\begin{align*} \min_x x^\intercal Q x + c^\intercal x: Ax = b,\ x \ge 0 \end{align*}

is equivalent to

\begin{align*} \min_x t + c^\intercal x: Ax = b,\ t \ge x^\intercal Q x,\ x \ge 0, \end{align*}

It remains to show that the quadratic constraint \(t \ge x^\intercal Q x\) is an SOC constraint. Since \(Q\) is positive semidefinite, a Cholesky factorization gives \(Q = L L^\intercal\). With \(u = L^\intercal x\),

\begin{align*} t \ge x^\intercal Q x &\Leftrightarrow t \cdot 1 \ge u^\intercal u,\ t \ge 0 \\ &\Leftrightarrow \left\lVert \left(\begin{array}{c} 2u \\ t - 1 \end{array} \right) \right\rVert_2 \le t+1, t \ge 0 \end{align*}
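
As a sketch (made-up data, cvxpy assumed), the cell below solves the QCQP once directly via `cp.quad_form` and once through the epigraph/SOC reformulation above; the two optimal values should agree up to solver tolerance.

[ ]:
import numpy as np
import cvxpy as cp

n = 3
G = np.random.default_rng(2).standard_normal((n, n))
Q = G @ G.T + np.eye(n)                  # Q > 0, so the Cholesky factor exists
Lmat = np.linalg.cholesky(Q)             # Q = Lmat @ Lmat.T
c = np.ones(n)
A, b = np.ones((1, n)), np.array([1.0])

# Epigraph form with the SOC constraint ||(2u, t-1)||_2 <= t+1, u = Lmat.T @ x.
x, t = cp.Variable(n), cp.Variable()
u = Lmat.T @ x
soc = cp.SOC(t + 1, cp.hstack([2 * u, t - 1]))
p1 = cp.Problem(cp.Minimize(t + c @ x), [A @ x == b, x >= 0, t >= 0, soc])
p1.solve()

# Direct QCQP for comparison.
x2 = cp.Variable(n)
p2 = cp.Problem(cp.Minimize(cp.quad_form(x2, Q) + c @ x2), [A @ x2 == b, x2 >= 0])
p2.solve()
print(p1.value, p2.value)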

However, SOCP cannot, in general, be cast as QCQP. One may be tempted to square the SOC constraints and obtain a quadratic constraint of the form

\begin{align*} \lVert C_\ell x + d_\ell \rVert_2^2 \le (e^\intercal_\ell x + f_\ell)^2, \ e^\intercal_\ell x + f_\ell \ge 0. \end{align*}

While the pair of constraints is equivalent to the original SOC constraint, the first one may not be convex. Comparing the quadratic terms on each side of the inequality, we have

\begin{align*} x^\intercal C^\intercal_\ell C_\ell x + \cdots \le x^\intercal e_\ell e_\ell^\intercal x + \cdots \end{align*}

It is convex only if \(C^\intercal_\ell C_\ell - e_\ell e_\ell^\intercal \succeq 0\).

\(\blacksquare\)

Duality

Semi-definite Programming (SDP)

Consider an SDP \begin{align} p^* = \min_{X} \langle C, X\rangle : \langle A^\ell, ~X\rangle = b_\ell, ~\ell=1, \ldots, L, ~X \succeq 0 \nonumber \end{align}

The dual problem is also an SDP as follows \begin{align} d^\ast = \max_{\nu} \nu^\intercal b: \sum_{\ell=1}^{L} \nu_\ell A^\ell \preceq C \nonumber \end{align}
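
As a numeric sketch (random, strictly feasible, made-up data; cvxpy assumed), solving this primal/dual pair yields matching optimal values:

[ ]:
import numpy as np
import cvxpy as cp

n, L = 4, 2
rng = np.random.default_rng(3)
A = []
for _ in range(L):
    M = rng.standard_normal((n, n))
    A.append((M + M.T) / 2)
G = rng.standard_normal((n, n))
C = G @ G.T + np.eye(n)                       # C > 0 keeps the primal bounded
b = np.array([np.trace(Al) for Al in A])      # b from X0 = I (strictly feasible)

X = cp.Variable((n, n), symmetric=True)
primal = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                    [X >> 0] + [cp.trace(A[l] @ X) == b[l] for l in range(L)])
primal.solve()

nu = cp.Variable(L)
dual = cp.Problem(cp.Maximize(nu @ b),
                  [C - sum(nu[l] * A[l] for l in range(L)) >> 0])
dual.solve()
print(primal.value, dual.value)               # equal up to solver tolerance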

Lagrangian

\begin{align} \mathcal{L}(X, \lambda, \nu) & = \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) - \lambda \cdot \lambda_{\min} (X) \nonumber \\ & = \nu^\intercal b + \left\langle C - \sum_{\ell=1}^{L} \nu_\ell A^\ell, X \right\rangle - \lambda \cdot \lambda_{\min} (X) \nonumber \end{align}

where \(\nu \in \mathbb{R}^L\) and \(\lambda \ge 0\) are the dual variables, and \(\lambda_{\min}(X)\) is the smallest eigenvalue of \(X\). We use the fact that \(\lambda_{\min}(X) \ge 0\) is equivalent to \(X \succeq 0\).

\begin{align*} p^\ast & = \min_{X} \max_{\nu, \lambda \ge 0} \mathcal{L}(X, \lambda, \nu)\\ & \ge \max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu)\\ & = d^\ast \end{align*}

[4]:
hide_comment()

We first show the min-max/max-min inequality.

Proof

For any \(X^0, \lambda^0, \nu^0\),

\begin{align*} &&\ \mathcal{L}(X^0, \lambda^0, \nu^0) &\ge \min_{X} \mathcal{L}(X, \lambda^0, \nu^0) \\ \Rightarrow &&\ \max_{\nu, \lambda \ge 0} \mathcal{L}(X^0, \lambda, \nu) &\ge \max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu) \\ \Rightarrow &&\ \min_{X} \max_{\nu, \lambda \ge 0} \mathcal{L}(X, \lambda, \nu) &\ge \max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu) \\ \end{align*}

We show that \(\lambda_{\min}(\cdot)\) can be represented as an optimization problem.

Lemma: Let \(A\) be a symmetric matrix and \(\lambda_{\min}(A)\) is the smallest eigenvalue of \(A\). We have

\begin{align} \lambda_{\min}(A) = \min_{Y} \langle Y, A\rangle: Y \succeq 0,\ \textbf{Tr}Y = 1 \nonumber \end{align}

Proof: Write \(A = P \Lambda P^\intercal\), where \(P\) is an orthogonal matrix and \(\Lambda = \text{diag}(\lambda_1,\ldots,\lambda_n)\). Then

\begin{align*} \text{Tr}(YA) = \text{Tr}(Y P \Lambda P^\intercal) = \text{Tr}(P^\intercal Y P \Lambda) = \text{Tr}(Y^{\prime} \Lambda ) \end{align*}

Since \(Y^{\prime} = P^\intercal Y P \succeq 0\) and \(\textbf{Tr}Y^{\prime} = 1\), we can assume that \(A\) is diagonal without loss of generality.

When \(A\) is diagonal, \((YA)_{ii} = Y_{ii} A_{ii}\) and \(\text{Tr}(YA) = \sum_{i} Y_{ii} A_{ii}\). Then, we can restrict \(Y\) to be diagonal as well. So

\begin{align*} \min_{Y} \langle Y, A\rangle: Y \succeq 0,\ \textbf{Tr}Y = 1 \Leftrightarrow \min_{Y} \sum_{i} Y_i \lambda_i : Y_i \ge 0,\ \sum_{i} Y_i = 1, \end{align*}

which is \(\lambda_{\min}(A)\). The lemma is proved.
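
The lemma is easy to check numerically with cvxpy (illustrative random matrix, package assumed):

[ ]:
import numpy as np
import cvxpy as cp

n = 4
M = np.random.default_rng(4).standard_normal((n, n))
A = (M + M.T) / 2

Y = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(Y @ A)), [Y >> 0, cp.trace(Y) == 1])
prob.solve()
print(prob.value, np.linalg.eigvalsh(A).min())   # agree up to tolerance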


Lemma:

\begin{align*} p^\ast = \min_{X} \max_{\nu, \lambda \ge 0} \mathcal{L}(X, \lambda, \nu) \end{align*}

Proof:

\begin{align} \min_{X} \max_{\nu, \lambda \ge 0} \mathcal{L}(X, \lambda, \nu)& = \min_{X} \max_{\nu, \lambda \ge 0} \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) - \lambda \cdot \lambda_{\min} (X) \nonumber \\ &= \min_{X} \max_{\nu} \left\{ \begin{aligned} &\langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) && \text{if } \lambda_{\min} (X) \ge 0, \text{ i.e., } X \succeq 0 \\ & \infty && \text{otherwise} \end{aligned} \right. \nonumber \\ &=\min_{X \succeq 0} \max_{\nu} \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) \nonumber \\ &= \min_{X \succeq 0} \left\{\begin{aligned} &\langle C,X\rangle && \text{if } \langle A^\ell, X\rangle = b_\ell, ~\ell=1, \ldots, L \\ & +\infty && \text{otherwise} \end{aligned} \right. \nonumber \\ &= \min_{X \succeq 0} \langle C, X\rangle : \langle A^\ell, ~X\rangle = b_\ell, ~\ell=1, \ldots, L \nonumber \\ &= p^\ast \nonumber \end{align}

The fourth equality holds because the inner maximum is attained with \(\nu_\ell = 0\) when \(b_\ell - \langle A^\ell, X\rangle = 0\), while taking \(\nu_\ell = t \cdot \text{sign}(b_\ell - \langle A^\ell, X\rangle)\) with \(t \to \infty\) drives the maximum to \(+\infty\) otherwise.


Lemma:

\begin{align*} d^\ast=\max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu) \end{align*}


Proof:

For notational convenience, we let

\begin{align} Z &= C - \sum_{\ell=1}^{L} \nu_\ell A^\ell \nonumber \\ G(\lambda, Z) &= \min_{X} ( \langle Z, X \rangle - \lambda \cdot \lambda_{\min} (X) ) \nonumber \end{align}

where \(G(\lambda, \cdot)\) is the concave conjugate of the concave function \(\lambda \cdot \lambda_{\min} (\cdot)\).

The conjugate \(f^\ast\) of a function \(f\) is given by \begin{align} f^\ast(y) = \sup_{x \in \textbf{dom} f} (y^\intercal x - f(x)) \nonumber \end{align}

and

\begin{align*} G(\lambda, Z) &= -\max_{X} ( \langle Z, -X \rangle + \lambda \cdot \lambda_{\min} (X) ). \end{align*}

Suppose we can show that

\begin{align} G(\lambda, Z) = \left\{ \begin{aligned} &0 && \textrm{if } \textrm{Tr}(Z) = \lambda \ge 0, Z \succeq 0, \text{ i.e., } Z \in \mathcal{Z}(\lambda) \\ &-\infty && \textrm{otherwise} \end{aligned} \right. \nonumber \end{align}

where \(\mathcal{Z}(\lambda) = \{Z: \textrm{Tr}(Z) = \lambda \ge 0, Z \succeq 0\}\). Then,

\begin{align*} \max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu) &= \max_{\nu, \lambda \ge 0} \nu^\intercal b + G(\lambda, Z) \\ &= \max_{\nu, \lambda \ge 0} \nu^\intercal b: Z = C - \sum_{\ell=1}^{L} \nu_\ell A^\ell \succeq 0, \textrm{Tr}(Z) = \lambda \ge 0 \\ &= \max_{\nu} \nu^\intercal b: C - \sum_{\ell=1}^{L} \nu_\ell A^\ell \succeq 0 \\ &= d^* \end{align*}

where \(\lambda\) can be eliminated because \(Z \succeq 0\) implies \(\textrm{Tr}(Z) \ge 0\).


Now, we will complete the proof by showing that

\begin{align} G(\lambda, Z) = \left\{ \begin{aligned} &0 && \textrm{if } \textrm{Tr}(Z) = \lambda \ge 0, Z \succeq 0, \text{ i.e., } Z \in \mathcal{Z}(\lambda) \\ &-\infty && \textrm{otherwise} \end{aligned} \right. \label{eqn:g_value} \end{align}

Because \(\lambda_{\min}(\cdot)\) can be represented as an optimization problem, we have

\begin{align} G(\lambda, Z) & = \min_{X} \left( \langle Z, X \rangle - \lambda \min_{Y \succeq 0, \textbf{Tr} Y = 1} \langle Y, X \rangle \right) = \min_{X} \max_{Y \succeq 0, \textbf{Tr} Y = 1} \langle Z - \lambda Y, X \rangle \nonumber \\ & = \min_{X} \max_{Y \succeq 0, \textbf{Tr} Y = \lambda} \langle Z - Y, X \rangle \nonumber \\ & \ge \max_{Y \succeq 0, \textbf{Tr} Y = \lambda} \min_{X} \langle Z - Y, X \rangle \nonumber \\ & = \max_{Y \succeq 0, \textbf{Tr} Y = \lambda} \left\{ \begin{aligned} &0 && \text{if } Z = Y, \\ &-\infty && \text{otherwise} \end{aligned} \right. \nonumber \\ & = \left\{ \begin{aligned} &0 && \text{if } \textrm{Tr}(Z) = \lambda \ge 0, Z \succeq 0, \\ &-\infty && \text{otherwise} \end{aligned} \right. \label{eqn:g} \end{align}

We prove \eqref{eqn:g_value} in two steps:

  1. Proof of \(G(\lambda, Z) = 0\) if \(Z \in \mathcal{Z}(\lambda)\)

Following the definition of \(G(\lambda, Z)\), \(G(\lambda, Z) \le 0\) since \(X = 0\) is a feasible solution. We also have \(G(\lambda, Z) \ge 0\) because of \eqref{eqn:g}. So \(G(\lambda, Z) = 0\) if \(Z \in \mathcal{Z}(\lambda)\).

  2. Proof of \(G(\lambda, Z) = -\infty\) if \(Z \notin \mathcal{Z}(\lambda)\)

If \(\textrm{Tr}(Z) \neq \lambda\), choosing \(X = \epsilon t \texttt{I}\) with \(\epsilon = -\text{sign}(\textrm{Tr}(Z) - \lambda)\) and \(t \to +\infty\) implies

\begin{align*} G(\lambda, Z) &\le \langle Z, \epsilon t \texttt{I} \rangle - \lambda \cdot \lambda_{\min} (\epsilon t \texttt{I}) \\ & = \epsilon t (\textrm{Tr}(Z) - \lambda) = -t |\textrm{Tr}(Z) - \lambda| \to -\infty \end{align*}

If \(\textrm{Tr}(Z) = \lambda\) but \(Z \not\succeq 0\) (i.e., \(\lambda_{\min} (Z) < 0\)), let \(X = t u u^\intercal\), where \(u\) is a unit-norm (\(u^\intercal u = 1\)) eigenvector corresponding to \(\lambda_{\min} (Z)\) and \(t \to +\infty\). We have

\begin{align} G(\lambda, Z) &= \min_{X} ( \langle Z, X \rangle - \lambda \cdot \lambda_{\min} (X) ) \le \langle Z, t u u^\intercal \rangle - \lambda \cdot \lambda_{\min} (t u u^\intercal) \nonumber \\ &= t \textrm{Tr} (Z^\intercal uu^\intercal) = t \lambda_{\min}(Z) \to -\infty \nonumber \end{align}

because

\begin{align*} \textrm{Tr} (Z^\intercal uu^\intercal) &= \textrm{Tr} (uu^\intercal Z^\intercal ) = \textrm{Tr} (u (Zu)^\intercal) = \textrm{Tr} (u \lambda_{\min}(Z) u^\intercal) \\ & = \lambda_{\min}(Z) \textrm{Tr} (u u^\intercal) = \lambda_{\min}(Z) \textrm{Tr} (u^\intercal u) = \lambda_{\min}(Z) \end{align*}

and \(\lambda_{\min} (u u^\intercal) = 0\) because \(u u^\intercal\) is a rank-one matrix (it has a zero eigenvalue whenever \(n \ge 2\)).

Lemma: \(\lambda_{\min}\) is concave, i.e., \(\lambda_{\min}\left(\frac{1}{2}A_1 + \frac{1}{2}A_2\right) \ge \frac{1}{2}\lambda_{\min}(A_1) + \frac{1}{2}\lambda_{\min}(A_2)\)

Proof: Let \(\mathcal{Y} = \{Y: Y \succeq 0, \textbf{Tr}Y = 1 \}\). We have \begin{align} \min_{Y \in \mathcal{Y}} \left\langle Y, \frac{1}{2}A_1 + \frac{1}{2}A_2 \right\rangle = \min_{Y \in \mathcal{Y}} \left( \frac{1}{2} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \left\langle Y, A_2 \right\rangle \right) \nonumber \end{align}

Since, for any \(Y \in \mathcal{Y}\), \begin{align} && \frac{1}{2} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \left\langle Y, A_2 \right\rangle &\ge \frac{1}{2} \min_{Y \in \mathcal{Y}} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \min_{Y \in \mathcal{Y}} \left\langle Y, A_2 \right\rangle \nonumber \\ \Rightarrow && \min_{Y \in \mathcal{Y}} \left( \frac{1}{2} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \left\langle Y, A_2 \right\rangle \right) & \ge \frac{1}{2} \min_{Y \in \mathcal{Y}} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \min_{Y \in \mathcal{Y}} \left\langle Y, A_2 \right\rangle \nonumber \end{align}

\(\blacksquare\)

Conic Lagrangian

\begin{align} \mathcal{L}^c(X, \nu, Y) & = \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) - \langle Y,X\rangle \nonumber \\ & = \nu^\intercal b + \left\langle C - \sum_{\ell=1}^{L} \nu_\ell A^\ell, X \right\rangle - \langle Y,X\rangle \nonumber \end{align}

where now we associate a matrix dual variable \(Y \succeq 0\) to the constraint \(X \succeq 0\).

\begin{align*} p^\ast &= \min_{X} \max_{\nu, Y \succeq 0} \mathcal{L}^c(X, \nu, Y) \\ & \ge \max_{\nu, Y \succeq 0} \min_{X} \mathcal{L}^c(X, \nu, Y) \\ & = d^\ast \end{align*}

[5]:
hide_comment()

Lemma

\begin{align*} p^\ast = \min_{X} \max_{\nu, Y \succeq 0} \mathcal{L}^c(X, \nu, Y) \end{align*}

Proof: Recall the lemma that \(\lambda_{\min}(\cdot)\) can be represented as an optimization problem.

We have \begin{align} \min_{Y \succeq 0} \langle Y, X\rangle = \min_{t \ge 0} \min_{Y \succeq 0, \textbf{Tr} Y = t} \langle Y, X\rangle = \min_{t \ge 0} t \lambda_{\min}(X) = \left\{ \begin{aligned} &0 && \text{if } X \succeq 0 \\ &-\infty && \text{otherwise.} \end{aligned} \right. \nonumber \end{align}

Therefore, we get \begin{align} \min_{X} \max_{\nu, Y \succeq 0} \mathcal{L}^c(X, \nu, Y) &= \min_{X} \max_{\nu, Y \succeq 0} \left( \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) - \langle Y,X\rangle \right) \nonumber \\ & = \min_{X \succeq 0} \max_{\nu} \left( \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) \right) \nonumber \\ &= \min_{X \succeq 0} \left\{\begin{aligned} &\langle C,X\rangle && \text{if } \langle A^\ell, X\rangle = b_\ell, ~\ell=1, \ldots, L \\ & +\infty && \text{otherwise} \end{aligned} \right. \nonumber \\ &= \min_{X \succeq 0} \langle C, X\rangle : \langle A^\ell, ~X\rangle = b_\ell, ~\ell=1, \ldots, L \nonumber \\ &= p^\ast \nonumber \end{align}


Lemma

\begin{align*} d^\ast = \max_{\nu, Y \succeq 0} \min_{X} \mathcal{L}^c(X, \nu, Y) \end{align*}

Proof

\begin{align} \max_{\nu, Y \succeq 0} \min_{X} \mathcal{L}^c(X, \nu, Y) &= \max_{\nu, Y \succeq 0} \min_{X} \left( \nu^\intercal b + \left\langle C - \sum_{\ell=1}^{L} \nu_\ell A^\ell, X \right\rangle - \langle Y,X\rangle \right) \nonumber \\ & = \max_{\nu, Y \succeq 0} \left( \nu^\intercal b + \min_{X} \langle Z-Y, X \rangle \right) \nonumber \\ & = \max_{\nu, Y \succeq 0} \nu^\intercal b + \left\{ \begin{aligned} &0 && \text{if } Z = Y, \\ &-\infty && \text{otherwise} \end{aligned} \right. \nonumber \\ &= \max_{\nu} \nu^\intercal b: Z = C - \sum_{\ell=1}^{L} \nu_\ell A^\ell \succeq 0 \nonumber\\ &= d^\ast. \nonumber \end{align}

\(\blacksquare\)

Strong Duality

Strong duality can fail. Consider the example \begin{align} p^\ast = \min_{x} x_2 : \left( \begin{array}{ccc} x_2 + 1 & 0 & 0 \\ 0 & x_1 & x_2 \\ 0 & x_2 & 0 \end{array} \right) \succeq 0 \nonumber \end{align}

Positive semidefiniteness implies \(x_1 \ge 0\) and \(x^2_2 \le 0\) (from the lower-right \(2 \times 2\) block), i.e., \(x_2 = 0\). So, \(p^\ast = 0\).

The dual is \begin{align} d^\ast = \max_{Y \in S^3} - Y_{11}: Y \succeq 0, Y_{22}=0, 1-Y_{11}- 2Y_{23} = 0 \nonumber \end{align}

Since \(Y \succeq 0\) and \(Y_{22}=0\), we have \(Y_{23}= 0\), so \(Y_{11} = 1\). Hence, \(d^\ast = - Y_{11} = -1 < p^\ast\).
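
The gap can be verified with plain numpy, no solver needed (a sketch of the argument above; `primal_matrix` is just a helper): \(x = (1, 0)\) is primal feasible with objective \(0\), while \(Y = e_1 e_1^\intercal\) is dual feasible with objective \(-1\).

[ ]:
import numpy as np

def primal_matrix(x1, x2):
    return np.array([[x2 + 1, 0, 0],
                     [0, x1, x2],
                     [0, x2, 0.0]])

# x = (1, 0): primal feasible, objective x2 = 0.
print(np.linalg.eigvalsh(primal_matrix(1.0, 0.0)).min() >= 0)          # True

# Y = e1 e1^T: Y PSD, Y22 = 0, 1 - Y11 - 2 Y23 = 0, objective -Y11 = -1.
Y = np.zeros((3, 3)); Y[0, 0] = 1.0
print(np.linalg.eigvalsh(Y).min() >= 0, 1 - Y[0, 0] - 2 * Y[1, 2] == 0)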

Strong duality holds if Slater’s condition holds for the primal problem, i.e., the feasible region has an interior point (the problem is strictly feasible).

Conic Programming

Consider a conic programming problem

\begin{align} p^\ast = \min_{x \in \Re^n} \langle c, x \rangle : Ax = b, ~ x \in K, \nonumber \end{align}

where \(K \subseteq \Re^n\) is a convex cone.

The dual problem is

\begin{align} \max_{y \in \Re^m} \langle b, y \rangle : A^\intercal y + s = c, ~ s \in K^\ast, \nonumber \end{align}

where \(K^\ast\) is the dual cone defined by

\begin{align*} K^\ast = \{s \in \Re^n: \langle s,x\rangle \ge 0,~ \forall x \in K\}. \end{align*}

The Lagrange dual function is

\begin{align} \inf_{x \in K} \mathcal{L}(y) &= \inf_{x \in K} \langle c, x \rangle + \langle y, b-Ax \rangle \nonumber \\ & = \langle y, b \rangle + \inf_{x \in K} \langle c, x \rangle - \langle y, Ax \rangle = \langle y, b \rangle + \inf_{x \in K} \langle c-A^\intercal y, x \rangle \nonumber \end{align}

  • If \(c-A^\intercal y \in K^\ast\), then \(\langle c-A^\intercal y, x \rangle \ge 0\) for all \(x \in K\), so \(\inf_{x \in K} \langle c-A^\intercal y, x \rangle = 0\).

  • If \(c-A^\intercal y \notin K^\ast\), then \(\inf_{x \in K} \langle c-A^\intercal y, x \rangle = -\infty\).

Thus

\begin{align} \inf_{x \in K} \mathcal{L}(y) = \left\{ \begin{aligned} &\langle y, b \rangle && c-A^\intercal y \in K^\ast \\ &-\infty && c-A^\intercal y \notin K^\ast \end{aligned}\right. \nonumber \end{align}

Let \(s = c-A^\intercal y \in K^\ast\). Then we obtain the dual problem.
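
For instance (an illustrative cvxpy sketch with made-up data), when \(K\) is the nonnegative orthant we have \(K^\ast = K\), and the conic dual is exactly LP duality:

[ ]:
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 2.0, 3.0])

x = cp.Variable(3)
primal = cp.Problem(cp.Minimize(c @ x), [A @ x == b, x >= 0])      # x in K
primal.solve()

y, s = cp.Variable(2), cp.Variable(3)
dual = cp.Problem(cp.Maximize(b @ y), [A.T @ y + s == c, s >= 0])  # s in K* = K
dual.solve()
print(primal.value, dual.value)     # equal: strong LP duality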

Applications

Eigenvalue Problem

For a matrix \(A \in S^n_+\) (the set of positive semi-definite matrices), we consider

\begin{align} p^\ast = \max_X \langle A, X\rangle: \textrm{Tr}(X) = 1, ~ X \succeq 0. \nonumber \end{align}

The associated Lagrangian, using the conic approach, is

\begin{align} \mathcal{L}(X, Y, \nu) = \langle A, X\rangle + \nu (1 - \textrm{Tr}(X)) + \langle Y, X\rangle \nonumber \end{align}

with the matrix dual variable \(Y \succeq 0\), and \(\nu \in \mathbb{R}\) is free. The dual problem is

\begin{align} d^\ast = \min_{\nu, Y} \nu : Y + A = \nu I, ~ Y \succeq 0, \nonumber \end{align}

i.e., \(d^\ast = \min_{\nu} \nu : \nu I \succeq A\), which equals \(\lambda_{\max}(A)\).

Both the primal and dual problems are strictly feasible, so \(p^\ast = d^\ast\).
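
A quick cvxpy check (random PSD data, package assumed) that both problems recover \(\lambda_{\max}(A)\):

[ ]:
import numpy as np
import cvxpy as cp

n = 4
M = np.random.default_rng(5).standard_normal((n, n))
A = M @ M.T                                   # A in S^n_+

X = cp.Variable((n, n), symmetric=True)
primal = cp.Problem(cp.Maximize(cp.trace(A @ X)), [cp.trace(X) == 1, X >> 0])
primal.solve()

nu = cp.Variable()
dual = cp.Problem(cp.Minimize(nu), [nu * np.eye(n) - A >> 0])   # Y = nu I - A PSD
dual.solve()
print(primal.value, dual.value, np.linalg.eigvalsh(A).max())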

Robust Linear Programming

\begin{align*} \min \ c^\intercal x: \text{Prob}(a_i^\intercal x \le b_i) \ge \eta, \ i=1,\ldots, n \end{align*}

where the coefficient vectors are independent with \(a_i \sim \mathcal{N}(\bar{a}_i, \Sigma_i)\). For fixed \(x\), \(a^\intercal_i x\) is \(\mathcal{N}(\bar{a}^\intercal_i x, x^\intercal \Sigma_i x)\). Writing \(Z\) for a standard normal variable, we have

\begin{align*} \text{Prob}(\bar{a}^\intercal_i x + \lVert \Sigma_i^{1/2} x \rVert_2 Z \le b_i) \ge \eta & \Leftrightarrow \text{Prob}\left(Z \le \frac{b_i - \bar{a}^\intercal_i x}{\lVert \Sigma_i^{1/2} x \rVert_2}\right) \ge \eta \\ & \Leftrightarrow \Phi^{-1}(\eta) \le \frac{b_i - \bar{a}^\intercal_i x}{\lVert \Sigma_i^{1/2} x \rVert_2} \Leftrightarrow \bar{a}^\intercal_i x + \Phi^{-1}(\eta) \lVert \Sigma_i^{1/2} x \rVert_2 \le b_i \end{align*}

so the chance-constrained problem is equivalent to

\begin{align*} \min \ & c^\intercal x \\ \text{s.t. } & \bar{a}^\intercal_i x + \Phi^{-1}(\eta) \lVert \Sigma_i^{1/2} x \rVert_2 \le b_i, \ i=1,\ldots, n \end{align*}

  • robust LP is an SOCP for \(\eta \ge 0.5\) because \(\Phi^{-1}(\eta) \ge 0\)

  • robust LP is an LP for \(\eta = 0.5\) because \(\Phi^{-1}(\eta) = 0\) (a cvxpy sketch follows below)
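
A small cvxpy sketch of the resulting SOCP (all data made up; `scipy.stats.norm.ppf` supplies \(\Phi^{-1}\)):

[ ]:
import numpy as np
import cvxpy as cp
from scipy.stats import norm

abar = np.array([[1.0, 2.0],
                 [2.0, 1.0]])                # rows are the means of a_i
Sig_half = [np.sqrt(0.1) * np.eye(2)] * 2    # Sigma_i^{1/2}
b = np.array([3.0, 3.0])
c = np.array([-1.0, -1.0])
eta = 0.95
kappa = norm.ppf(eta)                        # Phi^{-1}(eta) >= 0 since eta >= 0.5

x = cp.Variable(2)
cons = [abar[i] @ x + kappa * cp.norm(Sig_half[i] @ x, 2) <= b[i] for i in range(2)]
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()
print(prob.value, x.value)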

Distributionally Robust Optimization

\begin{align*} \max_{F} \ & E_{F}(h(x, \xi)) \\ \text{s.t.} \ & \int_{\mathcal{S}} d F(\xi) = 1 \\ & (\mathbb{E}(\xi) - \mu)^\intercal \Sigma^{-1} (\mathbb{E}(\xi) - \mu) \le \gamma_1 \\ & \mathbb{E}((\xi-\mu)(\xi-\mu)^\intercal) \preceq \gamma_2 \Sigma \end{align*}

is equivalent to an SDP

\begin{align*} \max_{F} \ & \int_{\mathcal{S}} h(x, \xi) d F(\xi) \\ \text{s.t.} \ & \int_{\mathcal{S}} d F(\xi) = 1 \\ & \int_{\mathcal{S}} \left[ \begin{array}{cc} \Sigma & \xi-\mu \\ (\xi-\mu)^\intercal & \gamma_1 \end{array} \right] d F(\xi) \succeq 0 \quad \text{(Schur complement)}\\ & \int_{\mathcal{S}} (\xi-\mu)(\xi-\mu)^\intercal d F(\xi) \preceq \gamma_2 \Sigma \end{align*}

We refer to (Delage and Ye 2010) for its dual formulation.