[1]:
%run ../initscript.py
toggle()
[1]:
Conic Programming¶
Formulation¶
Let \(A \in \Re^{m \times n}\), \(b \in \Re^m\) and \(c \in \Re^n\). A standard conic programming problem takes the form
\begin{align} p^\ast = \min_{x \in \Re^n} c^\intercal x : Ax = b, ~ x \in K, \nonumber \end{align}
where \(K \subseteq \Re^n\) is a convex cone; that is, \(K\) is a convex set such that \(\lambda x \in K\) whenever \(x \in K\) and \(\lambda > 0\).
Semi-definite Programming (SDP)¶
Given symmetric matrices \(C, A^\ell \in \Re^{n \times n}\), \(\ell = 1, \ldots, L\), and a vector \(b \in \Re^L\), consider the SDP in standard form
\begin{align} p^* = \min_{X} \langle C, X\rangle : \langle A^\ell, ~X\rangle = b_\ell, ~\ell=1, \ldots, L, ~X \succeq 0 \nonumber \end{align}
where \(\langle A, B\rangle = \text{Tr} (A^\intercal B)\).
SDP is a special case of conic programming because
\begin{align*} \langle C, X\rangle &= \sum_{i=1}^{n} \sum_{j=1}^{n} C_{ij} X_{ji} \\ \langle A^\ell, X\rangle &= \sum_{i=1}^{n} \sum_{j=1}^{n} A^\ell_{ij} X_{ji} \\ X \in \text{ cone } K &= \{ X \in \Re^{n \times n}: v^\intercal X v \ge 0,\ \forall v \in \Re^n \} \end{align*}
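As a concrete illustration, here is a minimal sketch of the standard-form SDP, assuming the cvxpy and numpy packages are available; the data \(C, A^\ell, b\) are synthetic placeholders chosen so that \(X_0 = I/n\) is feasible and the problem is bounded.

```python
import numpy as np
import cvxpy as cp

# Synthetic standard-form SDP data (placeholders, not from any application)
np.random.seed(0)
n = 4
C = np.random.randn(n, n); C = (C + C.T) / 2
M = np.random.randn(n, n)
A = [np.eye(n), (M + M.T) / 2]        # A^1 = I gives a trace constraint
b = [1.0, np.trace(A[1]) / n]         # chosen so that X0 = I/n is feasible

# min <C, X>  s.t.  <A^l, X> = b_l,  X PSD
X = cp.Variable((n, n), symmetric=True)
constraints = [cp.trace(A[l] @ X) == b[l] for l in range(2)] + [X >> 0]
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print("p* =", prob.value)
```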
Second-Order Cone Programming (SOCP)¶
A standard form for the SOCP model is
\begin{align*} p^\ast = \min_{x \in \mathbb{R}^n} c^\intercal x : Ax = b, \lVert C_\ell x + d_\ell \rVert_2 \le e^\intercal_\ell x + f_\ell,\ \ell = 1, \ldots, L. \end{align*}
SOCP is a special case of conic programming with cone
\begin{align*} \mathcal{L}^{n+1} = \left\{ (y,t) \in \Re^{n} \times \Re : \lVert y \rVert_2 \le t \right\} \end{align*}
and \((C_\ell x + d_\ell, e^\intercal_\ell x + f_\ell) \in \mathcal{L}^{n+1}\).
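A minimal sketch of this standard form, assuming cvxpy is available; all data below are arbitrary placeholders, with a single unit-ball SOC constraint so that the instance is bounded.

```python
import numpy as np
import cvxpy as cp

# Placeholder data: one equality constraint and one second-order cone constraint
n = 3
c = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 1.0, 1.0]]); b = np.array([0.5])
C1, d1 = np.eye(n), np.zeros(n)       # ||x|| <= 1: a unit-ball SOC constraint
e1, f1 = np.zeros(n), 1.0

x = cp.Variable(n)
constraints = [A @ x == b,
               cp.SOC(e1 @ x + f1, C1 @ x + d1)]  # ||C1 x + d1|| <= e1^T x + f1
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print("p* =", prob.value, "x* =", x.value)
```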
SOCP is a special case of SDP.
[2]:
hide_comment()
[2]:
Let us use Schur complements to show that SOCP is a special case of SDP.
Schur complement: Given a symmetric block matrix \begin{align*} X = \left( \begin{array}{ll} A_{p\times p} & B \\ B^\intercal & C_{q\times q} \end{array} \right) \end{align*}
with \(\det(A) \neq 0\), the matrix \(S = C - B^\intercal A^{-1} B\) is the Schur complement of \(A\) in \(X\).
Lemma: If \(A \succ 0\), then \(X \succeq 0 \Leftrightarrow S \succeq 0\).
Proof: Let \(f_v = \min_u f(u, v)\) where
\begin{align*} f(u, v) = \left( u^\intercal, v^\intercal \right) \left( \begin{array}{ll} A_{p\times p} & B \\ B^\intercal & C_{q\times q} \end{array} \right) \left( \begin{array}{l} u \\ v \end{array} \right) = u^\intercal A u + 2 v^\intercal B^\intercal u + v^\intercal C v. \end{align*}
Since \(A \succ 0\), \(f\) is strictly convex in \(u\) and has a unique global minimizer:
\begin{align*} \frac{\partial f}{\partial u} & = 2 A u + 2 B v = 0 \Rightarrow u^* = - A^{-1} B v \\ f_v &= v^\intercal B^\intercal A^{-1} B v - 2 v^\intercal B^\intercal A^{-1} B v + v^\intercal C v \\ &= v^\intercal (C - B^\intercal A^{-1} B) v = v^\intercal S v \end{align*}
(\(\Rightarrow\)) If \(S \not\succeq 0\), then there exists \(v_0\) such that \(v_0^\intercal S v_0 < 0\), so \(f_{v_0} < 0\). This contradicts \(X \succeq 0\), since \(f_{v_0} = z^\intercal X z \ge 0\) for
\begin{align*} z = \left( \begin{array}{l} u \\ v \end{array} \right) = \left( \begin{array}{c} - A^{-1} B v_0 \\ v_0 \end{array} \right) \end{align*}
(\(\Leftarrow\)) If \(S \succeq 0\), then any \((u, v)\) gives
\begin{align*} \left( \begin{array}{l} u \\ v \end{array} \right)^\intercal X \left( \begin{array}{l} u \\ v \end{array} \right) \ge f_v = v^\intercal S v \ge 0 \Rightarrow X \succeq 0 \end{align*}
■
Now, we assume
\begin{align} e^\intercal_\ell x + f_\ell > 0 \nonumber \end{align}
otherwise \(e^\intercal_\ell x + f_\ell = 0\) forces \(C_\ell x + d_\ell = 0\) and the SOC constraint degenerates to linear (LP) constraints.
We can write the constraint
\begin{align*} \lVert C_\ell x + d_\ell \rVert_2 \le e^\intercal_\ell x + f_\ell \end{align*}
as
\begin{align*} \left( \begin{array}{cc} (e^\intercal_\ell x + f_\ell) \texttt{I} & C_\ell x + d_\ell\\ (C_\ell x + d_\ell)^\intercal & e^\intercal_\ell x + f_\ell \end{array} \right) \succeq 0 \end{align*}
Because
\begin{align*} \left( \begin{array}{cc} (e^\intercal_\ell x + f_\ell) \texttt{I} & C_\ell x + d_\ell\\ (C_\ell x + d_\ell)^\intercal & e^\intercal_\ell x + f_\ell \end{array} \right) \succeq 0 &\Leftrightarrow (e^\intercal_\ell x + f_\ell) - (C_\ell x + d_\ell)^\intercal \frac{1}{e^\intercal_\ell x + f_\ell} (C_\ell x + d_\ell) \ge 0 \\ & \Leftrightarrow \lVert C_\ell x + d_\ell \rVert_2^2 \le (e^\intercal_\ell x + f_\ell)^2 \\ & \Leftrightarrow \lVert C_\ell x + d_\ell \rVert_2 \le e^\intercal_\ell x + f_\ell, \end{align*}
where the last \(\Leftrightarrow\) holds because both sides are nonnegative. Therefore, SOCP is a special case of SDP.
\(\blacksquare\)
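The arrow-matrix equivalence above can be spot-checked numerically. This is a sketch with arbitrary synthetic data, with \(e_\ell, f_\ell\) scaled so that \(e^\intercal_\ell x + f_\ell > 0\) for the sampled points.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2
C, d = rng.standard_normal((m, n)), rng.standard_normal(m)
e, f = 0.1 * rng.standard_normal(n), 3.0   # keeps e^T x + f > 0 for these draws

for _ in range(5):
    x = rng.standard_normal(n)
    u, rhs = C @ x + d, e @ x + f
    # Arrow matrix from the Schur-complement argument
    M = np.block([[rhs * np.eye(m), u[:, None]],
                  [u[None, :],      np.array([[rhs]])]])
    # PSD of the arrow matrix should coincide with the SOC constraint
    psd = np.linalg.eigvalsh(M).min() >= -1e-9
    print(np.linalg.norm(u) <= rhs, psd)
```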
Convex quadratically constrained quadratic programming (QCQP) is a special case of SOCP (take \(e_\ell = 0\)).
[3]:
hide_comment()
[3]:
We show that QCQP is a special case of SOCP.
Proof: First, we show that hyperbolic constraints can be reformulated as second-order cone constraints:
\begin{align*} u^\intercal u \le y z, y \ge 0, z \ge 0 \Leftrightarrow \left\lVert \left(\begin{array}{c} 2u \\ y - z \end{array} \right) \right\rVert_2 \le y+z, y \ge 0, z \ge 0 \end{align*}
The \(\Leftrightarrow\) holds because
\begin{align*} 4 u^\intercal u \le 4 y z = (y+z)^2 - (y-z)^2 & \Leftrightarrow 4 u^\intercal u + (y-z)^2 \le (y+z)^2 \\ & \Leftrightarrow \sqrt{4 u^\intercal u + (y-z)^2} \le y+z \\ & \Leftrightarrow \left\lVert \left(\begin{array}{c} 2u \\ y - z \end{array} \right) \right\rVert_2 \le y+z \end{align*}
Then, note that the QCQP
\begin{align*} \min_x x^\intercal Q x + c^\intercal x: Ax = b,\ x \ge 0 \end{align*}
is equivalent to
\begin{align*} \min_x t + c^\intercal x: Ax = b,\ t \ge x^\intercal Q x,\ x \ge 0, \end{align*}
so we only need to show that the quadratic constraint \(t \ge x^\intercal Q x\) is an SOC constraint. Since \(Q\) is positive semidefinite, the Cholesky factorization gives \(Q = L L^\intercal\). With \(u = L^\intercal x\),
\begin{align*} t \ge x^\intercal Q x &\Leftrightarrow t \cdot 1 \ge u^\intercal u,\ t \ge 0 \\ &\Leftrightarrow \left\lVert \left(\begin{array}{c} 2u \\ t - 1 \end{array} \right) \right\rVert_2 \le t+1, t \ge 0 \end{align*}
■
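A quick numerical spot-check of this reformulation, assuming numpy; \(Q\) is a random positive definite placeholder so that the Cholesky factor exists.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n))
Q = B @ B.T + 1e-6 * np.eye(n)        # positive definite placeholder
L = np.linalg.cholesky(Q)             # Q = L L^T

for _ in range(5):
    x = rng.standard_normal(n)
    t = rng.uniform(0.0, 5.0)
    u = L.T @ x
    quad = t >= x @ Q @ x                                      # t >= x^T Q x
    soc = np.linalg.norm(np.append(2 * u, t - 1.0)) <= t + 1   # SOC form
    print(quad, soc)                   # the two booleans should agree
```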
However, SOCP cannot, in general, be cast as QCQP. One may be tempted to square the SOC constraints and obtain a quadratic constraint of the form
\begin{align*} \lVert C_\ell x + d_\ell \rVert_2^2 \le (e^\intercal_\ell x + f_\ell)^2, \ e^\intercal_\ell x + f_\ell \ge 0. \end{align*}
While this pair of constraints is equivalent to the original SOC constraint, the first one may not be convex. Comparing the quadratic terms on each side of the inequality, we have
\begin{align*} x^\intercal C^\intercal_\ell C_\ell x + \cdots \le x^\intercal e_\ell e_\ell^\intercal x + \cdots \end{align*}
The squared constraint is convex only if \(C^\intercal_\ell C_\ell - e_\ell e_\ell^\intercal \succeq 0\).
\(\blacksquare\)
Duality¶
Semi-definite Programming (SDP)¶
Consider an SDP \begin{align} p^* = \min_{X} \langle C, X\rangle : \langle A^\ell, ~X\rangle = b_\ell, ~\ell=1, \ldots, L, ~X \succeq 0 \nonumber \end{align}
The dual problem is also an SDP as follows \begin{align} d^\ast = \max_{\nu} \nu^\intercal b: \sum_{\ell=1}^{L} \nu_\ell A^\ell \preceq C \nonumber \end{align}
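This primal-dual pair can be sanity-checked numerically; the sketch below assumes cvxpy and uses synthetic data for which a strictly feasible point exists, so the two optimal values should coincide.

```python
import numpy as np
import cvxpy as cp

# Synthetic data: trace constraint plus one random constraint
np.random.seed(7)
n = 4
C0 = np.random.randn(n, n); C = C0 @ C0.T          # PSD objective matrix
M = np.random.randn(n, n)
A1, A2 = np.eye(n), (M + M.T) / 2
b = np.array([1.0, np.trace(A2) / n])              # X0 = I/n is strictly feasible

# Primal SDP
X = cp.Variable((n, n), symmetric=True)
primal = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                    [cp.trace(A1 @ X) == b[0], cp.trace(A2 @ X) == b[1], X >> 0])
primal.solve()

# Dual SDP: max nu^T b  s.t.  nu_1 A^1 + nu_2 A^2 <= C
nu = cp.Variable(2)
dual = cp.Problem(cp.Maximize(nu @ b), [nu[0] * A1 + nu[1] * A2 << C])
dual.solve()
print(primal.value, dual.value)   # equal here: strong duality holds for this instance
```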
Lagrangian¶
\begin{align} \mathcal{L}(X, \lambda, \nu) & = \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) - \lambda \cdot \lambda_{\min} (X) \nonumber \\ & = \nu^\intercal b + \left\langle C - \sum_{\ell=1}^{L} \nu_\ell A^\ell, X \right\rangle - \lambda \cdot \lambda_{\min} (X) \nonumber \end{align}
where \(\nu \in \mathbb{R}^L\) and \(\lambda \ge 0\) are the dual variables, and \(\lambda_{\min}(X)\) denotes the smallest eigenvalue of \(X\). We have used the fact that \(\lambda_{\min}(X) \ge 0\) is equivalent to \(X \succeq 0\).
\begin{align*} p^\ast & = \min_{X} \max_{\nu, \lambda \ge 0} \mathcal{L}(X, \lambda, \nu)\\ & \ge \max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu)\\ & = d^\ast \end{align*}
[4]:
hide_comment()
[4]:
We show the min-max \(\ge\) max-min (weak duality) inequality.
Proof
For any \(X^0, \lambda^0, \nu^0\),
\begin{align*} &&\ \mathcal{L}(X^0, \lambda^0, \nu^0) &\ge \min_{X} \mathcal{L}(X, \lambda^0, \nu^0) \\ \Rightarrow &&\ \max_{\nu, \lambda \ge 0} \mathcal{L}(X^0, \lambda, \nu) &\ge \max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu) \\ \Rightarrow &&\ \min_{X} \max_{\nu, \lambda \ge 0} \mathcal{L}(X, \lambda, \nu) &\ge \max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu) \\ \end{align*}
■
We show that \(\lambda_{\min}(\cdot)\) can be represented as an optimization problem.
Lemma: Let \(A\) be a symmetric matrix and \(\lambda_{\min}(A)\) is the smallest eigenvalue of \(A\). We have
\begin{align} \lambda_{\min}(A) = \min_{Y} \langle Y, A\rangle: Y \succeq 0,\ \textbf{Tr}Y = 1 \nonumber \end{align}
Proof: Write \(A = P \Lambda P^\intercal\), where \(P\) is an orthogonal matrix and \(\Lambda = \text{diag}(\lambda_1,\ldots,\lambda_n)\). Then
\begin{align*} \text{Tr}(YA) = \text{Tr}(Y P \Lambda P^\intercal) = \text{Tr}(P^\intercal Y P \Lambda) = \text{Tr}(Y^{\prime} \Lambda ) \end{align*}
Since \(Y^{\prime} = P^\intercal Y P \succeq 0\) and \(\textbf{Tr}Y^{\prime} = \textbf{Tr}Y = 1\), we can assume that \(A\) is diagonal without loss of generality.
When \(A\) is diagonal, \((YA)_{ii} = Y_{ii} A_{ii}\) and \(\text{Tr}(YA) = \sum_{i} Y_{ii} A_{ii}\). Then, we can restrict \(Y\) to be diagonal as well. So
\begin{align*} \min_{Y} \langle Y, A\rangle: Y \succeq 0,\ \textbf{Tr}Y = 1 \Leftrightarrow \min_{Y} \sum_{i} Y_i \lambda_i : Y_i \ge 0,\ \sum_{i} Y_i = 1, \end{align*}
which is \(\lambda_{\min}(A)\). The lemma is proved.
■
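A numerical check of the lemma, assuming cvxpy: for a random symmetric \(A\), the SDP value should match the smallest eigenvalue reported by numpy.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

# min <Y, A>  s.t.  Y PSD, Tr(Y) = 1
Y = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(Y @ A)),
                  [Y >> 0, cp.trace(Y) == 1])
prob.solve()
print(prob.value, np.linalg.eigvalsh(A).min())  # should agree
```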
Lemma:
\begin{align*} p^\ast = \min_{X} \max_{\nu, \lambda \ge 0} \mathcal{L}(X, \lambda, \nu) \end{align*}
Proof:
\begin{align} \min_{X} \max_{\nu, \lambda \ge 0} \mathcal{L}(X, \lambda, \nu)& = \min_{X} \max_{\nu, \lambda \ge 0} \left( \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) - \lambda \cdot \lambda_{\min} (X) \right) \nonumber \\ &= \min_{X \succeq 0} \max_{\nu} \left( \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) \right) \nonumber \\ &= \min_{X \succeq 0} \left\{\begin{aligned} &\langle C,X\rangle && \text{if } \langle A^\ell, X\rangle = b_\ell, ~\ell=1, \ldots, L \\ & +\infty && \text{otherwise} \end{aligned} \right. \nonumber \\ &= \min_{X \succeq 0} \langle C, X\rangle : \langle A^\ell, ~X\rangle = b_\ell, ~\ell=1, \ldots, L \nonumber \\ &= p^\ast \nonumber \end{align}
The second equality holds because \(\max_{\lambda \ge 0} (-\lambda \cdot \lambda_{\min}(X)) = 0\) if \(\lambda_{\min}(X) \ge 0\) (i.e., \(X \succeq 0\)) and \(+\infty\) otherwise. The third equality holds because the maximizing choice is \(\nu_\ell = 0\) if \(b_\ell - \langle A^\ell, X\rangle = 0\), and otherwise \(\nu_\ell = t \cdot \text{sign}(b_\ell - \langle A^\ell, X\rangle)\) with \(t \to \infty\) drives the objective to \(+\infty\).
■
Lemma:
\begin{align*} d^\ast=\max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu) \end{align*}
Proof:
For notational convenience, we let
\begin{align} Z &= C - \sum_{\ell=1}^{L} \nu_\ell A^\ell \nonumber \\ G(\lambda, Z) &= \min_{X} ( \langle Z, X \rangle - \lambda \cdot \lambda_{\min} (X) ) \nonumber \end{align}
where \(G(\lambda, \cdot)\) is, up to sign, the conjugate of the convex function \(X \mapsto -\lambda \cdot \lambda_{\min} (X)\).
The conjugate \(f^\ast\) of a function \(f\) is given by \begin{align} f^\ast(y) = \sup_{x \in \textbf{dom} f} (y^\intercal x - f(x)) \nonumber \end{align}
and
\begin{align*} G(\lambda, Z) &= -\max_{X} ( \langle -Z, X \rangle + \lambda \cdot \lambda_{\min} (X) ) \end{align*}
Suppose we can show that
\begin{align} G(\lambda, Z) = \left\{ \begin{aligned} &0 && \textrm{if } \textrm{Tr}(Z) = \lambda \ge 0, Z \succeq 0, \text{ i.e., } Z \in \mathcal{Z}(\lambda) \\ &-\infty && \textrm{otherwise} \end{aligned} \right. \nonumber \end{align}
where \(\mathcal{Z}(\lambda) = \{Z: \textrm{Tr}(Z) = \lambda \ge 0, Z \succeq 0\}\). Then,
\begin{align*} \max_{\nu, \lambda \ge 0} \min_{X} \mathcal{L}(X, \lambda, \nu) &= \max_{\nu, \lambda \ge 0} \nu^\intercal b + G(\lambda, Z) \\ &= \max_{\nu, \lambda \ge 0} \nu^\intercal b: Z = C - \sum_{\ell=1}^{L} \nu_\ell A^\ell \succeq 0, \textrm{Tr}(Z) = \lambda \ge 0 \\ &= \max_{\nu} \nu^\intercal b: C - \sum_{\ell=1}^{L} \nu_\ell A^\ell \succeq 0 \\ &= d^* \end{align*}
where \(\lambda\) can be eliminated because \(Z \succeq 0\) implies \(\textrm{Tr}(Z) \ge 0\).
Now, we will complete the proof by showing that
\begin{align} G(\lambda, Z) = \left\{ \begin{aligned} &0 && \textrm{if } \textrm{Tr}(Z) = \lambda \ge 0, Z \succeq 0, \text{ i.e., } Z \in \mathcal{Z}(\lambda) \\ &-\infty && \textrm{otherwise} \end{aligned} \right. \label{eqn:g_value} \end{align}
Because \(\lambda_{\min}(\cdot)\) can be represented as an optimization problem, we have
\begin{align} G(\lambda, Z) & = \min_{X} \left( \langle Z, X \rangle - \lambda \min_{Y \succeq 0, \textbf{Tr} Y = 1} \langle Y, X \rangle \right) = \min_{X} \max_{Y \succeq 0, \textbf{Tr} Y = 1} \langle Z - \lambda Y, X \rangle \nonumber \\ & = \min_{X} \max_{Y \succeq 0, \textbf{Tr} Y = \lambda} \langle Z - Y, X \rangle \nonumber \\ & \ge \max_{Y \succeq 0, \textbf{Tr} Y = \lambda} \min_{X} \langle Z - Y, X \rangle \nonumber \\ & = \max_{Y \succeq 0, \textbf{Tr} Y = \lambda} \left\{ \begin{aligned} &0 && \text{if } Z = Y, \\ &-\infty && \text{otherwise} \end{aligned} \right. \nonumber \\ & = \left\{ \begin{aligned} &0 && \text{if } \textrm{Tr}(Z) = \lambda \ge 0, Z \succeq 0, \\ &-\infty && \text{otherwise} \end{aligned} \right. \label{eqn:g} \end{align}
We prove \eqref{eqn:g_value} in two steps:
Proof of \(G(\lambda, Z) = 0\) if \(Z \in \mathcal{Z}(\lambda)\):
Following the definition of \(G(\lambda, Z)\), \(G(\lambda, Z) \le 0\) since \(X = 0\) is a feasible solution. We also have \(G(\lambda, Z) \ge 0\) because of \eqref{eqn:g}. So \(G(\lambda, Z) = 0\) if \(Z \in \mathcal{Z}(\lambda)\).
Proof of \(G(\lambda, Z) = -\infty\) if \(Z \notin \mathcal{Z}(\lambda)\):
If \(\textrm{Tr}(Z) \neq \lambda\), choosing \(X = \epsilon t \texttt{I}\) with \(\epsilon = -\text{sign}(\textrm{Tr}(Z) - \lambda)\) and \(t \to +\infty\) implies
\begin{align*} G(\lambda, Z) &\le \langle Z, \epsilon t \texttt{I} \rangle - \lambda \cdot \lambda_{\min} (\epsilon t \texttt{I}) = \epsilon t \, (\textrm{Tr}(Z) - \lambda) \\ & = -t\, |\textrm{Tr}(Z) - \lambda| \to -\infty \end{align*}
If \(\textrm{Tr}(Z) = \lambda\) but \(Z \not\succeq 0\) (i.e., \(\lambda_{\min} (Z) < 0\)), let \(X = t u u^\intercal\), where \(u\) is a unit-norm (\(u^\intercal u = 1\)) eigenvector corresponding to \(\lambda_{\min} (Z)\) and \(t \to +\infty\). We have
\begin{align} G(\lambda, Z) &= \min_{X} ( \langle Z, X \rangle - \lambda \cdot \lambda_{\min} (X) ) \le \langle Z, t u u^\intercal \rangle - \lambda \cdot \lambda_{\min} (t u u^\intercal) \nonumber \\ &= t\, \textrm{Tr} (Z^\intercal uu^\intercal) = t\, \lambda_{\min}(Z) \to -\infty \nonumber \end{align}
because
\begin{align*} \textrm{Tr} (Z^\intercal uu^\intercal) &= \textrm{Tr} (uu^\intercal Z^\intercal ) = \textrm{Tr} (u (Zu)^\intercal) = \textrm{Tr} (u \lambda_{\min}(Z) u^\intercal) \\ & = \lambda_{\min}(Z) \textrm{Tr} (u u^\intercal) = \lambda_{\min}(Z) \textrm{Tr} (u^\intercal u) = \lambda_{\min}(Z) \end{align*}
and \(\lambda_{\min} (t u u^\intercal) = 0\) because \(u u^\intercal\) is a rank-one matrix (for \(n \ge 2\), its remaining eigenvalues are \(0\)).
■
Lemma: \(\lambda_{\min}\) is concave, i.e., \(\lambda_{\min}\left(\frac{1}{2}A_1 + \frac{1}{2}A_2\right) \ge \frac{1}{2}\lambda_{\min}(A_1) + \frac{1}{2}\lambda_{\min}(A_2)\)
Proof: Let \(\mathcal{Y} = \{Y: Y \succeq 0, \textbf{Tr}Y = 1 \}\). We have \begin{align} \min_{Y \in \mathcal{Y}} \left\langle Y, \frac{1}{2}A_1 + \frac{1}{2}A_2 \right\rangle = \min_{Y \in \mathcal{Y}} \left( \frac{1}{2} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \left\langle Y, A_2 \right\rangle \right) \nonumber \end{align}
Since, for any \(Y \in \mathcal{Y}\), \begin{align} && \frac{1}{2} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \left\langle Y, A_2 \right\rangle &\ge \frac{1}{2} \min_{Y \in \mathcal{Y}} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \min_{Y \in \mathcal{Y}} \left\langle Y, A_2 \right\rangle \nonumber \\ \Rightarrow && \min_{Y \in \mathcal{Y}} \left( \frac{1}{2} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \left\langle Y, A_2 \right\rangle \right) & \ge \frac{1}{2} \min_{Y \in \mathcal{Y}} \left\langle Y, A_1 \right\rangle + \frac{1}{2} \min_{Y \in \mathcal{Y}} \left\langle Y, A_2 \right\rangle \nonumber \end{align}
\(\blacksquare\)
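A quick numpy spot-check of this concavity inequality on random symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
sym = lambda M: (M + M.T) / 2
lmin = lambda M: np.linalg.eigvalsh(M).min()   # lambda_min

for _ in range(5):
    A1 = sym(rng.standard_normal((n, n)))
    A2 = sym(rng.standard_normal((n, n)))
    # Concavity: lambda_min of the average dominates the average of lambda_min
    print(lmin(0.5 * A1 + 0.5 * A2) >= 0.5 * lmin(A1) + 0.5 * lmin(A2))  # True
```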
Conic Lagrangian¶
\begin{align} \mathcal{L}^c(X, \nu, Y) & = \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) - \langle Y,X\rangle \nonumber \\ & = \nu^\intercal b + \left\langle C - \sum_{\ell=1}^{L} \nu_\ell A^\ell, X \right\rangle - \langle Y,X\rangle \nonumber \end{align}
where now we associate a matrix dual variable \(Y \succeq 0\) to the constraint \(X \succeq 0\).
\begin{align*} p^\ast &= \min_{X} \max_{\nu, Y \succeq 0} \mathcal{L}^c(X, \nu, Y) \\ & \ge \max_{\nu, Y \succeq 0} \min_{X} \mathcal{L}^c(X, \nu, Y) \\ & = d^\ast \end{align*}
[5]:
hide_comment()
[5]:
Lemma
\begin{align*} p^\ast = \min_{X} \max_{\nu, Y \succeq 0} \mathcal{L}^c(X, \nu, Y) \end{align*}
Proof: Recall the lemma that \(\lambda_{\min}(\cdot)\) can be represented as an optimization problem.
We have \begin{align} \min_{Y \succeq 0} \langle Y, X\rangle = \min_{t \ge 0} \min_{Y \succeq 0, \textbf{Tr} Y = t} \langle Y, X\rangle = \min_{t \ge 0} t \lambda_{\min}(X) = \left\{ \begin{aligned} &0 && \text{if } X \succeq 0 \\ &-\infty && \text{otherwise.} \end{aligned} \right. \nonumber \end{align}
Therefore, we get \begin{align} \min_{X} \max_{\nu, Y \succeq 0} \mathcal{L}^c(X, \nu, Y) &= \min_{X} \max_{\nu, Y \succeq 0} \left( \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) - \langle Y,X\rangle \right) \nonumber \\ & = \min_{X \succeq 0} \max_{\nu} \left( \langle C,X\rangle + \sum_{\ell=1}^{L} \nu_\ell (b_\ell - \langle A^\ell, X\rangle) \right) \nonumber \\ &= \min_{X \succeq 0} \left\{\begin{aligned} &\langle C,X\rangle && \text{if } \langle A^\ell, X\rangle = b_\ell, ~\ell=1, \ldots, L \\ & +\infty && \text{otherwise} \end{aligned} \right. \nonumber \\ &= \min_{X \succeq 0} \langle C, X\rangle : \langle A^\ell, ~X\rangle = b_\ell, ~\ell=1, \ldots, L \nonumber \\ &= p^\ast \nonumber \end{align}
■
Lemma
\begin{align*} d^\ast = \max_{\nu, Y \succeq 0} \min_{X} \mathcal{L}^c(X, \nu, Y) \end{align*}
Proof
\begin{align} \max_{\nu, Y \succeq 0} \min_{X} \mathcal{L}^c(X, \nu, Y) &= \max_{\nu, Y \succeq 0} \min_{X} \left( \nu^\intercal b + \left\langle C - \sum_{\ell=1}^{L} \nu_\ell A^\ell, X \right\rangle - \langle Y,X\rangle \right) \nonumber \\ & = \max_{\nu, Y \succeq 0} \left( \nu^\intercal b + \min_{X} \langle Z-Y, X \rangle \right) \nonumber \\ & = \max_{\nu, Y \succeq 0} \nu^\intercal b + \left\{ \begin{aligned} &0 && \text{if } Y = Z, \\ &-\infty && \text{otherwise} \end{aligned} \right. \nonumber \\ &= \max_{\nu} \nu^\intercal b: Z = C - \sum_{\ell=1}^{L} \nu_\ell A^\ell \succeq 0 \nonumber\\ &= d^\ast. \nonumber \end{align}
\(\blacksquare\)
Strong Duality¶
Strong duality can fail. Consider the example \begin{align} p^\ast = \min_{x} x_2 : \left( \begin{array}{ccc} x_2 + 1 & 0 & 0 \\ 0 & x_1 & x_2 \\ 0 & x_2 & 0 \end{array} \right) \succeq 0 \nonumber \end{align}
Positive semidefiniteness implies \(x_1 \ge 0\) and \(x^2_2 \le 0\), i.e., \(x_2 = 0\). So \(p^\ast = 0\).
The dual is \begin{align} d^\ast = \max_{Y \in S^3} - Y_{11}: Y \succeq 0, Y_{22}=0, 1-Y_{11}- 2Y_{23} = 0 \nonumber \end{align}
Since \(Y \succeq 0\) and \(Y_{22}=0\), we must have \(Y_{23}= 0\), so the equality constraint gives \(Y_{11} = 1\). Hence, \(d^\ast = - Y_{11} = -1 < p^\ast\).
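The hand analysis can be confirmed numerically (the value \(x_1 = 5.0\) below is arbitrary): any \(x_2 \neq 0\) makes the constraint matrix indefinite, so primal feasibility indeed forces \(x_2 = 0\).

```python
import numpy as np

def primal_matrix(x1, x2):
    return np.array([[x2 + 1.0, 0.0, 0.0],
                     [0.0,      x1,  x2],
                     [0.0,      x2,  0.0]])

# The lower-right 2x2 block has determinant -x2^2 < 0 whenever x2 != 0,
# so the matrix cannot be PSD: its smallest eigenvalue is negative.
for x2 in [0.1, -0.1, 1.0]:
    print(x2, np.linalg.eigvalsh(primal_matrix(5.0, x2)).min())
```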
Strong duality holds if Slater's condition holds for the primal problem, i.e., the feasible region contains an interior point (a strictly feasible point).
Conic Programming¶
Consider a conic programming problem
\begin{align} p^\ast = \min_{x \in \Re^n} \langle c, x \rangle : Ax = b, ~ x \in K, \nonumber \end{align}
where \(K \subseteq \Re^n\) is a convex cone.
The dual problem is
\begin{align} \max_{y \in \Re^m} \langle b, y \rangle : A^\intercal y + s = c, ~ s \in K^\ast, \nonumber \end{align}
where \(K^\ast\) is the dual cone defined by
\begin{align*} K^\ast = \{s \in \Re^n: \langle s,x\rangle \ge 0,~ \forall x \in K\}. \end{align*}
The Lagrangian is \(\mathcal{L}(x, y) = \langle c, x \rangle + \langle y, b-Ax \rangle\), and the dual function is
\begin{align} \inf_{x \in K} \mathcal{L}(x, y) &= \inf_{x \in K} \left( \langle c, x \rangle + \langle y, b-Ax \rangle \right) \nonumber \\ & = \langle y, b \rangle + \inf_{x \in K} \left( \langle c, x \rangle - \langle y, Ax \rangle \right) = \langle y, b \rangle + \inf_{x \in K} \langle c-A^\intercal y, x \rangle \nonumber \end{align}
If \(c-A^\intercal y \in K^\ast\), then \(\langle c-A^\intercal y, x \rangle \ge 0\) for all \(x \in K\), and the infimum is attained at \(x = 0\), so \(\inf_{x \in K} \langle c-A^\intercal y, x \rangle = 0\).
If \(c-A^\intercal y \notin K^\ast\), then \(\inf_{x \in K} \langle c-A^\intercal y, x \rangle = -\infty\).
Thus
\begin{align} \inf_{x \in K} \mathcal{L}(y) = \left\{ \begin{aligned} &\langle y, b \rangle && c-A^\intercal y \in K^\ast \\ &-\infty && c-A^\intercal y \notin K^\ast \end{aligned}\right. \nonumber \end{align}
Let \(s = c-A^\intercal y \in K^\ast\). Then we obtain the dual problem.
Applications¶
Eigenvalue Problem¶
For a matrix \(A \in S^n_+\) (the set of positive semi-definite matrices), we consider
\begin{align} p^\ast = \max_X \langle A, X\rangle: \textrm{Tr}(X) = 1, ~ X \succeq 0. \nonumber \end{align}
The associated Lagrangian, using the conic approach, is
\begin{align} \mathcal{L}(X, Y, \nu) = \langle A, X\rangle + \nu (1 - \textrm{Tr}(X)) + \langle Y, X\rangle \nonumber \end{align}
with the matrix dual variable \(Y \succeq 0\), and \(\nu \in \mathbb{R}\) is free. The dual problem is
\begin{align} d^\ast = \min_{\nu, Y} \nu : Y + A = \nu I, ~ Y \succeq 0. \nonumber \end{align}
Both the primal and dual problems are strictly feasible, so \(p^\ast = d^\ast\).
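Both problems can be checked numerically, assuming cvxpy; by the earlier lemma (applied to a maximum instead of a minimum), both values should equal \(\lambda_{\max}(A)\), which we compare against numpy.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T                          # a PSD matrix, matching the setup above

# Primal: max <A, X>  s.t.  Tr(X) = 1, X PSD
X = cp.Variable((n, n), symmetric=True)
primal = cp.Problem(cp.Maximize(cp.trace(A @ X)),
                    [cp.trace(X) == 1, X >> 0])
primal.solve()

# Dual: min nu  s.t.  Y + A = nu I, Y PSD
nu, Y = cp.Variable(), cp.Variable((n, n), symmetric=True)
dual = cp.Problem(cp.Minimize(nu), [Y + A == nu * np.eye(n), Y >> 0])
dual.solve()

print(primal.value, dual.value, np.linalg.eigvalsh(A).max())  # all should agree
```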
Robust Linear Programming¶
\begin{align*} \min \ c^\intercal x: \text{Prob}(a_i^\intercal x \le b_i) \ge \eta, \ i=1,\ldots, n \end{align*}
where the coefficient vectors \(a_i \sim \mathcal{N}(\bar{a}_i, \Sigma_i)\) are independent. For fixed \(x\), \(a^\intercal_i x \sim \mathcal{N}(\bar{a}^\intercal_i x, x^\intercal \Sigma_i x)\). With \(Z \sim \mathcal{N}(0,1)\), we have
\begin{align*} \text{Prob}\left(\bar{a}^\intercal_i x + \sqrt{x^\intercal \Sigma_i x}\, Z \le b_i\right) \ge \eta & \Leftrightarrow \text{Prob}\left(Z \le \frac{b_i - \bar{a}^\intercal_i x}{\lVert \Sigma_i^{1/2} x \rVert_2}\right) \ge \eta \\ & \Leftrightarrow \Phi^{-1}(\eta) \le \frac{b_i - \bar{a}^\intercal_i x}{\lVert \Sigma_i^{1/2} x \rVert_2} \Leftrightarrow \bar{a}^\intercal_i x + \Phi^{-1}(\eta) \lVert \Sigma_i^{1/2} x \rVert_2 \le b_i \end{align*}
Hence, the robust LP is equivalent to
\begin{align*} \min \ & c^\intercal x \\ \text{s.t. } & \bar{a}^\intercal_i x + \Phi^{-1}(\eta) \lVert \Sigma_i^{1/2} x \rVert_2 \le b_i, \ i=1,\ldots, n \end{align*}
The robust LP is an SOCP for \(\eta \ge 0.5\) because \(\Phi^{-1}(\eta) \ge 0\); in particular, it reduces to an LP for \(\eta = 0.5\) because \(\Phi^{-1}(0.5) = 0\).
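A sketch of the resulting SOCP, assuming cvxpy and scipy (for \(\Phi^{-1}\) via scipy.stats.norm.ppf); all problem data are synthetic placeholders, and a box constraint is added only to keep the random instance bounded.

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

rng = np.random.default_rng(6)
n_var, n_con, eta = 3, 4, 0.95
abar = rng.standard_normal((n_con, n_var))
b = abar @ np.ones(n_var) + 2.0                      # x = 1 is strictly feasible
Sig_half = [np.diag(rng.uniform(0.1, 0.5, n_var)) for _ in range(n_con)]
c = rng.standard_normal(n_var)
kappa = norm.ppf(eta)                                # Phi^{-1}(eta) >= 0 here

x = cp.Variable(n_var)
cons = [abar[i] @ x + kappa * cp.norm(Sig_half[i] @ x, 2) <= b[i]
        for i in range(n_con)]
cons += [cp.norm(x, "inf") <= 10]                    # keep the instance bounded
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()
print(prob.status, prob.value)
```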
Distributionally Robust Optimization¶
\begin{align*} \max_{F} \ & \mathbb{E}_{F}(h(x, \xi)) \\ \text{s.t.} \ & \int_{\mathcal{S}} d F(\xi) = 1 \\ & (\mathbb{E}_F(\xi) - \mu)^\intercal \Sigma^{-1} (\mathbb{E}_F(\xi) - \mu) \le \gamma_1 \\ & \mathbb{E}_F((\xi-\mu)(\xi-\mu)^\intercal) \preceq \gamma_2 \Sigma \end{align*}
is equivalent to an SDP
\begin{align*} \max_{F} \ & \int_{\mathcal{S}} h(x, \xi) d F(\xi) \\ \text{s.t.} \ & \int_{\mathcal{S}} d F(\xi) = 1 \\ & \int_{\mathcal{S}} \left[ \begin{array}{cc} \Sigma & \xi-\mu \\ (\xi-\mu)^\intercal & \gamma_1 \end{array} \right] d F(\xi) \succeq 0 \quad \text{(Schur complement)}\\ & \int_{\mathcal{S}} (\xi-\mu)(\xi-\mu)^\intercal d F(\xi) \preceq \gamma_2 \Sigma \end{align*}
We refer to (Delage and Ye 2010) for its dual formulation.