Optimization Algorithms#

Navigating the landscape of optimization algorithms can be overwhelming at times, especially when each algorithm comes with its own set of assumptions and applicability scope. Pyxu aims to simplify this by offering a comprehensive suite of algorithms tailored for solving Bayesian estimation problems. In this section, we'll walk through the different algorithms available, discuss how to choose the most suitable ones for your problem, and show you how to flexibly configure them. 🎛️

Algorithms Overview#

Conjugate Gradient (CG) 🔗#

👉 Use case: Quadratic problems, i.e. minimizing a quadratic form induced by a symmetric positive-definite matrix \(\mathbf{A}\); convergence is fastest when \(\mathbf{A}\) is well-conditioned.

Mathematical Form:

\[\min_{\mathbf{x}\in\mathbb{R}^{N}} \frac{1}{2} \mathbf{x}^{T} \mathbf{A} \mathbf{x} - \mathbf{x}^{T} \mathbf{b}\]
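For instance, the least-squares problem \(\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2\) is of this form with \(\mathbf{A}^{T}\mathbf{A}\) in place of \(\mathbf{A}\) and \(\mathbf{b} = \mathbf{A}^{T}\mathbf{y}\). The snippet below is a minimal, hedged sketch of how this could be solved with Pyxu's CG solver: A and y are placeholder objects, and the import path and the fit(x0, b) signature are assumptions to check against your Pyxu version.

# Hedged sketch: least-squares via CG on the normal equations.
# Assumptions: A is a Pyxu LinOp (so A.gram() and A.adjoint() are available), y is the data
# vector, and CG is importable from pyxu.opt.solver with fit() accepting the right-hand side b.
import numpy as np
from pyxu.opt.solver import CG

Q = A.gram()      # Q = A^T A, symmetric positive semi-definite
b = A.adjoint(y)  # b = A^T y

solver = CG(Q, show_progress=False)
solver.fit(x0=np.zeros(A.shape[1]), b=b)
x_ls = solver.solution()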

(Accelerated) Proximal Gradient Descent (PGD) 🔗#

👉 Use case: Problems that separate into smooth and non-smooth components.

Mathematical Form:

\[\min_{\mathbf{x}\in\mathbb{R}^N} \;\mathcal{F}(\mathbf{x})\;\;+\;\;\mathcal{G}(\mathbf{x}),\]

where \(\mathcal{F}\) and \(\mathcal{G}\) are differentiable and proximable functionals, respectively.

Note: With acceleration, PGD is an optimal first-order method: it attains the \(\mathcal{O}(1/k^2)\) worst-case convergence rate in objective value, the fastest achievable by first-order methods on this problem class!
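For example, the LASSO problem \(\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y}-\mathbf{A}\mathbf{x}\|_2^2 + \lambda\|\mathbf{x}\|_1\) has exactly this structure, with a differentiable data-fidelity term and a proximable \(\ell_1\) penalty. A hedged sketch of how it could be set up with PGD is shown below, mirroring the worked example later on this page; A and y are placeholders, and the import paths are assumptions.

# Hedged sketch: LASSO with PGD.
# Assumptions: A is a Pyxu LinOp, y is the data vector, and SquaredL2Norm/L1Norm/PGD are
# importable from pyxu.operator / pyxu.opt.solver as in the example further below.
import numpy as np
from pyxu.operator import L1Norm, SquaredL2Norm
from pyxu.opt.solver import PGD

f = (1 / 2) * SquaredL2Norm(dim=A.shape[0]).asloss(y) * A  # differentiable term F
g = 0.1 * L1Norm(dim=A.shape[1])                           # proximable term G (lambda = 0.1)

solver = PGD(f=f, g=g, show_progress=False)
solver.fit(x0=np.zeros(A.shape[1]))
x_hat = solver.solution()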

Primal-Dual Splitting (PDS) Methods#

👉 Use case: Multi-term problems mixing smooth and non-smooth terms, where a non-smooth term may be composed with a linear operator \(K\).

Mathematical Form:

\[\min_{\mathbf{x}\in\mathbb{R}^N} \;\mathcal{F}(\mathbf{x})\;\;+\;\;\mathcal{G}(\mathbf{x})\;\;+\;\;\mathcal{H}(K\mathbf{x}),\]

where \(\mathcal{F}\) is differentiable, \(\mathcal{G}\) and \(\mathcal{H}\) are proximable, and \(K\) is a linear operator.

Methods under this category include CondatVu 🔗, PD3O 🔗, ADMM 🔗, ChambollePock 🔗, and more.

Note that, although implemented for three-term objective functionals, PDS methods can easily be generalized to objective functionals of the form:

\[{\min_{\mathbf{x}\in\mathbb{R}^N} \;\mathcal{F}(\mathbf{x})\;\;+\;\;\mathcal{G}(\mathbf{x})\;\;+\;\;\sum_{i=1}^J\mathcal{H}_i(K_i\mathbf{x})}\]

by means of stacking operators:

K = vstack([K_1, ..., K_J])  # stacked operator: K x = (K_1 x, ..., K_J x)
H = hstack([H_1, ..., H_J])  # separable sum: H(z_1, ..., z_J) = H_1(z_1) + ... + H_J(z_J)
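For instance, with \(J=2\) the stacked objects could be assembled and passed to a PDS solver along the following lines. This is a hedged sketch: the vstack/hstack helpers are assumed to be importable from pyxu.operator, and K_1, K_2, H_1, H_2, F, G, x0 are placeholders.

# Hedged sketch for J = 2, using the vstack/hstack helpers named above.
# Assumption: both helpers are importable from pyxu.operator; adapt to your Pyxu version.
from pyxu.operator import hstack, vstack
from pyxu.opt.solver import CondatVu

K = vstack([K_1, K_2])  # K x = (K_1 x, K_2 x)
H = hstack([H_1, H_2])  # H(z_1, z_2) = H_1(z_1) + H_2(z_2)

solver = CondatVu(f=F, g=G, h=H, K=K, show_progress=False)
solver.fit(x0=x0)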

Choosing the Right Algorithm#

The golden rule is to choose the most specific algorithm, i.e. the one that makes the most assumptions consistent with your problem. This often results in faster convergence. For example, if your objective functional has a gradient, a gradient-based method like PGD will generally be more efficient than a generic proximal-based method like DouglasRachford 🔗.

The most generic algorithms in Pyxu are PD3O and CondatVu, but they are also the least efficient, so use them only when simpler methods like ADMM, PGD or CG cannot be used. Note that Adam 🔗 can also be useful when step sizes are too complex to compute.

Hyperparameter Tuning#

Pyxu comes with pre-implemented automatic tuning strategies for various algorithms. For instance, the primal-dual splitting methods offer three strategies, selected via the tuning_strategy keyword of fit() (see the snippet after the list):

  1. tuning_strategy == 1: safe step sizes and no relaxation. This is the most standard way of setting the parameters in the literature; it does not leverage relaxation.

  2. tuning_strategy == 2: large step sizes and no relaxation. This strategy favours large step sizes, at the cost of forbidding overrelaxation.

  3. tuning_strategy == 3: safe step sizes and large overrelaxation. This strategy chooses smaller step sizes, but performs overrelaxation (momentum acceleration).
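The strategy is selected when calling fit(). Isolated from the full example below, the pattern looks as follows (the solver and x0 are assumed to be defined already):

# Choose the tuning strategy at fit time.
solver.fit(x0=x0, tuning_strategy=3)  # safe step sizes + overrelaxation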

Example Usage#

Here's how you can solve a problem involving multiple terms:

\[\min_{\mathbf{x}\in\mathbb{R}^N}\frac{1}{2}\left\|\mathbf{y}-\mathbf{A}\mathbf{x}\right\|_2^2\quad+\quad\lambda_1 \|\mathbf{D}\mathbf{x}\|_1\quad+\quad\lambda_2 \|\mathbf{x}\|_1.\]

This problem can be written in the form

\[{\min_{\mathbf{x}\in\mathbb{R}^N} \;\mathcal{F}(\mathbf{x})\;\;+\;\;\mathcal{G}(\mathbf{x})\;\;+\;\;\mathcal{H}(\mathbf{K} \mathbf{x})}\]

by choosing \(\mathcal{F}(\mathbf{x})= \frac{1}{2}\left\|\mathbf{y}-\mathbf{A}\mathbf{x}\right\|_2^2\), \(\mathcal{G}(\mathbf{x})=\lambda_2\|\mathbf{x}\|_1\), \(\mathcal{H}(\mathbf{x})=\lambda_1 \|\mathbf{x}\|_1\), and \(\mathbf{K}=\mathbf{D}\).

Solving this problem with Pyxu amounts to the following steps:

from pyxu.operator import L1Norm, SquaredL2Norm
from pyxu.opt.solver import CondatVu

# Define the operators A and D, and the data vector y
...

# Define functionals
l22_loss = (1 / 2) * SquaredL2Norm(dim=A.shape[0]).asloss(y) * A  # Differentiable term F
l1_norm = 0.1 * L1Norm(dim=l22_loss.dim)  # Proximable term G (lambda_2 = 0.1)
l1_tv = 0.01 * L1Norm(dim=D.shape[0])  # Proximable term H (lambda_1 = 0.01)

# Initialize solver (Using Condat-Vu as an example)
solver = CondatVu(f=l22_loss, g=l1_norm, h=l1_tv, K=D, show_progress=False, verbosity=100)

# Fit the model
solver.fit(x0=x0, tuning_strategy=2)
sol = solver.solution().squeeze()

Advanced Usage: Guru Interface#

For those who want even more control, we provide a guru interface allowing you to overload default settings, including the stopping criteria (see module pyxu.opt.stop 🔗 for available stopping criteria).

For example, overloading the default stopping criterion of the CondatVu solver initialized above can be achieved as follows:

from pyxu.opt.stop import MaxIter, RelError

# Custom stopping criterion (optional): stop once the relative change of both the primal
# variable "x" and the dual variable "z" drops below 1e-3 (after at least 20 iterations),
# or after 1000 iterations at most.
custom_stop_crit = (RelError(eps=1e-3, var="x", f=None, norm=2, satisfy_all=True) &
                    RelError(eps=1e-3, var="z", f=None, norm=2, satisfy_all=True) &
                    MaxIter(20)) | MaxIter(1000)

# Fit the model with the new stopping criterion
solver.fit(x0=x0, tuning_strategy=2, stop_crit=custom_stop_crit)

Implementing New Algorithms#

To implement a new iterative solver, users need to sub-class pyxu.abc.Solver 🔗 and overwrite some of its core methods, such as m_init() 🔗 and m_step() 🔗, which describe the initialization and update steps of iterative algorithms (see the API Reference for more details).
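As an illustration, a bare-bones gradient-descent solver could look roughly like the sketch below. It is not part of Pyxu: the use of the _mstate dictionary and the default_stop_crit()/solution() overrides mirror the built-in solvers and may need adapting to your Pyxu version.

# Hedged sketch of a custom solver: plain gradient descent on a differentiable functional f.
# Assumption: the Solver base class stores the iterate state in the `_mstate` dictionary,
# as the built-in Pyxu solvers do, and forwards extra fit() keyword arguments to m_init().
from pyxu.abc import Solver
from pyxu.opt.stop import MaxIter

class GradientDescent(Solver):
    def __init__(self, f, **kwargs):
        super().__init__(**kwargs)
        self._f = f  # differentiable functional to minimize

    def m_init(self, x0, step):
        # Initialization step: populate the solver state from the fit() arguments.
        self._mstate["x"] = x0
        self._mstate["step"] = step

    def m_step(self):
        # Update step: one gradient-descent iteration, x <- x - step * grad f(x).
        x = self._mstate["x"]
        self._mstate["x"] = x - self._mstate["step"] * self._f.grad(x)

    def default_stop_crit(self):
        # Fallback criterion used when fit() receives no stop_crit.
        return MaxIter(100)

    def solution(self):
        return self._mstate["x"]

A solver defined this way could then be run as GradientDescent(f=f, show_progress=False).fit(x0=x0, step=1e-2), and would benefit from the features listed below.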

Sub-classes of Solver automatically inherit its very versatile API for solving optimisation problems, with the following notable features:

  • manual/automatic/background execution of solver iterations via parameters provided to fit() 🔗.

  • automatic checkpointing of solver progress, providing a safe restore point in case of faulty numerical code. Each solver instance backs up its state and final output to a folder on disk for post-analysis. In particular, fit() will never crash: detailed exception information will always be available in a logfile for post-analysis.

  • solve for multiple initial points in parallel (see the sketch below).

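For instance, the last point could look like the hedged sketch below, which assumes that solvers broadcast over leading (stacking) dimensions of x0; N denotes the problem dimension and solver is the CondatVu instance from the example above.

# Hedged sketch: two initial points stacked along a leading axis.
# Assumption: the solver broadcasts over leading dimensions of x0 and returns one solution
# per initial point, stacked along the same axis.
import numpy as np

x0_batch = np.stack([np.zeros(N), np.random.default_rng(0).standard_normal(N)])  # shape (2, N)
solver.fit(x0=x0_batch, tuning_strategy=2)
sols = solver.solution()  # shape (2, N): one solution per initial point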
Now that you're equipped with the algorithmic know-how, go ahead and choose the best algorithm for your Bayesian estimation problem. Happy optimizing! 🚀