Nesterov#

pylit.optimizer.nesterov.nesterov(R, F, x0, method, maxiter=None, tol=None, svd=False, protocol=False)#

Nesterov's accelerated gradient method with a non-negativity constraint. The interface is described in Optimizer.

Description#

Solves the optimization problem (1) using the Nesterov accelerated gradient method with a non-negativity constraint. The algorithm maintains two sequences of iterates: a main sequence \(x_k\) and an auxiliary lookahead sequence \(y_k\). At each iteration, the lookahead point is formed by combining the current and previous iterates with a momentum parameter and projecting the result onto the non-negative orthant, which enforces feasibility; a gradient descent step is then taken from this projected lookahead point. The lookahead mechanism damps oscillations and accelerates convergence, achieving the \(\mathcal{O}(1/k^2)\) rate that is optimal for first-order methods on smooth convex problems.

Algorithm#

Given a momentum parameter \(0 < \theta_k < 1\) and a learning rate \(\eta > 0\), the updates are:

\[\begin{split}\begin{align*}
\boldsymbol{\beta}_0 &= \boldsymbol{\alpha}_0 \in \mathbb{R}^n, & & \text{(initial guess)} \\
\boldsymbol{\beta}_k &= \max(0, \boldsymbol{\alpha}_k + \theta_k (\boldsymbol{\alpha}_k - \boldsymbol{\alpha}_{k-1})), & & \text{(non-negative momentum step)} \\
\boldsymbol{\alpha}_{k+1} &= \boldsymbol{\beta}_k - \eta \nabla f(\boldsymbol{\beta}_k), & & \text{(lookahead gradient step)}
\end{align*}\end{split}\]
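The three updates map one-to-one onto array operations. A minimal NumPy sketch of one run, with assumed constant values for \(\theta\) and \(\eta\) and a toy objective \(f(x) = \|x - 1\|^2 / 2\) (none of these choices come from pylit itself):

```python
import numpy as np

theta, eta = 0.9, 0.1         # assumed momentum and learning rate
grad = lambda x: x - 1.0      # gradient of the toy objective f(x) = ||x - 1||^2 / 2

alpha_prev = alpha = np.zeros(3)   # initial guess: beta_0 = alpha_0
for _ in range(200):
    # Non-negative momentum step: project the lookahead point onto x >= 0.
    beta = np.maximum(0.0, alpha + theta * (alpha - alpha_prev))
    # Lookahead gradient step from the projected point.
    alpha_prev, alpha = alpha, beta - eta * grad(beta)
```

Note that only the lookahead point \(\boldsymbol{\beta}_k\) is projected; the main iterates \(\boldsymbol{\alpha}_k\) follow an unconstrained gradient step, exactly as in the equations above.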

The algorithm terminates when the change in the objective function between successive iterates falls below the tolerance tol or when the maximum number of iterations maxiter is reached.
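Combining the updates with this stopping rule gives a compact reference loop. The sketch below is a hypothetical standalone version (function name, defaults, and the explicit f/grad arguments are assumptions, not the pylit implementation, which builds the objective from its R and F arguments), exercised on a toy non-negative least-squares problem:

```python
import numpy as np

def nesterov_nonneg(f, grad, x0, eta, theta=0.9, tol=1e-8, maxiter=10000):
    # Hypothetical standalone sketch of the updates and stopping rule above.
    alpha_prev = alpha = np.asarray(x0, dtype=float)
    f_prev = f(alpha)
    for _ in range(maxiter):
        # Non-negative momentum step (projection onto the feasible set).
        beta = np.maximum(0.0, alpha + theta * (alpha - alpha_prev))
        # Lookahead gradient step.
        alpha_prev, alpha = alpha, beta - eta * grad(beta)
        f_new = f(alpha)
        # Terminate when the objective change falls below tol.
        if abs(f_prev - f_new) < tol:
            break
        f_prev = f_new
    return alpha

# Toy problem: minimize ||A x - b||^2 / 2 subject to x >= 0.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
x = nesterov_nonneg(f=lambda x: 0.5 * np.sum((A @ x - b) ** 2),
                    grad=lambda x: A.T @ (A @ x - b),
                    x0=np.zeros(2), eta=0.1)
```

Because both termination criteria are checked, the loop is guaranteed to stop even when tol is never reached within maxiter iterations.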

returns:

A Solution object containing the final iterate.

rtype:

Solution

References#

    1. Yu. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1), 125–161, 2013.