ADAM#

pylit.optimizer.adam.adam(R, F, x0, method, maxiter=None, tol=None, svd=False, protocol=False)#

This is the ADAM optimization method. The interface is described in Optimizer.

Description#

Solves the optimization problem (1) using the ADAM (Adaptive Moment Estimation) gradient method with a non-negativity constraint. ADAM is a first-order stochastic optimization algorithm that adapts the learning rate of each variable individually by maintaining exponentially decaying averages of past gradients (first moment) and of past squared gradients (second moment). This stabilizes convergence and makes the method well suited to problems with noisy, sparse, or non-stationary gradients.
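The example below is a purely illustrative sketch of how the function might be called. The shapes, values, and in particular the meaning assumed here for R, F, x0, and method are not taken from this page; the authoritative argument semantics are those of the Optimizer interface.

    import numpy as np
    from pylit.optimizer.adam import adam

    # Illustrative data only (assumed shapes and meaning):
    R = np.random.rand(100, 50)       # assumed model/kernel matrix
    x_true = np.random.rand(50)       # non-negative reference solution
    F = R @ x_true                    # assumed data vector
    x0 = np.zeros(50)                 # non-negative starting point

    # "method" is a placeholder value: the accepted values are defined by
    # the Optimizer interface, not by this sketch.
    sol = adam(R, F, x0, method="default", maxiter=1000, tol=1e-8)
    # sol is a Solution object containing the final iterate.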

Algorithm#

Let \(\beta_1\) and \(\beta_2\) be the exponential decay rates for the first and second moment estimates, \(\eta\) the learning rate, and \(\epsilon\) a small constant for numerical stability; squaring, square root, and division are applied element-wise. At iteration \(k\), the updates are:

\[\begin{split}m_k &= \beta_1 m_{k-1} + (1-\beta_1) \nabla f(x_{k-1}) \\ v_k &= \beta_2 v_{k-1} + (1-\beta_2) (\nabla f(x_{k-1}))^2 \\ \hat{m}_k &= \frac{m_k}{1 - \beta_1^k} \\ \hat{v}_k &= \frac{v_k}{1 - \beta_2^k} \\ x_k &= x_{k-1} - \eta \frac{\hat{m}_k}{\sqrt{\hat{v}_k} + \epsilon}\end{split}\]

The algorithm terminates when the change in the objective function between successive iterates falls below the tolerance tol or when the maximum number of iterations maxiter is reached.
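The five updates and the stopping rule can be written compactly with NumPy. The following is a minimal stand-alone sketch for a generic smooth objective f with gradient grad_f; it is not pylit's implementation, the default values of eta, beta1, beta2, and eps are conventional choices rather than pylit's, and the clipping step is only one simple, assumed way of enforcing the non-negativity constraint mentioned in the Description.

    import numpy as np

    def adam_sketch(f, grad_f, x0, eta=1e-3, beta1=0.9, beta2=0.999,
                    eps=1e-8, maxiter=1000, tol=1e-10):
        """Minimal Adam loop with bias correction and a tolerance-based stop."""
        x = np.maximum(np.asarray(x0, dtype=float), 0.0)  # feasible starting point
        m = np.zeros_like(x)                              # first moment estimate
        v = np.zeros_like(x)                              # second moment estimate
        f_prev = f(x)
        for k in range(1, maxiter + 1):
            g = grad_f(x)
            m = beta1 * m + (1.0 - beta1) * g             # m_k
            v = beta2 * v + (1.0 - beta2) * g**2          # v_k (element-wise square)
            m_hat = m / (1.0 - beta1**k)                  # bias-corrected first moment
            v_hat = v / (1.0 - beta2**k)                  # bias-corrected second moment
            x = x - eta * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
            x = np.maximum(x, 0.0)                        # assumed projection onto x >= 0
            f_curr = f(x)
            if abs(f_curr - f_prev) < tol:                # change in objective below tol
                break
            f_prev = f_curr
        return x

For example, for a least-squares objective of the form 1/2 ||R x - F||^2 one would pass f = lambda x: 0.5 * np.sum((R @ x - F)**2) and grad_f = lambda x: R.T @ (R @ x - F); whether problem (1) has exactly this form is not stated in this section.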

Returns:

A Solution object containing the final iterate.

Return type:

Solution

References

      1. D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (ICLR), 2015.