ADAM#
- pylit.optimizer.adam.adam(R, F, x0, method, maxiter=None, tol=None, svd=False, protocol=False)#
This is the ADAM optimization method. The interface is described in Optimizer.
Description#
Solves the optimization problem (1) using the ADAM (Adaptive Moment Estimation) gradient method with a non-negativity constraint. ADAM is a first-order stochastic optimization algorithm that adapts the learning rate for each variable individually by maintaining exponentially decaying averages of past gradients (first moment) and of past squared gradients (second moment). This stabilizes convergence and makes the method well suited for problems with noisy, sparse, or non-stationary gradients.
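Below is a minimal, hypothetical usage sketch. The problem data R (model matrix), F (data vector), and the starting point x0 are fabricated for illustration, and the value passed for method is only a placeholder; see Optimizer for the actual parameter semantics and admissible values.

```python
import numpy as np
from pylit.optimizer.adam import adam  # documented entry point

# Illustrative problem data (assumed shapes): R is the model matrix,
# F the measured data vector of problem (1).
rng = np.random.default_rng(seed=0)
R = rng.normal(size=(200, 50))
x_true = np.maximum(rng.normal(size=50), 0.0)   # non-negative target
F = R @ x_true + 0.01 * rng.normal(size=200)

x0 = np.zeros(50)                               # feasible starting point

# 'method' is a placeholder here -- see Optimizer for admissible values.
solution = adam(R, F, x0, method=None, maxiter=10_000, tol=1e-8)
print(solution)                                 # Solution with the final iterate
```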
Algorithm#
Let \(\beta_1\) and \(\beta_2\) be the decay rates for the first and second moment estimates, \(\eta\) the learning rate (step size), and \(\epsilon\) a small constant for numerical stability. At iteration \(k\), the updates are:
\[\begin{split}m_k &= \beta_1 m_{k-1} + (1-\beta_1) \nabla f(x_{k-1}) \\ v_k &= \beta_2 v_{k-1} + (1-\beta_2) (\nabla f(x_{k-1}))^2 \\ \hat{m}_k &= \frac{m_k}{1 - \beta_1^k} \\ \hat{v}_k &= \frac{v_k}{1 - \beta_2^k} \\ x_k &= x_{k-1} - \eta \frac{\hat{m}_k}{\sqrt{\hat{v}_k} + \epsilon}\end{split}\]
The algorithm terminates when the change in the objective function between successive iterates falls below the tolerance tol, or when the maximum number of iterations maxiter is reached.
- rtype:
Solution
- returns:
A Solution object containing the final iterate.
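As a reference for the update rule above, here is a self-contained NumPy sketch. The least-squares objective, the hyperparameter defaults, and in particular the clipping step used to enforce the non-negativity constraint are assumptions for illustration, not necessarily how pylit implements them.

```python
import numpy as np

def adam_nonneg(grad, x0, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                maxiter=10_000, tol=1e-10, f=None):
    """Sketch of the ADAM iteration above with projection onto x >= 0.

    grad : callable returning the gradient of f at x
    f    : optional objective, used only for the tolerance-based stopping test
    """
    x = np.maximum(np.asarray(x0, dtype=float), 0.0)
    m = np.zeros_like(x)                           # first-moment estimate m_k
    v = np.zeros_like(x)                           # second-moment estimate v_k
    f_prev = f(x) if f is not None else None
    for k in range(1, maxiter + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g            # m_k
        v = beta2 * v + (1 - beta2) * g**2         # v_k
        m_hat = m / (1 - beta1**k)                 # bias-corrected m_k
        v_hat = v / (1 - beta2**k)                 # bias-corrected v_k
        x = x - eta * m_hat / (np.sqrt(v_hat) + eps)
        x = np.maximum(x, 0.0)                     # assumed projection onto x >= 0
        if f is not None:
            f_cur = f(x)
            if abs(f_prev - f_cur) < tol:          # stop on small objective change
                break
            f_prev = f_cur
    return x

# Example: non-negative least squares, min_x ||R x - F||^2 subject to x >= 0
rng = np.random.default_rng(1)
R = rng.normal(size=(80, 10))
F = R @ np.abs(rng.normal(size=10))
obj = lambda x: np.sum((R @ x - F)**2)
grad = lambda x: 2.0 * R.T @ (R @ x - F)
x_star = adam_nonneg(grad, x0=np.zeros(10), f=obj, eta=1e-2)
```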
References
D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (ICLR), 2015.