31 | 31 | class SVRGFunction(ApproximateGradientSumFunction):
32 | 32 |
33 | 33 | r"""
34 | | - The Stochastic Variance Reduced Gradient (SVRG) function calculates the approximate gradient of :math:`\sum_{i=1}^{n-1}f_i`. For this approximation, every `snapshot_update_interval` number of iterations, a full gradient calculation is made at this "snapshot" point. Intermediate gradient calculations update this snapshot by taking a index :math:`i_k` and calculating the gradient of :math:`f_{i_k}`s at the current iterate and the snapshot, updating the approximate gradient to be:
| 34 | + The Stochastic Variance Reduced Gradient (SVRG) function calculates the approximate gradient of :math:`\sum_{i=0}^{n-1}f_i`. For this approximation, every `snapshot_update_interval` iterations a full gradient calculation is made at a "snapshot" point. Intermediate gradient calculations update this snapshot by taking an index :math:`i_k` and calculating the gradient of :math:`f_{i_k}` at both the current iterate and the snapshot, updating the approximate gradient to be:
35 | 35 |
36 | 36 | .. math ::
37 | 37 | n*\nabla f_{i_k}(x_k) - n*\nabla f_{i_k}(\tilde{x}) + \nabla \sum_{i=0}^{n-1}f_i(\tilde{x}),
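For concreteness, here is a minimal self-contained sketch of the estimator in the formula above. This is plain NumPy with illustrative names and toy quadratic functions, not the CIL implementation:

```python
import numpy as np

# Sketch of the SVRG estimate described above (illustrative, not the CIL code):
# n * grad f_{i_k}(x_k) - n * grad f_{i_k}(x_tilde) + sum_i grad f_i(x_tilde)
def svrg_gradient(grad_fns, x, x_tilde, full_grad_tilde, i_k):
    n = len(grad_fns)
    return n * (grad_fns[i_k](x) - grad_fns[i_k](x_tilde)) + full_grad_tilde

# Toy example with f_i(x) = 0.5 * ||x - c_i||^2, so grad f_i(x) = x - c_i.
centres = [np.array([1.0]), np.array([3.0])]
grads = [lambda x, c=c: x - c for c in centres]
x_tilde = np.array([0.0])                        # snapshot point
full_grad = sum(g(x_tilde) for g in grads)       # full gradient at the snapshot
print(svrg_gradient(grads, np.array([2.0]), x_tilde, full_grad, i_k=0))
```

Taking the expectation over a uniformly random :math:`i_k` recovers the exact gradient of the sum, which is why this estimate is unbiased.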
@@ -60,7 +60,7 @@ class SVRGFunction(ApproximateGradientSumFunction):
60 | 60 | snapshot_update_interval : positive int or None, optional
61 | 61 | The interval for updating the full gradient (taking a snapshot). The default is 2*len(functions) so a "snapshot" is taken every 2*len(functions) iterations. If the user passes `0` then no full gradient snapshots will be taken.
62 | 62 | store_gradients : bool, default: `False`
63 | | - Flag indicating whether to store an update a list of gradients for each function :math:`f_i` or just to store the snapshot point :math:` \tilde{x}` and its gradient :math:`\nabla \sum_{i=0}^{n-1}f_i(\tilde{x})`.
| 63 | + Flag indicating whether to store and update a list of gradients for each function :math:`f_i` or just to store the snapshot point :math:`\tilde{x}` and its gradient :math:`\nabla \sum_{i=0}^{n-1}f_i(\tilde{x})`.
64 | 64 |
65 | 65 |
66 | 66 | """
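As a usage sketch of the parameters documented in this hunk: the snippet below assumes CIL-style import paths (`cil.optimisation.functions`, `cil.optimisation.utilities`) and a hypothetical prepared list `my_functions` of smooth CIL `Function` objects, so treat it as an illustration rather than verified library usage:

```python
# Hedged construction sketch; `my_functions` is an assumed placeholder list
# of n smooth CIL Function objects sharing the same domain (n > 1).
from cil.optimisation.functions import SVRGFunction
from cil.optimisation.utilities import Sampler

n = len(my_functions)
f = SVRGFunction(
    my_functions,
    sampler=Sampler.random_with_replacement(n),  # the documented default sampler
    snapshot_update_interval=2 * n,              # documented default: snapshot every 2n iterations
    store_gradients=False,                       # keep only the snapshot point and its full gradient
)
# Calling f.gradient(x) then draws i_k from the sampler and returns the
# variance-reduced estimate rather than the exact gradient of the sum.
```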
@@ -212,28 +212,24 @@ def _update_full_gradient_and_return(self, x, out=None):
212 | 212 |
213 | 213 |
214 | 214 | class LSVRGFunction(SVRGFunction):
215 | | - """""
| 215 | + r"""
216 | 216 | A class representing a function for Loopless Stochastic Variance Reduced Gradient (SVRG) approximation. This is similar to SVRG, except the full gradient at a "snapshot" is calculated at random intervals rather than at fixed numbers of iterations.
217 | | -
218 | | -
219 | | - Reference
220 | | - ----------
221 | | -
222 | | - Kovalev, D., Horváth, S. &; Richtárik, P.. (2020). Don’t Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop. Proceedings of the 31st International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 117:451-467 Available from https://proceedings.mlr.press/v117/kovalev20a.html.
223 | | -
224 | | -
225 | | -
| 217 | +
226 | 218 | Parameters
227 | 219 | ----------
228 | | - functions : `list` of functions
| 220 | + functions : `list` of functions
229 | 221 | A list of functions: :code:`[f_{0}, f_{1}, ..., f_{n-1}]`. Each function is assumed to be smooth with an implemented :func:`~Function.gradient` method. All functions must have the same domain. The number of functions `n` must be strictly greater than 1.
230 | 222 | sampler: An instance of a CIL Sampler class ( :meth:`~optimisation.utilities.sampler`) or of another class which has a `next` function implemented to output integers in {0,...,n-1}.
231 | 223 | This sampler is called each time gradient is called and sets the internal `function_num` passed to the `approximate_gradient` function. Default is `Sampler.random_with_replacement(len(functions))`.
232 | 224 | snapshot_update_probability: positive float, default: 1/n
233 | 225 | The probability of updating the full gradient (taking a snapshot) at each iteration. The default is :math:`1./n` so, in expectation, a snapshot will be taken every :math:`n` iterations.
234 | 226 | store_gradients : bool, default: `False`
235 | | - Flag indicating whether to store an update a list of gradients for each function :math:`f_i` or just to store the snapshot point :math:` \tilde{x}` and it's gradient :math:`\nabla \sum_{i=0}^{n-1}f_i(\tilde{x})`.
| 227 | + Flag indicating whether to store and update a list of gradients for each function :math:`f_i` or just to store the snapshot point :math:`\tilde{x}` and its gradient :math:`\nabla \sum_{i=0}^{n-1}f_i(\tilde{x})`.
236 | 228 |
| 229 | +
| 230 | + Reference
| 231 | + ---------
| 232 | + Kovalev, D., Horváth, S. & Richtárik, P. (2020). Don’t Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop. Proceedings of the 31st International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 117:451-467. Available from https://proceedings.mlr.press/v117/kovalev20a.html.
237 | 233 |
238 | 234 | Note
239 | 235 | ----
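To illustrate the loopless rule this docstring describes: each iteration draws a Bernoulli variable with the documented default probability :math:`1/n`, so snapshots occur at random intervals with an expected spacing of :math:`n` iterations. A short self-contained sketch (the names here are placeholders, not the CIL code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                      # number of functions f_i
p = 1.0 / n                 # snapshot_update_probability (documented default)

snapshots = 0
iterations = 10_000
for _ in range(iterations):
    if rng.random() < p:    # loopless rule: snapshot at a random iteration
        snapshots += 1      # here the full gradient would be recomputed
print(snapshots, "snapshots,", iterations * p, "expected")
```

Compared with plain SVRG, this removes the fixed outer loop over epochs while keeping the same expected frequency of full gradient computations.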