Fixed off-by-one bug in Adam Smart Decay (#62135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62135
The initial implementation of Adam with Smart Decay had an off-by-one error. The error was in the summation of the geometric series used to calculate how much built-up momentum would have been discharged during skipped minibatches.
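A minimal sketch of the kind of geometric series involved. It assumes, hypothetically, that each skipped minibatch decays the first moment as m <- beta1 * m and applies the decayed value, so k skipped steps discharge m * (beta1 + beta1^2 + ... + beta1^k). The function names are illustrative and not taken from the Caffe2/PyTorch source. The sketch also ignores the second-moment and learning-rate scaling. It checks the closed form against a step-by-step replay, because a mismatch between the two is how an off-by-one in the series bounds shows up.

```python
def discharge_loop(m: float, beta1: float, k: int) -> float:
    """Brute force: replay k skipped minibatches with zero gradient.

    Each skipped step decays the momentum (m <- beta1 * m), and the decayed
    momentum is what would have been applied to the parameter on that step.
    """
    total = 0.0
    for _ in range(k):
        m *= beta1
        total += m
    return total


def discharge_closed_form(m: float, beta1: float, k: int) -> float:
    """Closed form of m * sum_{i=1..k} beta1**i, assuming 0 <= beta1 < 1."""
    return m * beta1 * (1.0 - beta1 ** k) / (1.0 - beta1)


if __name__ == "__main__":
    # The two must agree for every k, including small k such as 1 and 2.
    for k in range(0, 6):
        print(k, discharge_loop(1.0, 0.9, k), discharge_closed_form(1.0, 0.9, k))
```

Comparing against the replay for every small k, rather than only k = 0 or a very large k, is what exposes a series that starts or stops one term off.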
The unit tests should have caught this, but they missed it because k, the number of skipped minibatches, was always either 0 or so large that the bug's impact was negligible. The impact of the bug was proportional to 1/k. The testing strategy has also been adjusted to cover this bug.
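A sketch of why sweeping small values of k catches this class of bug. It uses a hypothetical off-by-one that stops the series one term early; the commit does not say which bound was wrong, and beta1 = 0.9 is an assumed value. The relative error of the short sum is beta1^k / (beta1 + ... + beta1^k). It is 100% at k = 1, roughly 1/k while beta1^k is still close to 1, and vanishingly small once k is large.

```python
def geometric_discharge(beta1: float, k: int) -> float:
    """sum_{i=1..k} beta1**i: discharge per unit of momentum over k skips."""
    return beta1 * (1.0 - beta1 ** k) / (1.0 - beta1)


def one_term_short(beta1: float, k: int) -> float:
    """Hypothetical off-by-one variant that stops the series at k - 1."""
    return geometric_discharge(beta1, k - 1) if k > 0 else 0.0


def relative_error(beta1: float, k: int) -> float:
    """Relative error of the off-by-one variant; only defined for k >= 1."""
    exact = geometric_discharge(beta1, k)
    return abs(exact - one_term_short(beta1, k)) / exact


if __name__ == "__main__":
    # At k = 0 both sums are 0, so the bug is invisible; at large k the
    # missing term is negligible. Small k is where the test can fail.
    for k in (1, 2, 5, 10, 100):
        print(k, relative_error(0.9, k))
```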
Differential Revision: D29889309
fbshipit-source-id: b086c0efed5c27f621061e726533c73658daffc6