To fix the chainability at epoch zero for some schedulers (#63457)
Summary:
It has been discussed in the https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092 that we have observed an obstacle to chain some type of learning rate schedulers. In particular we observed
* some of the learning rate schedulers returns initial learning rates at epoch 0 as
```
return self.base_lrs`
```
* This can be a problem when two schedulers called as chained as
```
scheduler1.step()
scheduler2.step()
```
in particular, we completely ignore the effect of scheduler1 at epoch 0. This could not be an issue if at epoch 0, scheduler1 was ineffective as in many schedulers, however for schedulers as WarmUp Schedulers, where at epoch 0 schedulers multiplicative value is smaller than 1 this could lead to undesired behaviors.
The following code snippet illustrates the problem better
## Reproducing the bug
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
for epoch in range(10):
print(epoch, scheduler2.get_last_lr()[0])
optimizer.step()
scheduler1.step()
scheduler2.step()
```
### Current Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```
### Expected Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457
Reviewed By: datumbox
Differential Revision: D30424160
Pulled By: iramazanli
fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867