[LRD] Allow using a dedicated iteration counter for learning rate (#85195)
Summary: Add a dedicated iteration counter for the learning rate, so that it can be manipulated separately (for learning rate decay, learning rate re-warm-up, etc.) without affecting other techniques that rely on the iteration count (such as EMA).
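Illustration (not the Caffe2 implementation): a minimal plain-Python sketch of the idea, assuming a simple warm-up plus inverse-decay schedule. The names `TrainingCounters`, `warmup_then_inv_decay`, and `ema_decay` are hypothetical and exist only for this example. It shows that rewinding the learning rate counter restarts warm-up while the counter used by EMA keeps advancing.

```python
from dataclasses import dataclass


@dataclass
class TrainingCounters:
    """Two independent step counters (hypothetical, not Caffe2 blobs)."""
    global_iter: int = 0  # drives EMA and anything else keyed on total steps
    lr_iter: int = 0      # drives only the learning rate schedule

    def step(self) -> None:
        # Both counters advance together during normal training.
        self.global_iter += 1
        self.lr_iter += 1

    def reset_lr_iter(self, value: int = 0) -> None:
        # Rewind only the learning rate counter, e.g. to re-warm up.
        self.lr_iter = value


def warmup_then_inv_decay(lr_iter, base_lr=0.1, warmup_steps=100, gamma=0.01):
    """Assumed schedule: linear warm-up, then inverse decay."""
    if lr_iter < warmup_steps:
        return base_lr * (lr_iter + 1) / warmup_steps
    return base_lr / (1.0 + gamma * (lr_iter - warmup_steps))


def ema_decay(global_iter, max_decay=0.999):
    """Assumed EMA decay that depends on the total number of steps."""
    return min(max_decay, (1.0 + global_iter) / (10.0 + global_iter))


if __name__ == "__main__":
    counters = TrainingCounters()
    for _ in range(500):
        counters.step()
    print("lr before reset:", warmup_then_inv_decay(counters.lr_iter))
    counters.reset_lr_iter()  # restart warm-up; global_iter is untouched
    print("lr after reset: ", warmup_then_inv_decay(counters.lr_iter))
    print("ema decay:      ", ema_decay(counters.global_iter))
```

With a single shared counter, the reset above would also push the EMA decay back to its early-training value; keeping the counters separate avoids that coupling.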
Test Plan:
Unit tests:
```
✓ Pass: caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475)
✓ Pass: caffe2/caffe2/python:optimizer_test - test_global_norm_based_gradient_clipping (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475)
✓ Pass: caffe2/caffe2/python:optimizer_test - test_lr_injection (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration) (46.475)
✓ Pass: caffe2/caffe2/python:optimizer_test - main (46.475)
Summary
Pass: 5
Skip: 1
↻ caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagradWithDedicatedLRIteration)
ListingSuccess: 1
```
Reviewed By: liangming168
Differential Revision: D38747417
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85195
Approved by: https://github.com/liangming168, https://github.com/eellison