SGD: remove unneeded multiply-add initialization operations (#18114)
Summary:
The momentum buffer is initialized to the value of
d_p, but the current code takes the long way to do this:
1. Create a buffer of zeros
2. Multiply the buffer by the momentum coefficient
3. Add d_p to the buffer
All of these can be collapsed into a single step:
1. Create a clone of d_p
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18114
Differential Revision: D14509122
Pulled By: ezyang
fbshipit-source-id: 4a79b896201d5ff20770b7ae790c244ba744edb8