Fix torch.normal with CUDA tensors (#22533)
Summary:
On CUDA tensors, `addcmul_out` overwrote the samples, so `torch.normal` returned constant values instead of random ones.
Changelog:
- Replace the `addcmul_out` calls with a combination of in-place `mul` and `add`, and include the justification for this change.
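
The hazard and the fix can be sketched in plain Python. This is only an illustrative model, not the ATen/THC code: the function names `addcmul_out_aliased` and `scale_and_shift_inplace` are hypothetical, and the exact overwrite mechanism (the destination being filled with `mean` before the multiply reads it) is an assumption consistent with the summary above.

```python
import random

def addcmul_out_aliased(out, mean, std):
    """Hypothetical model of the bug: `out` holds the samples and is also the destination."""
    # Assumed step 1: the destination is initialised from `mean`,
    # which destroys the samples stored in `out`.
    for i in range(len(out)):
        out[i] = mean[i]
    # Assumed step 2: the multiply now reads the overwritten values,
    # so the result no longer depends on the samples at all.
    for i in range(len(out)):
        out[i] += out[i] * std[i]
    return out

def scale_and_shift_inplace(out, mean, std):
    """Sketch of the replacement: in-place mul, then in-place add."""
    for i in range(len(out)):
        out[i] *= std[i]   # samples * std
    for i in range(len(out)):
        out[i] += mean[i]  # ... + mean
    return out

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(4)]
mean = [2.0] * 4
std = [0.5] * 4

broken = addcmul_out_aliased(list(samples), mean, std)     # every entry identical
fixed = scale_and_shift_inplace(list(samples), mean, std)  # entries vary with the samples
```

In this model the aliased version yields the same value in every position, while the in-place `mul` followed by `add` never reads a buffer after it has been clobbered, so the samples survive.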
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22533
Test Plan:
- Enable `test_normal` on all devices.
Fixes https://github.com/pytorch/pytorch/issues/22529
Differential Revision: D16141337
Pulled By: ezyang
fbshipit-source-id: 567a399042e0adcd154582f362318ce95a244c62