[optim][radam] fix eps discrepancy for foreach (#92551)
Will likely race with https://github.com/pytorch/pytorch/pull/92365
eps was not being used at all in the mta/foreach impl. There was also a discrepancy between the docs vs the implementation: the implementation was doing sqrt(x) + eps and the docs were doing sqrt(x+eps)).
I've fixed the docs + extended the current multi_tensor test case to capture this issue.
![image](https://user-images.githubusercontent.com/31798555/213300617-61cbb763-da2d-48e0-b3b6-0190594dd049.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92551
Approved by: https://github.com/albanD