[optim][radam] group tensors in foreach to maximize perf (#92365)
Also noticed that eps is neither used nor tested in the multi-tensor (mta) implementation of RAdam.
This will be fixed in a follow-up PR before foreach becomes the default.
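
For readers unfamiliar with the idea, here is a minimal sketch of what grouping tensors for foreach can look like, assuming the grouping key is (device, dtype). The helper names `group_by_device_and_dtype` and `sgd_like_step` are hypothetical and are not the code in this PR, and the update shown is a plain `p -= lr * g` rather than the RAdam rule.

```python
from collections import defaultdict

import torch


def group_by_device_and_dtype(params, grads):
    """Bucket (param, grad) pairs so each bucket shares one device and dtype.

    Hypothetical helper for illustration; not the utility used in the PR.
    """
    buckets = defaultdict(lambda: ([], []))
    for p, g in zip(params, grads):
        bucket_params, bucket_grads = buckets[(p.device, p.dtype)]
        bucket_params.append(p)
        bucket_grads.append(g)
    return buckets


def sgd_like_step(params, grads, lr=0.1):
    # Illustrative update only (p -= lr * g), not the RAdam update.
    grouped = group_by_device_and_dtype(params, grads)
    for bucket_params, bucket_grads in grouped.values():
        # One foreach call per homogeneous bucket instead of one call per tensor.
        torch._foreach_add_(bucket_params, bucket_grads, alpha=-lr)
```

The intent of bucketing is that each foreach call receives tensors that share a device and dtype, so a model with mixed parameter types can still be updated with a few batched calls rather than one call per tensor.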
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92365
Approved by: https://github.com/albanD