Migrate multi_margin_loss to ATen (CUDA) (#61426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61426
Closes gh-24600, closes gh-24601
These operators use custom kernels that aren't well suited to `TensorIterator` style, so this change is a direct port of the existing CUDA kernels to ATen, plus some style cleanup.
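To illustrate what these kernels compute, here is a minimal pure-Python reference sketch. It assumes the semantics given in the PyTorch documentation for `torch.nn.MultiMarginLoss`. The function name `multi_margin_loss_reference` is hypothetical and is not part of ATen. Each per-sample output gathers the target score from its row and then reduces over the whole class dimension. That pattern is a plausible reason why the commit keeps custom kernels rather than a plain elementwise loop.

```python
from typing import List, Optional, Sequence, Union


def multi_margin_loss_reference(
    x: Sequence[Sequence[float]],
    y: Sequence[int],
    p: int = 1,
    margin: float = 1.0,
    weight: Optional[Sequence[float]] = None,
    reduction: str = "mean",
) -> Union[float, List[float]]:
    """Hypothetical pure-Python reference for the multi-class hinge loss.

    x: N rows of C class scores; y: N target class indices in [0, C).
    """
    if p not in (1, 2):
        raise ValueError("only p=1 and p=2 are supported")
    per_sample: List[float] = []
    for row, target in zip(x, y):
        num_classes = len(row)
        target_score = row[target]  # gather: each row needs its own target score
        total = 0.0
        for i, score in enumerate(row):
            if i == target:
                continue  # the target class contributes no margin term
            z = margin - target_score + score
            if z > 0:  # hinge: only margin violations contribute
                term = z if p == 1 else z * z
                if weight is not None:
                    term *= weight[target]  # weight of the target class
                total += term
        # Reduce over the class dimension: sum, then divide by C
        per_sample.append(total / num_classes)
    if reduction == "none":
        return per_sample
    if reduction == "sum":
        return sum(per_sample)
    if reduction == "mean":
        return sum(per_sample) / len(per_sample)
    raise ValueError(f"unknown reduction: {reduction!r}")
```

For example, `multi_margin_loss_reference([[0.1, 0.2, 0.4, 0.8]], [3])` evaluates to approximately 0.325, that is, (0.3 + 0.4 + 0.6) / 4.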
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D29648015
Pulled By: ngimel
fbshipit-source-id: cadf1890cdc2199d57f4533370e554613efeb54a