Make addcmul and addcdiv support inputs of different dtypes
Fixes #70486.
Also includes a workaround for the Jiterator getting stuck on CUDA 11.3:
https://github.com/pytorch/pytorch/pull/74234#issuecomment-1100932209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74234
Approved by: https://github.com/ezyang