Introduce fast path for cuda_equal (#102714)
This applies the same fast-path trick to cuda_equal. It assumes that the flags have already been handled correctly by the time cuda_equal is called.
Also adds tests for the CUDA part.
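
For readers unfamiliar with the trick, here is a minimal, self-contained sketch of what a fast path in an equality check can look like. It is not the ATen code: `View`, `sketch_equal`, `elementwise_equal`, and `slow_path_calls` are hypothetical names, and the exact conditions the real fast path checks are an assumption here (same buffer, offset, sizes, and strides). Integer elements are used so that NaN semantics do not come into play.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor view; NOT the ATen Tensor type.
struct View {
  const int32_t* storage;            // base pointer of the underlying buffer
  std::size_t offset;                // element offset into the buffer
  std::vector<std::size_t> sizes;    // extent of each dimension
  std::vector<std::size_t> strides;  // element stride of each dimension
};

// Counts slow-path invocations, for illustration only.
static int slow_path_calls = 0;

// Slow path: element-by-element comparison of two views of equal sizes.
static bool elementwise_equal(const View& a, const View& b) {
  assert(a.sizes == b.sizes);
  ++slow_path_calls;
  std::size_t total = 1;
  for (std::size_t s : a.sizes) total *= s;
  std::vector<std::size_t> idx(a.sizes.size(), 0);
  for (std::size_t n = 0; n < total; ++n) {
    std::size_t ia = a.offset, ib = b.offset;
    for (std::size_t d = 0; d < idx.size(); ++d) {
      ia += idx[d] * a.strides[d];
      ib += idx[d] * b.strides[d];
    }
    if (a.storage[ia] != b.storage[ib]) return false;
    // Advance the multi-dimensional index, last dimension fastest.
    for (std::size_t d = idx.size(); d-- > 0;) {
      if (++idx[d] < a.sizes[d]) break;
      idx[d] = 0;
    }
  }
  return true;
}

bool sketch_equal(const View& a, const View& b) {
  if (a.sizes != b.sizes) return false;
  // Assumed fast path: two views over the same buffer with the same offset
  // and strides read exactly the same elements, so no comparison is needed.
  if (a.storage == b.storage && a.offset == b.offset &&
      a.strides == b.strides) {
    return true;
  }
  return elementwise_equal(a, b);
}
```

In this sketch, comparing a view with an alias of itself returns without touching any element, while two distinct buffers still go through the full comparison. On a GPU the saving would plausibly be larger, since the slow path there involves launching a kernel and synchronizing on its result.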
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102714
Approved by: https://github.com/ezyang