specialized CUDA impl for dropout in AD (#17756)
Summary:
In aten we have a `_fused_dropout` implementation for the CUDA case. As ngimel suggested, discarding it in the JIT AD formula hurts performance.
Including a backend-specific implementation in AD is not ideal, but it prevents a performance regression for now.
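For context, a fused dropout kernel computes the output and the keep-mask in one launch, and the backward formula reuses that mask. A minimal pure-Python sketch of that forward/backward contract (function names here are illustrative, not the aten kernel):

```python
import random

def dropout_forward(xs, p, seed=0):
    # Forward: drop each element with probability p, scale survivors
    # by 1/(1-p). Return the mask too, mirroring how a fused kernel
    # returns (output, mask) so the backward pass can reuse it.
    rng = random.Random(seed)
    mask = [rng.random() >= p for _ in xs]
    scale = 1.0 / (1.0 - p)
    out = [x * scale if keep else 0.0 for x, keep in zip(xs, mask)]
    return out, mask

def dropout_backward(grad_out, mask, p):
    # Backward: gradient flows only through kept elements,
    # with the same 1/(1-p) scaling as the forward pass.
    scale = 1.0 / (1.0 - p)
    return [g * scale if keep else 0.0 for g, keep in zip(grad_out, mask)]
```

Defining the backward in terms of the saved mask is what lets the AD formula call back into the fused CUDA path instead of recomputing dropout from scratch.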
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17756
Differential Revision: D14368999
Pulled By: ailzhang
fbshipit-source-id: 9a371c5020f630e8f6e496849ec9772b6f196169