Dropout Vectorized Kernel (#9157)
* vectorized kernel
* fix build
* re-calibrate expected loss
* fix build
* re-calibrate convergence results
* more re-calibration of loss
* divide kernels
* address comments
* more calibration
* calibration
* update per comments
* enable sync
Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>