Optimize OneHot CUDA Kernel (#4012)
* Optimize OneHot for the case where the off value is zero.
* Add test cases for out-of-range indices.
Co-authored-by: Vincent Wang <weicwang@microsoft.com>
Co-authored-by: Vincent Wang <weicwang@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>