CUDA implementation of Sparse Adagrad Fusion for GPUs (#35762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35762
We implement the following operators for regular and row-wise SparseAdagrad fusion with the SparseLengthsSum (SLS) and SparseLengthsWeightedSum (SLWS) gradients:
- SparseAdagradFusedWithSparseLengthsSumGradient
- RowWiseSparseAdagradFusedWithSparseLengthsSumGradient
- SparseAdagradFusedWithSparseLengthsWeightedSumGradient
- RowWiseSparseAdagradFusedWithSparseLengthsWeightedSumGradient
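Each fused operator computes the SLS/SLWS gradient with respect to the embedding rows and applies the (row-wise) Adagrad update in a single pass, avoiding a separate gradient-materialization step. A minimal NumPy sketch of the row-wise variant is below; the function name, argument layout, and default `epsilon` are illustrative assumptions, not the operator's actual schema (Caffe2's sign convention folds the descent direction into `lr`, which is typically negative in training):

```python
import numpy as np

def rowwise_sparse_adagrad_fused_sls_grad(param, moment, indices, lengths,
                                          seg_grad, lr, epsilon=1e-5):
    """Reference sketch of RowWiseSparseAdagradFusedWithSparseLengthsSumGradient.

    For each segment, the SLS gradient w.r.t. every gathered embedding row is
    simply that segment's output gradient; the fused op applies the row-wise
    Adagrad update (one scalar moment per row) as it scatters these gradients.
    Names and layout here are assumptions for illustration only.
    """
    offset = 0
    for seg, n in enumerate(lengths):
        for idx in indices[offset:offset + n]:
            g = seg_grad[seg]                 # SLS gradient for this row
            moment[idx] += np.mean(g * g)     # row-wise (scalar) moment
            param[idx] += lr * g / (np.sqrt(moment[idx]) + epsilon)
        offset += n
    return param, moment
```

The weighted (SLWS) variant would additionally scale `g` by the per-lookup weight before the update; the non-row-wise variant keeps a full per-element moment instead of a per-row scalar.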
Test Plan:
- SparseAdagradFusedWithSparseLengthsSumGradient
- RowWiseSparseAdagradFusedWithSparseLengthsSumGradient
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```
- SparseAdagradFusedWithSparseLengthsWeightedSumGradient
- RowWiseSparseAdagradFusedWithSparseLengthsWeightedSumGradient
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_weighted_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```
Benchmark code:
```
buck run mode/dev-nosan //caffe2/caffe2/fb/optimizers:adagrad_fused_bench_gpu
```
Reviewed By: jspark1105
Differential Revision: D20453096
fbshipit-source-id: bc209348232e3454af0d1d909bbd8ab7f07f69fd