[caffe2] Add the dedup implementation of fused RowWiseAdagrad op on GPUs (#40282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40282
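
For readers unfamiliar with the term, here is a rough CPU-side sketch of what a "dedup" row-wise Adagrad step could look like. It is not the CUDA kernel added in this PR. It assumes that "dedup" means gradient rows with the same index are summed first, so each unique embedding row gets exactly one moment and parameter update. The helper name `rowwise_adagrad_dedup`, its arguments, and the sign convention (subtracting a positive `lr`) are all hypothetical and chosen only for illustration.

```python
import math


def rowwise_adagrad_dedup(param, moment, indices, grads, lr, eps=1e-5):
    """Illustrative only: hypothetical helper, not the Caffe2 operator.

    param   : list of rows (each a list of floats), updated in place
    moment  : list with ONE float per row (the row-wise Adagrad state)
    indices : row index for each gradient row; may contain repeats
    grads   : gradient rows, grads[i] belongs to param[indices[i]]
    """
    # Step 1 (the assumed "dedup"): sum gradient rows that share an index.
    summed = {}
    for idx, g in zip(indices, grads):
        if idx in summed:
            summed[idx] = [a + b for a, b in zip(summed[idx], g)]
        else:
            summed[idx] = list(g)

    # Step 2: one row-wise Adagrad update per unique row.
    for idx, g in summed.items():
        # Row-wise Adagrad keeps a single scalar per row: the running sum
        # of the mean squared gradient across the row.
        moment[idx] += sum(x * x for x in g) / len(g)
        step = lr / (math.sqrt(moment[idx]) + eps)
        param[idx] = [p - step * x for p, x in zip(param[idx], g)]
    return param, moment


if __name__ == "__main__":
    params = [[1.0, 1.0] for _ in range(4)]
    moments = [0.0] * 4
    # Row 1 appears twice; its two gradient rows are combined before updating.
    rowwise_adagrad_dedup(
        params, moments,
        indices=[1, 3, 1],
        grads=[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
        lr=0.1,
    )
    print(params, moments)
```

Under this assumption the result does not depend on the order in which duplicate indices are processed, which is the usual motivation for deduplicating before a parallel update.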
Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```
https://our.intern.facebook.com/intern/testinfra/testrun/4785074632584150
Reviewed By: jspark1105
Differential Revision: D22102737
fbshipit-source-id: fa3fef7cecb1e2cf5c9b6019579dc0f86fd3a3b2