[c2][cuda] small improvement to dedup adagrad by avoiding recompute of x_ij (#44173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44173
This yields a small 10~15% speed improvement.
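As context, here is a minimal Python sketch of a sparse Adagrad row update that illustrates the idea behind the change: compute the updated parameter value `x_ij` once and reuse it, rather than re-deriving it on each access. This is an illustrative sketch only; the function and variable names are hypothetical and do not mirror the actual deduplicated CUDA kernel.

```python
import math

def sparse_adagrad_row(x_row, g_row, moment_row, lr=0.01, eps=1e-6):
    """Illustrative Adagrad update for one embedding row.

    Not the real kernel -- just a sketch of computing x_ij once
    and reusing the stored value instead of recomputing it.
    """
    for j in range(len(x_row)):
        g = g_row[j]
        moment_row[j] += g * g
        # Compute the updated value x_ij a single time and store it;
        # later reads use the cached value rather than repeating the
        # division and square root.
        x_ij = x_row[j] - lr * g / (math.sqrt(moment_row[j]) + eps)
        x_row[j] = x_ij
    return x_row, moment_row
```

In the CUDA setting, the analogous saving is keeping the computed value in a register instead of recomputing it, which is where the reported speedup comes from.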
Test Plan:
== Correctness ==
`buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient '`
Reviewed By: jianyuh
Differential Revision: D23494030
fbshipit-source-id: cdb7ee716a7e559903b72ed9f93bf106813f88fa