Sparse softmax support (CUDA) (#42307)
Summary:
This PR implements softmax support for sparse tensors.
Resolves gh-23651 for CUDA.
- [x] sparse softmax
  - [x] CUDA C++ implementation
  - [x] unittests
  - [x] update softmax documentation
  - [x] autograd support
- [x] sparse log_softmax
  - [x] CUDA C++ implementation
  - [x] unittests
  - [x] update log_softmax documentation
  - [x] autograd support
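For context, a minimal usage sketch of the operation this PR adds (the indices, values, and shape below are illustrative): sparse softmax normalizes only the *specified* entries along a dimension, treating unspecified entries as `-inf`, so the result stays sparse.

```python
import torch

# Small 2x3 sparse COO tensor with three specified entries
indices = torch.tensor([[0, 0, 1], [0, 2, 1]])
values = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
s = torch.sparse_coo_tensor(indices, values, (2, 3))

# Softmax over dim=1: each row's specified values sum to 1,
# unspecified entries remain zero in the dense view
out = torch.sparse.softmax(s, dim=1)
print(out.to_dense())
```

With autograd support, `values.requires_grad_()` before constructing the tensor lets gradients flow through `torch.sparse.softmax` like the dense version.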
Here are some benchmark results (the script is [here](https://gist.github.com/aocsa/fbc1827b3e49901512a33ba96092cbc1)) for `torch.sparse.softmax` and `torch.softmax` on CPU and CUDA. Values are float64; each timing is repeated 1000 times:
| size | density | sparse CUDA | sparse CPU |
|--------------|---------|-------------|------------|
| (32, 10000) | 0.01 | 380.2 | 687.5 |
| (32, 10000) | 0.05 | 404.3 | 2357.9 |
| (32, 10000) | 0.1 | 405.9 | 3677.2 |
| (512, 10000) | 0.01 | 438.0 | 5443.4 |
| (512, 10000) | 0.05 | 888.1 | 24485.0 |
| (512, 10000) | 0.1 | 1921.3 | 45340.5 |

| size | density | dense CUDA | dense CPU |
|--------------|---------|-------------|------------|
| (32, 10000) | 0.01 | 23.6 | 1943.2 |
| (32, 10000) | 0.05 | 23.6 | 1954.0 |
| (32, 10000) | 0.1 | 23.5 | 1950.0 |
| (512, 10000) | 0.01 | 639.3 | 39797.9 |
| (512, 10000) | 0.05 | 640.3 | 39374.4 |
| (512, 10000) | 0.1 | 639.6 | 39192.3 |
Times are in microseconds (us).
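A minimal CPU timing sketch in the spirit of the linked gist (the helper name, sizes, and repeat count here are illustrative, not the exact benchmark script):

```python
import timeit
import torch

def make_sparse(shape, density):
    # Random sparse COO tensor with roughly `density` fraction of non-zeros
    mask = torch.rand(shape) < density
    dense = torch.randn(shape, dtype=torch.float64) * mask
    return dense.to_sparse()

x = make_sparse((32, 10000), 0.01)
t = timeit.timeit(lambda: torch.sparse.softmax(x, dim=1), number=100)
print(f"sparse CPU softmax: {t / 100 * 1e6:.1f} us per call")
```

The same loop with `x.to_dense()` and `torch.softmax` gives the dense baseline; swapping in a CUDA device (with synchronization around the timed region) reproduces the GPU columns.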
Quick note: I updated the performance test again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42307
Reviewed By: ngimel
Differential Revision: D23774427
Pulled By: mruberry
fbshipit-source-id: bfabf726075b39dde544c10249f27ae1871f82c7