matmul performance benchmarks (#51647)
Summary:
Minor PR following up on the previous PR about sparse benchmarking utils: https://github.com/pytorch/pytorch/pull/48397
Fixes https://github.com/pytorch/pytorch/issues/44634: performance benchmarks for matrix-matrix and matrix-vector ops (dense-sparse and sparse-sparse, compared against dense-dense).
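I ran all benchmarks on a machine with two RTX 8000 GPUs and a 24-core AMD 2970WX CPU, using the `DLMC/magnitude_pruning` dataset at several sparsity levels. In every table below, the column headers are those sparsity levels (the fraction of zero entries), and the `Times are in ...` line under each table gives its unit.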
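To make the sparsity axis concrete, here is a minimal plain-Python sketch. It is not PyTorch's kernel and uses random matrices rather than the DLMC ones; `random_sparse`, `to_coo`, `coo_matmul` and `dense_matmul` are hypothetical helpers for illustration only. It builds a matrix in which each entry is zero with probability `sparsity`, stores only the nonzeros as COO `(row, col, value)` triplets, and multiplies it by a dense matrix, so the number of multiply-adds scales with the number of stored nonzeros instead of the full matrix size.

```python
import random


def random_sparse(rows, cols, sparsity, seed=0):
    """Dense list-of-lists matrix where each entry is zero with probability `sparsity`."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-1.0, 1.0) if rng.random() >= sparsity else 0.0 for _ in range(cols)]
        for _ in range(rows)
    ]


def to_coo(dense):
    """Keep only the nonzero entries as (row, col, value) triplets, in row-major order."""
    return [(i, j, v) for i, row in enumerate(dense) for j, v in enumerate(row) if v != 0.0]


def coo_matmul(coo, n_rows, dense_b):
    """sparse @ dense: one multiply-add per stored nonzero per output column."""
    n_cols = len(dense_b[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i, k, a_ik in coo:
        b_row, out_row = dense_b[k], out[i]
        for j in range(n_cols):
            out_row[j] += a_ik * b_row[j]
    return out


def dense_matmul(a, b):
    """Reference dense @ dense product that touches every entry, zero or not."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


a = random_sparse(64, 64, sparsity=0.9)
b = random_sparse(64, 8, sparsity=0.0, seed=1)
coo = to_coo(a)
print(f"{len(coo)} of {64 * 64} entries stored")  # roughly 10% at sparsity 0.9
```

At sparsity 0.9 only about a tenth of the entries are stored, so `coo_matmul` performs about a tenth of the dense multiply-adds. Real sparse kernels also pay for index handling and irregular memory access, which is consistent with the sparse variants in the tables losing to dense@dense at the lower sparsity levels.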
---
<details><summary> forward tests (expand for details).
</summary>
- `sparse@sparse`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                    |   0.5 |   0.7 |   0.8 |   0.9 |  0.95 |  0.98
1 threads: -------------------------------------------------------------------------
torch:dense@dense   | 108.1 | 100.5 | 101.3 | 108.4 |  98.4 | 187.4
torch:sparse@sparse | 659.1 | 368.8 | 156.5 |  53.3 |  26.8 |  14.9
scipy:sparse@sparse | 565.1 | 233.9 | 130.2 |  23.1 |  21.6 |  15.2
Times are in milliseconds (ms).
[----------------------------------- cuda:matmul-forward -----------------------------------]
                    |     0.5 |     0.7 |     0.8 |     0.9 |    0.95 |    0.98
1 threads: ----------------------------------------------------------------------------------
torch:dense@dense   |  2243.5 |  4392.5 |  4419.8 |  2272.3 |  4433.9 |  8920.1
torch:sparse@sparse | 21369.2 | 11877.6 |  7339.2 |  1787.2 |  1335.1 |   845.7
Times are in microseconds (us).
```
- `sparse@dense`
```
[------------------------------- cpu:matmul-forward -------------------------------]
                    |   0.5 |   0.7 |   0.8 |   0.9 |  0.95 |  0.98
1 threads: -------------------------------------------------------------------------
torch:dense@dense   | 105.8 | 103.8 | 103.0 | 104.4 | 104.4 | 197.0
torch:sparse@dense  | 119.9 | 102.4 |  84.0 |  19.7 |  16.8 |  11.6
scipy:sparse@dense  | 906.5 | 799.6 | 697.8 | 182.2 | 165.5 | 135.4
Times are in milliseconds (ms).
[------------------------- cuda:matmul-forward --------------------------]
                    |  0.5 |  0.7 |  0.8 |  0.9 | 0.95 | 0.98
1 threads: ---------------------------------------------------------------
torch:dense@dense   |  2.2 |  4.4 |  4.4 |  2.3 |  4.5 |  2.3
torch:sparse@dense  |  5.7 |  6.6 |  4.5 |  1.4 |  1.4 |  1.3
Times are in milliseconds (ms).
```
- `sparse@vector`
```
[----------------------------------- cpu:matmul-forward ----------------------------------]
                    |     0.5 |     0.7 |     0.8 |     0.9 |    0.95 |    0.98
1 threads: --------------------------------------------------------------------------------
torch:dense@vector  |   510.6 |   505.8 |   759.6 |   782.1 |   682.4 |   764.6
torch:sparse@vector | 10122.8 |  6241.1 |  7935.6 |  2076.3 |  1049.5 |   826.3
scipy:sparse@vector |  1756.7 |  1033.9 |   678.2 |   343.5 |   168.5 |    65.4
Times are in microseconds (us).
[-------------------------------- cuda:matmul-forward --------------------------------]
                    |    0.5 |    0.7 |    0.8 |    0.9 |   0.95 |   0.98
1 threads: ----------------------------------------------------------------------------
torch:dense@vector  |   36.1 |   21.5 |   21.6 |   21.5 |   21.6 |   21.5
torch:sparse@vector | 1099.2 | 1289.4 |  775.7 |  327.1 |  285.4 |  274.0
Times are in microseconds (us).
```
</details>
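To read the forward tables as speedups rather than raw times, a sketch like the following parses rows in the `label | t1 | t2 | ...` layout used above (the tables appear to be `torch.utils.benchmark.Compare` printouts) and divides a baseline row by a candidate row. The helper names and the parsing rule (a data row is any `|`-separated line whose first cell contains `:`) are assumptions for illustration, not part of any PyTorch API.

```python
def parse_compare_rows(table):
    """Parse the data rows of a Compare-style text table into {label: [times]}."""
    rows = {}
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Data rows are "label | t1 | t2 | ..." with a "backend:op" label; the header
        # row has an empty first cell, and title/"1 threads"/unit lines contain no "|".
        if len(cells) > 1 and ":" in cells[0]:
            rows[cells[0]] = [float(c) for c in cells[1:]]
    return rows


def speedups(rows, baseline, candidate):
    """baseline_time / candidate_time per sparsity level; > 1 means the candidate is faster."""
    return [round(b / c, 2) for b, c in zip(rows[baseline], rows[candidate])]


# Rows copied from the CPU sparse@dense forward table (times in ms).
CPU_SPARSE_AT_DENSE = """
torch:dense@dense | 105.8 | 103.8 | 103.0 | 104.4 | 104.4 | 197.0
torch:sparse@dense | 119.9 | 102.4 | 84.0 | 19.7 | 16.8 | 11.6
scipy:sparse@dense | 906.5 | 799.6 | 697.8 | 182.2 | 165.5 | 135.4
"""

rows = parse_compare_rows(CPU_SPARSE_AT_DENSE)
print(speedups(rows, "torch:dense@dense", "torch:sparse@dense"))
# [0.88, 1.01, 1.23, 5.3, 6.21, 16.98]
```

Read this way, torch sparse@dense on CPU roughly breaks even with dense@dense at sparsity 0.7, pulls ahead from 0.8, and is about 17x faster at 0.98; passing `"scipy:sparse@dense"` as the baseline shows it also beats scipy at every level in that table.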
---
<details><summary> backward tests (expand for details).
</summary>
- `sparse@sparse`
```
[--------------------------------- cpu:matmul-backward ---------------------------------]
                    |    0.5 |    0.7 |    0.8 |    0.9 |   0.95 |   0.98
1 threads: ------------------------------------------------------------------------------
torch:dense@dense   |  246.1 |  315.0 |  306.9 |  168.6 |  290.6 |  146.9
torch:sparse@sparse | 6417.5 | 4393.7 | 3012.7 | 1029.4 |  908.0 |  650.7
Times are in microseconds (us).
[----------------------------- cuda:matmul-backward -----------------------------]
                    |   0.5 |   0.7 |   0.8 |   0.9 |  0.95 |  0.98
1 threads: -----------------------------------------------------------------------
torch:dense@dense   |   6.7 |  13.3 |  13.3 |   6.9 |  13.5 |   6.9
torch:sparse@sparse | 143.7 | 143.4 | 119.6 |  29.5 |  29.1 |  10.9
Times are in microseconds (us).
```
- `sparse@dense`
```
[------------------------------ cpu:matmul-backward -------------------------------]
                    |   0.5 |   0.7 |   0.8 |   0.9 |  0.95 |  0.98
1 threads: -------------------------------------------------------------------------
torch:dense@dense   | 185.9 | 304.8 | 305.8 | 169.9 | 308.7 | 168.4
torch:sparse@dense  | 407.9 | 345.8 | 274.6 | 114.2 | 163.6 | 230.5
Times are in milliseconds (ms).
[--------------------------- cuda:matmul-backward --------------------------]
                    |  0.5 |  0.7 |  0.8 |  0.9 | 0.95 | 0.98
1 threads: ------------------------------------------------------------------
torch:dense@dense   |  6.7 | 13.3 | 13.3 |  6.9 | 13.4 |  6.9
torch:sparse@dense  | 16.7 | 19.0 | 15.1 |  6.3 |  8.2 | 12.7
Times are in milliseconds (ms).
```
</details>
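The backward sparse@dense results are not monotonic in sparsity, so a single crossover point can be misleading. A small plain-Python helper (hypothetical, not part of the PR) that lists every sparsity level at which the sparse variant beat dense@dense makes this easy to read off; the numbers below are copied from the backward sparse@dense tables, and each pair of lists shares one unit.

```python
SPARSITY = [0.5, 0.7, 0.8, 0.9, 0.95, 0.98]


def sparse_wins(levels, dense_times, sparse_times):
    """Return the sparsity levels at which the sparse variant ran faster than dense@dense."""
    return [s for s, d, t in zip(levels, dense_times, sparse_times) if t < d]


# Backward sparse@dense times from the tables above (CPU and CUDA, both in ms).
cpu_dense = [185.9, 304.8, 305.8, 169.9, 308.7, 168.4]
cpu_sparse = [407.9, 345.8, 274.6, 114.2, 163.6, 230.5]
cuda_dense = [6.7, 13.3, 13.3, 6.9, 13.4, 6.9]
cuda_sparse = [16.7, 19.0, 15.1, 6.3, 8.2, 12.7]

print(sparse_wins(SPARSITY, cpu_dense, cpu_sparse))    # [0.8, 0.9, 0.95]
print(sparse_wins(SPARSITY, cuda_dense, cuda_sparse))  # [0.9, 0.95]
```

Both devices show the sparse backward losing again at 0.98, and running the same helper on the backward sparse@sparse rows returns an empty list on CPU and CUDA: dense@dense is faster at every measured level there.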
Kindly review this PR. cc mruberry, ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51647
Reviewed By: albanD
Differential Revision: D27007809
Pulled By: mruberry
fbshipit-source-id: 8c1922cb3280027ca5e3eef31bfa20500c548cfd