pytorch
39f50f46 - matmul performance benchmarks (#51647)

Commit

3 years ago

matmul performance benchmarks (#51647) Summary: Minor PR following up the previous PR about sparse benchmarking utils https://github.com/pytorch/pytorch/pull/48397 Fixes https://github.com/pytorch/pytorch/issues/44634: Performance benchmarks for matrix-matrix and matrix-vector ops (dense-sparse, sparse-sparse, and compare to dense-dense) I ran all benchmarks on an 2xRTX8000 machine with AMD 2970WX 24-cores for `DLMC/magnitude_pruning` dataset with different sparsity levels. --- <details><summary> forward tests (expand for details). </summary> - `sparse@sparse` ``` [------------------------------- cpu:matmul-forward -------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ------------------------------------------------------------------------- torch:dense@dense | 108.1 | 100.5 | 101.3 | 108.4 | 98.4 | 187.4 torch:sparse@sparse | 659.1 | 368.8 | 156.5 | 53.3 | 26.8 | 14.9 scipy:sparse@sparse | 565.1 | 233.9 | 130.2 | 23.1 | 21.6 | 15.2 Times are in milliseconds (ms). [----------------------------------- cuda:matmul-forward -----------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ---------------------------------------------------------------------------------- torch:dense@dense | 2243.5 | 4392.5 | 4419.8 | 2272.3 | 4433.9 | 8920.1 torch:sparse@sparse | 21369.2 | 11877.6 | 7339.2 | 1787.2 | 1335.1 | 845.7 Times are in microseconds (us). ``` - `sparse@dense` ``` [------------------------------- cpu:matmul-forward -------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ------------------------------------------------------------------------- torch:dense@dense | 105.8 | 103.8 | 103.0 | 104.4 | 104.4 | 197.0 torch:sparse@dense | 119.9 | 102.4 | 84.0 | 19.7 | 16.8 | 11.6 scipy:sparse@dense | 906.5 | 799.6 | 697.8 | 182.2 | 165.5 | 135.4 Times are in milliseconds (ms). [------------------------- cuda:matmul-forward --------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: --------------------------------------------------------------- torch:dense@dense | 2.2 | 4.4 | 4.4 | 2.3 | 4.5 | 2.3 torch:sparse@dense | 5.7 | 6.6 | 4.5 | 1.4 | 1.4 | 1.3 Times are in milliseconds (ms). ``` - `sparse@vector` ``` [----------------------------------- cpu:matmul-forward ----------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: -------------------------------------------------------------------------------- torch:dense@vector | 510.6 | 505.8 | 759.6 | 782.1 | 682.4 | 764.6 torch:sparse@vector | 10122.8 | 6241.1 | 7935.6 | 2076.3 | 1049.5 | 826.3 scipy:sparse@vector | 1756.7 | 1033.9 | 678.2 | 343.5 | 168.5 | 65.4 Times are in microseconds (us). [-------------------------------- cuda:matmul-forward --------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ---------------------------------------------------------------------------- torch:dense@vector | 36.1 | 21.5 | 21.6 | 21.5 | 21.6 | 21.5 torch:sparse@vector | 1099.2 | 1289.4 | 775.7 | 327.1 | 285.4 | 274.0 Times are in microseconds (us). ``` </details> --- <details><summary> backward tests (expand for details). </summary> - `sparse@sparse` ``` [--------------------------------- cpu:matmul-backward ---------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ------------------------------------------------------------------------------ torch:dense@dense | 246.1 | 315.0 | 306.9 | 168.6 | 290.6 | 146.9 torch:sparse@sparse | 6417.5 | 4393.7 | 3012.7 | 1029.4 | 908.0 | 650.7 Times are in microseconds (us). [----------------------------- cuda:matmul-backward -----------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ----------------------------------------------------------------------- torch:dense@dense | 6.7 | 13.3 | 13.3 | 6.9 | 13.5 | 6.9 torch:sparse@sparse | 143.7 | 143.4 | 119.6 | 29.5 | 29.1 | 10.9 Times are in microseconds (us). ``` - `sparse@dense` ``` [------------------------------ cpu:matmul-backward -------------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ------------------------------------------------------------------------- torch:dense@dense | 185.9 | 304.8 | 305.8 | 169.9 | 308.7 | 168.4 torch:sparse@dense | 407.9 | 345.8 | 274.6 | 114.2 | 163.6 | 230.5 Times are in milliseconds (ms). [--------------------------- cuda:matmul-backward --------------------------] | 0.5 | 0.7 | 0.8 | 0.9 | 0.95 | 0.98 1 threads: ------------------------------------------------------------------ torch:dense@dense | 6.7 | 13.3 | 13.3 | 6.9 | 13.4 | 6.9 torch:sparse@dense | 16.7 | 19.0 | 15.1 | 6.3 | 8.2 | 12.7 Times are in milliseconds (ms). ``` </details> Kindly review this PR. cc mruberry, ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/51647 Reviewed By: albanD Differential Revision: D27007809 Pulled By: mruberry fbshipit-source-id: 8c1922cb3280027ca5e3eef31bfa20500c548cfd

Author

aocsa

Committer

facebook-github-bot

Parents

142c6b0e

pytorch 39f50f46 - matmul performance benchmarks (#51647)

pytorch
39f50f46 - matmul performance benchmarks (#51647)