pytorch
3aecce70 - [pytorch] use cublas lt interface for bias fusion (#72148)

Commit

2 years ago

[pytorch] use cublas lt interface for bias fusion (#72148)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72148

To quantify how much the cublas lt interface can help param bench (https://github.com/facebookresearch/param/) linear perf:

On V100 GPU
for b in 512 1024; do for i in {1..5}; do param_bench/train/compute/pt/pytorch_linear.py --device gpu --dtype=float16 --hidden-size 1024 --batch-size ${b}; done; done

Before this commit
batch size 512: median 21.4 TF/s (20.7, 20.6, 21.8, 21.6, 21.4)
batch size 1024: median 40.1 TF/s (39.4, 39.3, 40.2, 40.4, 40.1)

After this commit
batch size 512: median 23.5 TF/s (23.2, 23.5, 23.8, 23.9, 23.6), 9.8% speedup
batch size 1024: median 41.6 TF/s (42.7, 41.6, 40.4, 41.3, 41.9), 3.7% speedup

Reviewed By: jasonjk-park, jianyuh

Differential Revision: D33928147

fbshipit-source-id: cecc51a27f4b07a7f8cb728d48eebfc4e41ea823
(cherry picked from commit 2b71db6199c49b2461bc0d4c2647644b76b29d5d)
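The speedup percentages quoted in the commit message can be reproduced from the reported median throughputs. The snippet below is a minimal sketch of that arithmetic, taking the medians exactly as stated in the message; the helper name `speedup_percent` and the `reported_medians` table are illustrative and not part of the commit or of param bench.

```python
def speedup_percent(before_tflops: float, after_tflops: float) -> float:
    """Relative throughput gain, in percent, going from `before` to `after`."""
    return (after_tflops / before_tflops - 1.0) * 100.0


# Median TF/s as reported in the commit message: batch size -> (before, after).
# These values are copied from the message, not re-measured.
reported_medians = {
    512: (21.4, 23.5),
    1024: (40.1, 41.6),
}

for batch_size, (before, after) in reported_medians.items():
    gain = speedup_percent(before, after)
    print(f"batch size {batch_size}: {gain:.1f}% speedup")
```

With the reported medians this gives roughly 9.8% for batch size 512 and 3.7% for batch size 1024, matching the figures in the message.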

References

#73443 - merge master into lazy_tensor_staging

Author

jspark1105

Committer

pytorchmergebot

Parents

237574db
