SemanticDiff

pytorch
ee5b59dd - [ROCm] CatArrayBatchedCopy performance improvement (#118685)

Commit

222 days ago

[ROCm] CatArrayBatchedCopy performance improvement (#118685)

Tune the grid and block sizes for ROCm. Add a contig kernel separate from aligned+contig.

Verified new performance using pytorch/benchmarks/operator_benchmark.

`python -m pt.cat_test --device=cuda --tag-filter all`

On MI200 this improved performance on average 4%, and on MI300 14%.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118685
Approved by: https://github.com/malfet
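The commit message mentions a contig kernel that is separate from the aligned+contig one. One way to picture that split is as a three-way dispatch over the inputs. The following is a minimal sketch in plain Python, not the PyTorch source: the names `CatInput` and `pick_cat_kernel`, the kernel labels, and the 16-byte alignment threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

ALIGNMENT_BYTES = 16  # assumed vector load width; not taken from the commit


@dataclass(frozen=True)
class CatInput:
    """Hypothetical description of one tensor passed to a concatenation."""
    data_ptr: int      # starting address in bytes
    nbytes: int        # total size in bytes
    contiguous: bool   # True if the elements are densely packed


def pick_cat_kernel(inputs):
    """Return a label for the copy path a batch of inputs could take."""
    if not inputs:
        raise ValueError("need at least one input")
    all_contig = all(t.contiguous for t in inputs)
    all_aligned = all(
        t.data_ptr % ALIGNMENT_BYTES == 0 and t.nbytes % ALIGNMENT_BYTES == 0
        for t in inputs
    )
    if all_contig and all_aligned:
        return "aligned_contig"  # dense and aligned: wide loads/stores are safe
    if all_contig:
        return "contig"          # dense but unaligned: simpler indexing only
    return "generic"             # strided inputs need full index arithmetic
```

Under these assumptions, a dense input that starts at an unaligned address would take the middle path rather than falling back to the fully general one, which is the kind of case a separate contig kernel would serve.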

Author

jeffdaily

Committer

pytorchmergebot
