SemanticDiff

pytorch
948cc542 - Vectorize cpu tensor conversions (#80905)


Commit

2 years ago

Vectorize cpu tensor conversions (#80905)

This adds vectorization to the copy kernel acting between different dtypes through the use of `at::vec::convert`. Currently, `vec::convert` falls back to a scalar copy loop for most dtypes. However, the compiler is still better able to auto-vectorize that loop because it doesn't involve stride calculations.

In a simple timeit benchmark I see around a 2x speedup copying from int32 to various dtypes:

| To dtype | Master (us) | This PR (us) |
|----------|-------------|--------------|
| int64    | 23.8        | 10.3         |
| float32  | 16.8        | 8.18         |
| float64  | 18.0        | 9.47         |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80905
Approved by: https://github.com/ngimel
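The C++ sketch below illustrates the point about auto-vectorization. `convert_contiguous` and `convert_strided` are hypothetical helpers written for illustration. They are not PyTorch's `at::vec::convert` or its copy kernel. They only contrast a unit-stride dtype conversion loop with a generic loop that computes byte offsets from strides on every element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not PyTorch code): convert n contiguous elements of
// type From into type To. Unit stride and no per-element offset arithmetic
// make this the kind of loop compilers can typically auto-vectorize at -O2/-O3.
template <typename From, typename To>
void convert_contiguous(const From* src, To* dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = static_cast<To>(src[i]);
  }
}

// Hypothetical helper (not PyTorch code): the same conversion over buffers
// described by byte strides. Each iteration computes its addresses from the
// strides, so the compiler cannot assume a contiguous access pattern.
template <typename From, typename To>
void convert_strided(const char* src, std::ptrdiff_t src_stride,
                     char* dst, std::ptrdiff_t dst_stride, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    const auto idx = static_cast<std::ptrdiff_t>(i);
    const From* s = reinterpret_cast<const From*>(src + idx * src_stride);
    To* d = reinterpret_cast<To*>(dst + idx * dst_stride);
    *d = static_cast<To>(*s);
  }
}
```

Whether the contiguous loop is actually vectorized depends on the compiler, optimization flags, and target ISA. The timings in the table above come from the PR's own benchmark, not from this sketch.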

Author

peterbell10


Committer

pytorchmergebot


