SemanticDiff

pytorch
a77226aa - [inductor] improve kernel metadata logging (#120274)


Commit

212 days ago

[inductor] improve kernel metadata logging (#120274)

Log a few more fields:

- num_atomic_add: the perf of kernels that use atomic_add is usually data dependent. Our benchmarking code generates all indices as 0, which results in worse perf than in practice.
- kernel_args_num_gb: an estimate of the amount of memory read and written for the kernel args. In-place args are double counted. If the estimate is good, it should be a lower bound on the memory access the GPU performs. Sometimes the GPU does more memory access, since a single buffer may be accessed multiple times (e.g. for softmax when the input tensor is quite large; the cache only helps a bit here).

With this logged, and if we augment the metadata with the amount of memory the GPU actually accessed, it would be useful to dig into kernels where the GPU accesses more memory than estimated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120274
Approved by: https://github.com/jansel
ghstack dependencies: #120266
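To make the two logged fields concrete, here is a minimal sketch of how such values could be computed. It is not the code from this commit: `KernelArg`, `estimate_args_num_gb`, and `count_atomic_adds` are hypothetical names, the GB unit is assumed to be 1e9 bytes, and counting `atomic_add` by substring match on the kernel source is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class KernelArg:
    """Hypothetical description of one tensor argument of a kernel."""
    numel: int             # number of elements in the buffer
    itemsize: int          # bytes per element, e.g. 4 for float32
    inplace: bool = False  # True if the kernel both reads and writes it


def estimate_args_num_gb(args):
    """Estimate bytes read plus written across all args, in GB (assumed 1e9 bytes)."""
    total_bytes = 0
    for arg in args:
        nbytes = arg.numel * arg.itemsize
        # An in-place arg is read once and written once, so it is counted twice.
        total_bytes += nbytes * (2 if arg.inplace else 1)
    return total_bytes / 1e9


def count_atomic_adds(kernel_source):
    """Count textual occurrences of 'atomic_add' in a kernel's source string."""
    return kernel_source.count("atomic_add")


# Example: one 1 GB input plus one 1 GB in-place buffer gives an estimate of 3 GB.
example_args = [
    KernelArg(numel=250_000_000, itemsize=4),
    KernelArg(numel=250_000_000, itemsize=4, inplace=True),
]
example_gb = estimate_args_num_gb(example_args)
```

Under these assumptions the estimate is a lower bound only: it counts each buffer once per direction, so a kernel that rereads a large input (as in the softmax case above) would move more memory than this number suggests.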

Author

shunting314

Committer

pytorchmergebot
