[inductor] let coordinate descent tuning respect max block size (#103660)
It turns out that we need to fix https://github.com/pytorch/pytorch/issues/103656 in the coordinate descent tuner.
Inductor generates Triton code under an assumption about the max block size: if Inductor is sure that numel is a multiple of the max block size, it safely skips the corresponding mask check for performance.
Previously, the coordinate descent tuner did not respect this assumption and could pick a Triton config with an even larger block size. That causes an IMA (illegal memory access).
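The invariant can be sketched roughly as below. This is an illustrative simplification, not Inductor's actual code: the `MAX_BLOCK` values and `is_valid_config` helper are hypothetical stand-ins for the per-dimension caps the generated kernel assumed.

```python
# Hypothetical per-dimension block-size caps (values are illustrative only).
MAX_BLOCK = {"X": 2048, "Y": 1024, "R": 4096}

def is_valid_config(candidate: dict) -> bool:
    """Reject a candidate config whose block size exceeds the assumed max.

    If numel is a multiple of MAX_BLOCK[prefix], the generated Triton code
    may have dropped the bounds mask; running it with a larger block size
    would then read/write out of bounds (an illegal memory access).
    """
    return all(size <= MAX_BLOCK[prefix] for prefix, size in candidate.items())

# Coordinate descent may propose doubling XBLOCK; configs past the cap
# must be filtered out before benchmarking.
assert is_valid_config({"X": 2048})
assert not is_valid_config({"X": 4096})
```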
As an aside, I was wondering how we picked those max block sizes. Not enforcing a max block size may let the coordinate descent tuner find an even better config, but it may slow down other cases a bit because of the extra mask check.
Test:
```
TORCHINDUCTOR_COORDINATE_DESCENT_TUNING=1 TORCHINDUCTOR_MAX_AUTOTUNE_POINTWISE=1 TORCHINDUCTOR_BENCHMARK_KERNEL=1 python benchmarks/dynamo/torchbench.py --amp --performance --inference --inductor --only alexnet
```
Fails before this change and works after.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103660
Approved by: https://github.com/spectrometerHBH, https://github.com/jansel