pytorch
c91a41fd - [Inductor][Quant]Enable the decomposed dequant maxpooling2d loop fusion (#99132)

Commit

1 year ago

[Inductor][Quant]Enable the decomposed dequant maxpooling2d loop fusion (#99132)

**Summary**

The lowering of [`max_pool2d`](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/lowering.py#L2732) checks the `num_reads` of the input `StorageBox.data`. When the number of reads is larger than 1, the input `StorageBox` invokes `realize`, which breaks loop fusion with the previous node. In the quantization use case, the previous node can be `decomposed.dequant_per_tensor.tensor`, which has 3 reads, but 2 of those 3 reads are scalar tensors: `zero point` and `scale`.

This PR relaxes the criterion for `StorageBox.realize`. Specifically, when the input is an instance of `Pointwise`, we also check the number of reads of non-scalar tensors, and only invoke `StorageBox.realize` when that number is also larger than 1. This enables loop fusion and vectorized code generation for the pattern `decomposed.dequant_per_tensor.tensor - max_pool2d`.

**Test Plan**

```
cd test/inductor && python -m pytest test_cpu_repro.py -k test_dequant_maxpool2d_lowering
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99132
Approved by: https://github.com/jgong5, https://github.com/jansel
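The relaxed criterion from the summary can be illustrated with a small standalone sketch. This is not Inductor's actual code: the `Read` class, the `should_realize` function, and the use of an element count to detect scalar tensors are all hypothetical stand-ins chosen for illustration. It only models the decision of whether an input with several reads would be realized.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for one buffer read: a name and its element count."""
    name: str
    numel: int


def should_realize(reads: List[Read], is_pointwise: bool) -> bool:
    """Toy model of the relaxed check (an assumption, not Inductor's API).

    Original rule: realize whenever there is more than one read.
    Relaxed rule: for a pointwise input, additionally require more than
    one read of a non-scalar tensor.
    """
    if len(reads) <= 1:
        return False
    if is_pointwise:
        # Scalar tensors (such as scale and zero point) are not counted.
        non_scalar = [r for r in reads if r.numel > 1]
        return len(non_scalar) > 1
    return True


# Dequantize reads one real tensor plus two scalar tensors.
dequant_reads = [Read("x_uint8", 1 * 3 * 8 * 8), Read("scale", 1), Read("zero_point", 1)]
# An elementwise add of two full tensors reads two non-scalar tensors.
add_reads = [Read("a", 64), Read("b", 64)]

print(should_realize(dequant_reads, is_pointwise=True))
print(should_realize(add_reads, is_pointwise=True))
```

Under this toy model, the dequantize input is no longer realized because only one of its three reads is a non-scalar tensor, so it stays eligible for fusion with the pooling loop, while an input that reads two full tensors is still realized.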

Author

leslie-fang-intel

Committer

pytorchmergebot

Parents

675029aa
