[inductor][compilation time] Fallback when kernel size for avg/max pool is large (#89448)
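This reduces compilation time for yolov3 from 400 seconds to 48 seconds. yolov3 uses max_pool2d with a 13x13 kernel, which was producing very large Triton code. With this change, avg/max pool falls back when the kernel size is large instead of generating that code.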
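
A rough sketch of the idea, not the actual lowering code: it assumes the generated code grows with the number of elements in the pooling window, so a 13x13 kernel costs 169 per-element steps per output. The names `MAX_UNROLLED_WINDOW`, `window_elements`, and `choose_pooling_path` are hypothetical, and the threshold value is an assumption made for illustration.

```python
from math import prod

# Hypothetical threshold (assumption): the largest window we are still
# willing to expand element by element in generated code.
MAX_UNROLLED_WINDOW = 25


def window_elements(kernel_size):
    """Number of elements in one pooling window, e.g. (13, 13) -> 169."""
    return prod(kernel_size)


def choose_pooling_path(kernel_size, limit=MAX_UNROLLED_WINDOW):
    """Return "fallback" for large windows and "codegen" otherwise."""
    if window_elements(kernel_size) > limit:
        return "fallback"
    return "codegen"


if __name__ == "__main__":
    # Small windows keep the generated path; the yolov3-style 13x13
    # window is routed to the fallback.
    for ks in [(2, 2), (3, 3), (5, 5), (13, 13)]:
        print(ks, window_elements(ks), choose_pooling_path(ks))
```

The check is on the window size rather than on the input shape, since under the assumption above it is the kernel size that determines how much code is emitted.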
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89448
Approved by: https://github.com/ngimel