[Inductor][Quant] Enable the decomposed dequant maxpooling2d loop fusion (#99132)
**Summary**
The lowering of [`max_pool2d`](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/lowering.py#L2732) checks the `num_reads` of the input `StorageBox.data`. When the number of reads is greater than 1, the input `StorageBox` invokes `realize`, which breaks loop fusion with the previous node. In the quantization use case, the previous node can be `decomposed.dequant_per_tensor.tensor`, which has 3 reads. However, 2 of these 3 reads are scalar tensors: the `zero point` and the `scale`. This PR relaxes the criterion for `StorageBox.realize`. Specifically, when the input is an instance of `Pointwise`, we also check the number of reads of non-scalar tensors, and only invoke `StorageBox.realize` when that number is also greater than 1. This enables loop fusion and vectorized code generation for the pattern `decomposed.dequant_per_tensor.tensor - max_pool2d`.
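The relaxed criterion can be illustrated with a small standalone sketch. It does not use Inductor's actual classes: `Read` and `should_realize` are hypothetical names introduced for illustration only, and it assumes that a "scalar tensor" is a read of a single-element tensor.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical stand-in for one buffer read made by a node."""
    name: str
    shape: Tuple[int, ...]

    @property
    def is_scalar(self) -> bool:
        # Assumption: a scalar tensor has exactly one element
        # (prod of an empty shape is 1, so 0-d tensors qualify).
        return prod(self.shape) == 1


def should_realize(reads: Sequence[Read], is_pointwise: bool) -> bool:
    """Decide whether the max_pool2d input should be realized (toy model)."""
    if len(reads) <= 1:
        # A single read never forces realization.
        return False
    if is_pointwise:
        # Relaxed rule: only non-scalar reads count for Pointwise inputs.
        non_scalar_reads = [r for r in reads if not r.is_scalar]
        return len(non_scalar_reads) > 1
    # Non-Pointwise inputs keep the original rule.
    return True


# Dequant reads the quantized input plus two scalar tensors.
dequant_reads = [
    Read("x", (1, 3, 8, 8)),
    Read("scale", ()),
    Read("zero_point", ()),
]
# An elementwise add of two full-size tensors, for contrast.
add_reads = [Read("a", (1, 3, 8, 8)), Read("b", (1, 3, 8, 8))]

print(should_realize(dequant_reads, is_pointwise=True))
print(should_realize(add_reads, is_pointwise=True))
```

In this toy model the dequant input is not realized, because only one of its three reads is non-scalar, while the two-input add still is.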
**Test Plan**
```
cd test/inductor && python -m pytest test_cpu_repro.py -k test_dequant_maxpool2d_lowering
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99132
Approved by: https://github.com/jgong5, https://github.com/jansel