[Inductor] Enable the decomposition of quant/dequant per channel (#119177)
**Summary**
Part 2 of the fix for https://github.com/pytorch/pytorch/issues/119141, which requires vectorized code generation for per-channel quantization and for the int8 data type.
Enable the decomposition of per-channel quant/dequant so that Inductor can generate vectorized code for these ops.
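
For reference, the arithmetic that a per-channel quant/dequant decomposition expresses in elementwise primitives can be sketched as below. This is a minimal pure-Python illustration, not the code added in this PR: the function names, the list-of-rows layout (channel axis 0), and the example values are assumptions chosen for readability.

```python
# Illustrative sketch only: per-channel affine quantize/dequantize written as
# plain elementwise math (multiply, round, add, clamp). Names and the
# list-of-rows layout (one row per channel) are hypothetical.

def quantize_per_channel(x, scales, zero_points, quant_min, quant_max):
    """Quantize each row of x with its own scale and zero point."""
    out = []
    for row, scale, zp in zip(x, scales, zero_points):
        inv_scale = 1.0 / scale
        out.append([
            # round() uses round-half-to-even, then shift and clamp.
            min(max(round(v * inv_scale) + zp, quant_min), quant_max)
            for v in row
        ])
    return out


def dequantize_per_channel(q, scales, zero_points):
    """Map quantized values back to floats, again one scale/zero point per row."""
    return [
        [(v - zp) * scale for v in row]
        for row, scale, zp in zip(q, scales, zero_points)
    ]


# Example with an int8 range; the last value of channel 1 saturates at 127.
x = [[0.5, -1.0, 2.0], [0.25, 0.5, 100.0]]
scales = [0.5, 0.25]
zero_points = [0, 10]
q = quantize_per_channel(x, scales, zero_points, -128, 127)
x_hat = dequantize_per_channel(q, scales, zero_points)
```

Because every step is a simple elementwise operation with a per-channel broadcast of the scale and zero point, an op written in this form is the kind of pattern a vectorizing code generator can handle.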
**Test Plan**
```
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_uint8_bf16_input
python -u -m pytest -s -v test_cpu_repro.py -k test_per_channel_fake_quant_int8_bf16_input
```
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119177
Approved by: https://github.com/peterbell10, https://github.com/jansel