fix at::from_blob_quantized_per_tensor_affine strides calculation (#79314)
Summary: There was an off-by-one bug in the strides calculation of `at::from_blob_quantized_per_tensor_affine`. For an input with sizes {N, C, H, W}, strides were previously calculated as {NCH, CH, H, 1}; they are now calculated as {CHW, HW, W, 1}, which matches the contiguous layout. The updated unit test catches this problem.
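For reference, the expected contiguous strides can be sketched as below. This is only an illustration of the arithmetic, not the ATen code touched by this change; `contiguous_strides` and `linear_offset` are hypothetical helper names introduced here.
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the ATen implementation): row-major strides.
// For sizes {N, C, H, W} this yields {C*H*W, H*W, W, 1}.
std::vector<int64_t> contiguous_strides(const std::vector<int64_t>& sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  // Each stride is the product of the sizes of all dimensions AFTER it,
  // so dimension i uses sizes[i + 1], not sizes[i].
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * sizes[i + 1];
  }
  return strides;
}

// Hypothetical helper: flat element offset of a multi-dimensional index.
int64_t linear_offset(const std::vector<int64_t>& index,
                      const std::vector<int64_t>& strides) {
  assert(index.size() == strides.size());
  int64_t offset = 0;
  for (std::size_t i = 0; i < index.size(); ++i) {
    offset += index[i] * strides[i];
  }
  return offset;
}
```
With wrong strides, an index such as [h][w] maps to the wrong element of the blob. That is consistent with the mismatch between `qtensor[h][w]` and `custom_data[i]` in the failing test output below.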
Test Plan:
```
buck test mode/dev-nosan //caffe2:quantized_test
```
before fix:
```
✓ ListingSuccess: caffe2:quantized_test : 9 tests discovered (15.632)
✓ Pass: caffe2:quantized_test - TestQTensor.QuantDequantAPIs (0.004)
✗ Fail: caffe2:quantized_test - TestQTensor.FromBlobQuantizedPerTensor (0.002)
Test output:
> caffe2/aten/src/ATen/test/quantized_test.cpp:247
Expected equality of these values:
qtensor[h][w].item<float>()
Which is: -0.5
(custom_data[i] - zero_point) * scale
Which is: 0
stdout: Note: Google Test filter = TestQTensor.FromBlobQuantizedPerTensor
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestQTensor
[ RUN ] TestQTensor.FromBlobQuantizedPerTensor
caffe2/aten/src/ATen/test/quantized_test.cpp:247: Failure
Expected equality of these values:
qtensor[h][w].item<float>()
Which is: -0.5
(custom_data[i] - zero_point) * scale
Which is: 0
[ FAILED ] TestQTensor.FromBlobQuantizedPerTensor (2 ms)
[----------] 1 test from TestQTensor (2 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (2 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] TestQTensor.FromBlobQuantizedPerTensor
1 FAILED TEST
```
after fix:
```
✓ ListingSuccess: caffe2:quantized_test : 9 tests discovered (16.051)
✓ Pass: caffe2:quantized_test - TestQTensor.RoundingMode (0.002)
✓ Pass: caffe2:quantized_test - TestQTensor.QuantizePerChannel4dChannelsLast (0.217)
✓ Pass: caffe2:quantized_test - TestQTensor.QuantDequantAPIs (0.003)
✓ Pass: caffe2:quantized_test - TestQTensor.EmptyPerchannelQuantized (0.003)
✓ Pass: caffe2:quantized_test - TestQTensor.EmptyQuantized (0.002)
✓ Pass: caffe2:quantized_test - TestQTensor.FromBlobQuantizedPerChannel (0.004)
✓ Pass: caffe2:quantized_test - TestQTensor.QuantizePerChannel4d (0.005)
✓ Pass: caffe2:quantized_test - TestQTensor.FromBlobQuantizedPerTensor (0.003)
✓ Pass: caffe2:quantized_test - TestQTensor.Item (0.002)
Summary
Pass: 9
ListingSuccess: 1
```
Differential Revision: D37061355
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79314
Approved by: https://github.com/jerryzh168