c436426b - [fbgemm] fix gconv + acc16 (#59541)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59541
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/621

Fixing 2 issues. These are actually 2 independent issues, one in Caffe2 and another in FBGEMM, so there is no need to wait until FBGEMM is synchronized with PyTorch.

1) conv with 16-bit accumulation doesn't support the fast gconv path, so TakeGConvFastPath_ should honor that.

2) packed_index_ generates indices up to (G / GTogether_) * F * R * S * OC_per_G * GTogether_ * paddedICPerG, which can exceed the G * kernel_prod * OC_per_G * paddedICPerG allocated in PackWeightMatrixForGConv (kernel_prod = F * R * S). For example, when G = 3 and GTogether_ = 2, we allocate 3 * F * R * S * OC_per_G * paddedICPerG but access up to 2 * F * R * S * OC_per_G * 2 * paddedICPerG.

BTW, not sure how we haven't known about this issue for so long. Any ideas would be really appreciated.

Test Plan: On a BDW machine,
buck test //caffe2/caffe2/quantization/server:conv_groupwise_dnnlowp_acc16_op_test -- --run-disabled

Reviewed By: dskhudia

Differential Revision: D28927214

fbshipit-source-id: 3ec98ea2fc177545392a0148daca592d80f40ad3
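The two fixes are easier to see with a little code. First, a hypothetical sketch of fix 1 (this is not the actual Caffe2 diff; the real TakeGConvFastPath_ is a member of the DNNLOWP conv op and checks more conditions): since the fast gconv path does not support 16-bit accumulation, the fast-path check must bail out for acc16 ops.

    // Hypothetical guard, not the actual Caffe2 code: per the commit
    // message, the fast gconv path only supports 32-bit accumulation,
    // so a 16-bit-accumulation conv must take the generic path.
    bool TakeGConvFastPath(bool is_acc16, bool other_conditions_met) {
      if (is_acc16) {
        return false;  // fast gconv kernels don't accumulate in int16
      }
      return other_conditions_met;
    }

Second, a minimal standalone C++ sketch of the buffer overrun in fix 2. The size formulas come from the commit message; the concrete values of F, R, S, OC_per_G, and paddedICPerG are made up for illustration, and only the group counts matter for triggering the mismatch.

    #include <cstdio>

    int main() {
      const int G = 3;                // number of groups
      const int GTogether_ = 2;       // groups packed together per block
      const int F = 1, R = 3, S = 3;  // kernel depth/height/width
      const int OC_per_G = 8;         // output channels per group
      const int paddedICPerG = 4;     // padded input channels per group

      const int kernel_prod = F * R * S;

      // Buffer size allocated in PackWeightMatrixForGConv:
      const int allocated = G * kernel_prod * OC_per_G * paddedICPerG;

      // Largest region packed_index_ can address: the group dimension is
      // rounded up to whole blocks of GTogether_ groups.
      const int group_blocks = (G + GTogether_ - 1) / GTogether_;
      const int accessed =
          group_blocks * kernel_prod * OC_per_G * GTogether_ * paddedICPerG;

      printf("allocated = %d, max accessed = %d\n", allocated, accessed);
      return 0;
    }

With G = 3 and GTogether_ = 2 this prints allocated = 864, max accessed = 1152: the packing can touch a full extra group's worth of weights (288 ints here) past the end of the allocation, which is exactly the out-of-bounds access the commit fixes.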