pytorch
74849d91 - [acc_shape_inference] add shape inference for quantize_per_channel (#66562)

Commit

3 years ago

[acc_shape_inference] add shape inference for quantize_per_channel (#66562)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66562

Adding shape inference for `acc_ops.quantize_per_channel`, and fixing some bugs.

Bugs were related to the fact that `quantize_per_channel` arguments `scales` and `zero_points` take tensors, so when we fetch the values (which needs to be done using `.tolist()` instead of `.item()`) we may get either a list or a scalar value.

Test Plan:
# Test Quantized Resnet
From sandbox with GPU that supports quantized types (tested with V100)
`buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test`
Output
```
...
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 0 MiB, GPU 1548 MiB
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 0 MiB, GPU 1548 MiB
[TensorRT] VERBOSE: Using cublasLt a tactic source
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 0, GPU 1556 (MiB)
[TensorRT] VERBOSE: Using cuDNN as a tactic source
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 0, GPU 1564 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] VERBOSE: Total per-runner device memory is 23405056
[TensorRT] VERBOSE: Total per-runner host memory is 73760
[TensorRT] VERBOSE: Allocated activation device memory of size 154140672
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 0 MiB, GPU 1736 MiB
trt fp16 time (ms/iter) 1.252899169921875
trt int8 time (ms/iter) 1.3774776458740234
trt implicit int8 time (ms/iter) 1.3835883140563965
PyTorch time (CUDA) (ms/iter) 4.34483528137207
PyTorch time (CPU) (ms/iter) 55.687150955200195
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1918 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1866 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1738 (MiB)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1012 12:07:23.556475 711816 DynoConfigLoader.cpp:32] Failed to read config: No dyno config client
```

# Test shape inference
`buck test mode/opt glow/fb/fx/acc_tracer:test_acc_shape_inference`
Output
```
...
Summary
  Pass: 95
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1407375092088240
```

Reviewed By: jfix71, jerryzh168

Differential Revision: D31457323

fbshipit-source-id: 8ccc4a9b0ca655fb30838e88575aff2bf3a387a6
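
The list-or-scalar issue mentioned in the summary can be illustrated with a small sketch. `torch.Tensor.tolist()` returns a bare Python number for a 0-d tensor and a list otherwise, while `.item()` only works on single-element tensors. The helper below is hypothetical (the name `normalize_qparam` is not taken from this commit) and only shows one way to normalize both cases; it is not the code the commit actually adds.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def normalize_qparam(value: Union[Number, Sequence[Number]]) -> List[Number]:
    """Return quantization values as a flat list (hypothetical helper).

    `value` stands for the result of calling `.tolist()` on a `scales` or
    `zero_points` tensor: a bare number for a 0-d tensor, a list otherwise.
    """
    if isinstance(value, (list, tuple)):
        # Per-channel case: one value per channel, copy into a new list.
        return list(value)
    # 0-d tensor case: wrap the single number so callers always get a list.
    return [value]
```

With PyTorch available, a caller could pass `scales.tolist()` and `zero_points.tolist()` through this helper and then treat both arguments uniformly as lists, regardless of whether the original tensors were 0-d or 1-d.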

Author

alexbeloi

Committer

facebook-github-bot

Parents

7d9bbd35
