Remove expensive call to PyObject_GetAttrString in PyTorch_LookupSpecial (#44684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44684
The ad-hoc quantization benchmarking script in D23689062 recently highlighted that quantized ops were surprisingly slow after the introduction of support for custom ops in torch.fx in D23203204 (https://github.com/pytorch/pytorch/commit/f15e27265ff76f49844b0ccc6ca387cb564824bf).
Using strobelight, it's immediately clear that up to 66% of samples were seen in `c10::get_backtrace`, which descends from `torch::is_tensor_and_append_overloaded -> torch::check_has_torch_function -> torch::PyTorch_LookupSpecial -> PyObject_HasAttrString -> PyObject_GetAttrString`.
I'm no expert by any means so please correct any/all misinterpretation, but it appears that:
- `check_has_torch_function` only needs to return a bool
- `PyTorch_LookupSpecial` should return `NULL` if a matching method is not found on the object
- in the impl of `PyTorch_LookupSpecial`, the return value from `PyObject_HasAttrString` serves only as an early-exit bool, but computing it ends up invoking `PyObject_GetAttrString`, which raises on a missing attribute and triggers the generation of a backtrace
- `PyObject_FastGetAttrString` already returns `NULL` (a stolen ref to an empty `py::object` when neither the `if` nor the `else if` branch is hit) when the method is not found, so it can be used on its own instead of invoking both `GetAttrString` and `FastGetAttrString`
- D23203204 (https://github.com/pytorch/pytorch/commit/f15e27265ff76f49844b0ccc6ca387cb564824bf) compounded (but maybe not directly caused) the problem by increasing the number of invocations
so this diff removes the `PyObject_HasAttrString` call; let's see how many things break :)
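The bullet points above can be sketched with a pure-Python analogy (this is not the actual C++ code path; `lookup_has_then_get`, `lookup_single`, and the timing loop are illustrative stand-ins for the pre- and post-patch flows of `PyTorch_LookupSpecial`):

```python
import timeit

class Plain:
    pass

obj = Plain()
obj.hit = 42

def lookup_has_then_get(o, name):
    # Analogous to the old flow: PyObject_HasAttrString followed by
    # PyObject_GetAttrString -- two full lookups, where the first one
    # raises and clears AttributeError internally on a miss.
    if not hasattr(o, name):
        return None
    return getattr(o, name)

def lookup_single(o, name):
    # Analogous to the patched flow: a single lookup that returns a
    # sentinel (NULL in C) instead of doing the attribute search twice.
    return getattr(o, name, None)

# Both flows agree on hits and misses.
assert lookup_has_then_get(obj, "hit") == lookup_single(obj, "hit") == 42
assert lookup_has_then_get(obj, "miss") is None
assert lookup_single(obj, "miss") is None

# Rough cost comparison on the miss path (the hot path in the report).
t_double = timeit.timeit(lambda: lookup_has_then_get(obj, "miss"), number=100_000)
t_single = timeit.timeit(lambda: lookup_single(obj, "miss"), number=100_000)
print(f"has+get: {t_double:.4f}s  single: {t_single:.4f}s")
```

The Python analogy understates the real cost, since in the C++ path the raised exception additionally spawned a `c10::get_backtrace` call per miss.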
before:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
(0): Quantize(scale=tensor([0.0241]), zero_point=tensor([60]), dtype=torch.quint8)
(1): QuantizedLinear(in_features=4, out_features=4, scale=0.017489388585090637, zero_point=68, qscheme=torch.per_tensor_affine)
(2): DeQuantize()
)
fp 0.010896682739257812
q 0.11908197402954102
```
after:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
(0): Quantize(scale=tensor([0.0247]), zero_point=tensor([46]), dtype=torch.quint8)
(1): QuantizedLinear(in_features=4, out_features=4, scale=0.012683945707976818, zero_point=41, qscheme=torch.per_tensor_affine)
(2): DeQuantize()
)
fp 0.011141300201416016
q 0.022639036178588867
```
which roughly restores the original performance seen in P142370729
UPDATE: 9/22 mode/opt benchmarks
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
(0): Quantize(scale=tensor([0.0263]), zero_point=tensor([82]), dtype=torch.quint8)
(1): QuantizedLinear(in_features=4, out_features=4, scale=0.021224206313490868, zero_point=50, qscheme=torch.per_tensor_affine)
(2): DeQuantize()
)
fp 0.002968311309814453
q 0.5138928890228271
```
with patch:
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
(0): Quantize(scale=tensor([0.0323]), zero_point=tensor([70]), dtype=torch.quint8)
(1): QuantizedLinear(in_features=4, out_features=4, scale=0.017184294760227203, zero_point=61, qscheme=torch.per_tensor_affine)
(2): DeQuantize()
)
fp 0.0026655197143554688
q 0.0064449310302734375
```
Reviewed By: ezyang
Differential Revision: D23697334
fbshipit-source-id: f756d744688615e01c94bf5c48c425747458fb33