pytorch
0d9ca21d - [Static Runtime] Native stack for contiguous inputs (#50863)

Commit

3 years ago

[Static Runtime] Native stack for contiguous inputs (#50863)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50863

- Avoid calling unsqueeze on every input tensor by copying data directly
- Model benchmark shows small improvement: -2.3% (b=1), -1.1% (b=20)
- This diff does not yet modify torch::stack implementation, only the static_runtime path. A followup diff will do this.

Test Plan:
# Test
```
buck test //caffe2/aten:native_test
buck run //caffe2/test:torch
```

# Op benchmark
expected no changes here because this diff only touches static runtime
```
Baseline |Native |Change
6.38 |6.336 |-0.69%
6.553 |6.588 |0.53%
14.904 |14.883 |-0.14%
5.657 |5.68 |0.41%
5.612 |5.795 |3.26%
6.051 |6.058 |0.12%
4.225 |4.252 |0.64%
4.24 |4.294 |1.27%
6.28 |4.249 |-32.34%
6.267 |4.257 |-32.07%
418.932 |404.356 |-3.48%
417.694 |404.752 |-3.10%
1592.455 |1583.277 |-0.58%
2919.261 |2685.636 |-8.00%
211.458 |202.838 |-4.08%
211.518 |203.229 |-3.92%
783.953 |792.198 |1.05%
1457.823 |1348.824 |-7.48%
2032.816 |1975.961 |-2.80%
2090.662 |2000.612 |-4.31%
6487.098 |6635.41 |2.29%
11874.702 |10853.302 |-8.60%
2123.83 |2039.272 |-3.98%
2195.453 |2221.82 |1.20%
6435.978 |6593.363 |2.45%
11852.205 |10858.92 |-8.38%
2036.526 |1983.042 |-2.63%
2055.618 |2072.03 |0.80%
6417.192 |6681.064 |4.11%
12468.744 |10888.336 |-12.67%
4959.704 |4954.734 |-0.10%
5121.823 |4996.84 |-2.44%
5082.105 |5029.652 |-1.03%
5395.936 |5438.628 |0.79%
5162.756 |5114.147 |-0.94%
23798.08 |21884.065 |-8.04%
4957.921 |4972.01 |0.28%
4971.234 |4968.977 |-0.05%
5005.909 |5039.95 |0.68%
5159.614 |5180.426 |0.40%
5013.221 |5202.684 |3.78%
20238.741 |20212.581 |-0.13%
7632.439 |7610.345 |-0.29%
7589.376 |7679.148 |1.18%
7859.937 |7850.485 |-0.12%
8214.213 |8150.846 |-0.77%
11606.562 |11724.139 |1.01%
34612.919 |34817.677 |0.59%
```

# Adindexer model benchmark
```
caffe2=0 batch={1|20} profile=1 ./scripts/bwasti/static_runtime/run.sh
```

## Baseline
```
Batch 1
0.00291311 ms. 3.97139%. aten::stack (1 nodes)
Batch 20
0.00477447 ms. 0.934081%. aten::stack (1 nodes)
```

## Native stack (this change)
```
Batch 1
0.00115161 ms. 1.67388%. aten::stack (1 nodes)
Batch 20
0.00264831 ms. 0.543767%. aten::stack (1 nodes)
```

Reviewed By: hlu1

Differential Revision: D25988638

fbshipit-source-id: 82ce84c88963cae40dc5819004baf03ce9093ecc
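
The summary describes stacking contiguous inputs by copying their data directly rather than unsqueezing each input and concatenating. The idea can be illustrated with a minimal, standalone sketch. It is not the code from this commit and does not use ATen: `DenseTensor` and `stack_contiguous` are hypothetical names introduced here, and the sketch assumes row-major `float` data with identical shapes across all inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a contiguous tensor: row-major data plus its sizes.
struct DenseTensor {
  std::vector<int64_t> sizes;
  std::vector<float> data;
};

// Stack same-shaped contiguous inputs along `dim` with direct block copies.
// No per-input unsqueeze or intermediate view is created.
DenseTensor stack_contiguous(const std::vector<DenseTensor>& inputs,
                             std::size_t dim) {
  assert(!inputs.empty());
  const std::vector<int64_t>& sizes = inputs[0].sizes;
  assert(dim <= sizes.size());

  const auto d = static_cast<std::ptrdiff_t>(dim);
  // outer: number of blocks before the new dimension;
  // inner: elements in one contiguous block at or after `dim`.
  const int64_t outer = std::accumulate(sizes.begin(), sizes.begin() + d,
                                        int64_t{1}, std::multiplies<int64_t>());
  const int64_t inner = std::accumulate(sizes.begin() + d, sizes.end(),
                                        int64_t{1}, std::multiplies<int64_t>());
  const int64_t n = static_cast<int64_t>(inputs.size());

  DenseTensor out;
  out.sizes = sizes;
  out.sizes.insert(out.sizes.begin() + d, n);  // new dimension of length n
  out.data.resize(static_cast<std::size_t>(outer * n * inner));

  for (int64_t j = 0; j < n; ++j) {
    const DenseTensor& src = inputs[static_cast<std::size_t>(j)];
    assert(src.sizes == sizes);  // every input must share one shape
    assert(static_cast<int64_t>(src.data.size()) == outer * inner);
    for (int64_t o = 0; o < outer; ++o) {
      // Block o of input j lands at block (o * n + j) of the output.
      std::copy_n(src.data.begin() + o * inner, inner,
                  out.data.begin() + (o * n + j) * inner);
    }
  }
  return out;
}
```

For example, stacking two inputs of shape `[2]` holding `{1, 2}` and `{3, 4}` along dimension 0 gives shape `[2, 2]` with data `{1, 2, 3, 4}`; along dimension 1 the data becomes `{1, 3, 2, 4}`. Each input is read once and written once, which is the saving the summary attributes to skipping the unsqueeze step.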

Author

Marat Subkhankulov

Committer

facebook-github-bot

Parents

fe67438f
