[Onnxifi] Warmup cache of output shapes (#48346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48346
Onnxifi now accepts output shape info for all possible batch sizes. This is used to avoid doing shape inference inside `OnnxifiOp::extractOutputBatchSizes()`.
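As a rough illustration of the caching idea (a minimal sketch, not the actual Caffe2 code; only the member name `output_reshape_info_` appears in this diff, all other names and types are assumptions):

```cpp
// Sketch: output shapes are stored per batch size, so the hot path is a hash
// lookup instead of a shape-inference pass. Hypothetical names throughout,
// except `output_reshape_info_`, which is mentioned in this diff.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct OutputReshapeInfo {
  // One pre-computed shape per network output for a given batch size.
  std::vector<std::vector<int64_t>> output_shapes;
};

class OnnxifiOpSketch {
 public:
  const OutputReshapeInfo& getOutputReshapeInfo(int64_t batch_size) {
    auto it = output_reshape_info_.find(batch_size);
    if (it == output_reshape_info_.end()) {
      // Cache miss: compute once and memoize (the slower 20-40us path
      // mentioned in the test plan below).
      it = output_reshape_info_
               .emplace(batch_size, inferShapesFor(batch_size))
               .first;
    }
    // Cache hit: the 1-4us path; no shape inference runs here.
    return it->second;
  }

 private:
  // Placeholder for the expensive shape-inference fallback.
  OutputReshapeInfo inferShapesFor(int64_t batch_size) {
    return OutputReshapeInfo{{{batch_size, 128}}};  // e.g. one [bs, 128] output
  }

  std::unordered_map<int64_t, OutputReshapeInfo> output_reshape_info_;
};
```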
FB:
In this diff we try to pre-calculate output shapes for all possible batch sizes inside `PredictorContainer` where we supposedly have enough data to do so. This data is then passed down to `OnnxifiOp`, as in the sketch below.
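A sketch of the warmup side, under the same assumptions as the sketch above (`warmupOutputShapes` and `max_batch_size` are hypothetical names, not the actual FB code):

```cpp
// Sketch: a container that knows the maximum batch size can pre-populate the
// cache for every possible batch size up front, so no serving-time request
// pays the shape-inference cost.
void warmupOutputShapes(OnnxifiOpSketch& op, int64_t max_batch_size) {
  for (int64_t bs = 1; bs <= max_batch_size; ++bs) {
    op.getOutputReshapeInfo(bs);  // fills output_reshape_info_ for this bs
  }
}
```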
Here is the dependency graph that I built manually while trying to understand the entire flow:
https://pxl.cl/1rQRv
Test Plan:
Strobelight data https://fburl.com/strobelight/jlhhgt21 shows that `OnnxifiOp::RunOnDevice()` now takes only 2.17% of CPU, down from ~20% with the previous implementation.
Also, each call in the previous implementation took dozens of milliseconds, according to ipiszy:
> After adding more logs, I found each shapeinference call actually takes 40~50ms.
I also temporarily added time measurements for `OnnxifiOp::extractOutputBatchSizes()`. The new implementation typically takes 1 to 4 microseconds; when data for the current batch size is not yet present in `output_reshape_info_`, it takes 20-40 microseconds, which is still much better than the previous implementation.
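For reference, the measurement can be as simple as a `std::chrono` wrapper like the following sketch (the temporary instrumentation itself is not shown here; `measureMicros` is a hypothetical helper):

```cpp
// Sketch: time an arbitrary callable in microseconds with a steady clock.
#include <chrono>

template <typename Fn>
long long measureMicros(Fn&& fn) {
  auto start = std::chrono::steady_clock::now();
  fn();
  return std::chrono::duration_cast<std::chrono::microseconds>(
             std::chrono::steady_clock::now() - start)
      .count();
}
```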
AF canary https://www.internalfb.com/intern/ads/canary/431357944274985799
AI canary https://www.internalfb.com/intern/ads/canary/431365503038313840
Verified using the test tier https://pxl.cl/1sZ4S
Reviewed By: yinghai, ipiszy
Differential Revision: D25047110
fbshipit-source-id: 872dc1578a1e8e7c3ade5f5e2711e77ba290a671