Redesign the output shape adjustment of OnnxifiOp (#21027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21027
Previously, we were only able to adjust the batch size when the output shape had the batch size at its first dim. Although not common, there are cases where we want to slice back an output whose batch size sits on a non-first dim, or whose shape doesn't really have the batch size in it but is rather an expression of it. Examples are the shapes at the output of `Transpose` or `Tile`. This diff redesigns how we handle the output size. The key is that when we run OnnxifiOp, the input shapes are given, so we can actually run shape inference to derive the real output shapes, no matter how they got transformed. We then compare the real output shape with the max-batch-sized output shape, dim by dim, and use a `Slice` op to cut the max output back to the real output shape.
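As a rough illustration of the dim-by-dim comparison, here is a minimal Python sketch. It is not the OnnxifiOp code: the helper name `slice_bounds` and the `(begin, end)` representation are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple


def slice_bounds(
    max_shape: Sequence[int], real_shape: Sequence[int]
) -> List[Tuple[int, int]]:
    """Return per-dim (begin, end) bounds that cut max_shape down to real_shape.

    Hypothetical helper: real_shape stands for the shape obtained from shape
    inference on the actual inputs, max_shape for the max-batch-sized output.
    """
    if len(max_shape) != len(real_shape):
        raise ValueError("rank mismatch between max and real output shapes")
    bounds = []
    for dim, (max_dim, real_dim) in enumerate(zip(max_shape, real_shape)):
        if real_dim > max_dim:
            raise ValueError(
                f"dim {dim}: real size {real_dim} exceeds max size {max_dim}"
            )
        # Keep the leading real_dim entries along this dim.
        bounds.append((0, real_dim))
    return bounds


# Transposed output: the max batch size 8 sits on dim 1, the real batch size is 5.
print(slice_bounds([3, 8], [3, 5]))  # [(0, 3), (0, 5)]
```

The comparison only needs the two shapes, so it works the same whether the batch size ends up on dim 0, on another dim, or folded into a larger dim as with `Tile`.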
Note that the general `Slice` op is slow, and in most cases we still prefer adjusting the batch size by shrinking the first dim, which is just an operation on meta info without data allocation/manipulation. Therefore, we add a flag `fast_path` to detect this situation and operate accordingly.
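One way to picture the `fast_path` check, again as a sketch under assumptions rather than the actual implementation: the fast path applies when the real and max shapes agree on every dim except possibly the first. The function name `can_use_fast_path` is hypothetical.

```python
from typing import Sequence


def can_use_fast_path(max_shape: Sequence[int], real_shape: Sequence[int]) -> bool:
    """True when only the first dim may differ, so shrinking dim 0 is enough.

    Hypothetical helper for illustration; the real flag may be computed
    differently inside OnnxifiOp.
    """
    if len(max_shape) != len(real_shape):
        raise ValueError("rank mismatch between max and real output shapes")
    # Every non-first dim must already match; otherwise a general Slice is needed.
    return all(m == r for m, r in zip(max_shape[1:], real_shape[1:]))


print(can_use_fast_path([8, 16], [5, 16]))  # True: batch on dim 0
print(can_use_fast_path([3, 8], [3, 5]))    # False: batch on dim 1
```

When the check holds, the output can be shrunk along dim 0 as a metadata-only change; otherwise the general `Slice` route is taken.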
Reviewed By: tracelogfb
Differential Revision: D15515189
fbshipit-source-id: 9c1fff161f82d0bc20eeac07ca4a2756e964e9fd