During inbatch broadcast, move Tile op after Fused8BitRowwiseQuantizedToFloat if applicable (#41464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41464
If the input is int8 rowwise quantized, we currently cannot lower it to Glow, and we previously hit an error when running with in-batch broadcast. The root cause is that the Tile op does not support uint8_t, which would be easy to add here. However, that would leave a Tile -> Fused8BitRowwiseQuantizedToFloat sequence on the host side, which would likely hurt memory bandwidth a lot. Even if we later add Fused8BitRowwiseQuantizedToFloat support to Glow, that ordering is still not ideal, because we would be doing redundant compute on identical columns. The solution here is to swap the order of the two ops, turning Tile -> Fused8BitRowwiseQuantizedToFloat into Fused8BitRowwiseQuantizedToFloat -> Tile. This immediately resolves the error we saw; in the short term we can still run Tile on the card, and in the longer term, once the dequantization is also supported in Glow, everything runs faster on the card.
This optimization is a heuristic: if the net does not contain this pattern, in-batch broadcast works exactly as it did before. A minimal sketch of the rewrite follows.
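To make the rewrite concrete, here is a rough Python sketch of the pattern swap on a Caffe2 NetDef. The actual pass is implemented in C++ under caffe2/caffe2/opt/custom; the helper name `swap_tile_after_dequant` and the `_dequant` blob suffix below are made up for illustration, and this only covers the simple adjacent-op case.

```
# Hypothetical sketch only; the real pass lives in C++ in caffe2/opt/custom.
from caffe2.proto import caffe2_pb2


def swap_tile_after_dequant(net):
    """Rewrite Tile -> Fused8BitRowwiseQuantizedToFloat into
    Fused8BitRowwiseQuantizedToFloat -> Tile, so dequantization runs
    once on the untiled uint8 input and Tile operates on floats."""
    for i in range(len(net.op) - 1):
        if net.op[i].type != "Tile":
            continue
        if net.op[i + 1].type != "Fused8BitRowwiseQuantizedToFloat":
            continue
        if net.op[i + 1].input[0] != net.op[i].output[0]:
            continue

        # Work on copies so we can overwrite the ops in place afterwards.
        tile = caffe2_pb2.OperatorDef()
        tile.CopyFrom(net.op[i])
        dequant = caffe2_pb2.OperatorDef()
        dequant.CopyFrom(net.op[i + 1])

        quantized_in = tile.input[0]   # small uint8 rowwise-quantized blob
        final_out = dequant.output[0]  # float blob that downstream ops read
        intermediate = quantized_in + "_dequant"  # hypothetical blob name

        # Dequantize the small, untiled input first ...
        dequant.input[0] = quantized_in
        dequant.output[0] = intermediate
        # ... then tile the dequantized float rows to the full batch size.
        tile.input[0] = intermediate
        tile.output[0] = final_out

        # Swap execution order: dequant now runs before tile.
        net.op[i].CopyFrom(dequant)
        net.op[i + 1].CopyFrom(tile)
```

If the Tile -> Fused8BitRowwiseQuantizedToFloat pattern is absent, the loop never matches and the net is left untouched, which is the fallback behavior described above.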
Test Plan:
```
buck test caffe2/caffe2/opt/custom:in_batch_broadcast_test
```
Reviewed By: benjibc
Differential Revision: D22544162
fbshipit-source-id: b6dd36a5925a9c8103b80f034e7730a7a085a6ff