match batchmatmul on 1.0.0.6 (#43559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43559
- remove the MKL strided gemm, which behaved inconsistently in some cases, and use a plain for loop for gemm for now; this has performance implications, but it closes the gap for the ctr_instagram_5x model
- reproduce the batchmatmul failure seen on ctr_instagram_5x by increasing the dimensions of the inputs
- add an option in netrunner to skip bmm if needed
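
For illustration only, the plain-loop fallback mentioned in the summary can be sketched as below. This is not the code from this change: the name `bmm_loop` and the nested-list layout ([B][M][K] times [B][K][N]) are assumptions, and the sketch only shows the idea of computing each batch entry independently instead of issuing one strided batched call.

```python
def bmm_loop(a, b):
    """Batched matmul as a plain loop (hypothetical helper, not the PR's code).

    a is a nested list shaped [B][M][K], b is shaped [B][K][N].
    Returns a nested list shaped [B][M][N].
    """
    if len(a) != len(b):
        raise ValueError("batch sizes differ")
    out = []
    # One independent matrix product per batch entry.
    for a_mat, b_mat in zip(a, b):
        k = len(b_mat)
        if any(len(row) != k for row in a_mat):
            raise ValueError("inner dimensions differ")
        n = len(b_mat[0]) if k else 0
        out.append(
            [[sum(row[p] * b_mat[p][j] for p in range(k)) for j in range(n)]
             for row in a_mat]
        )
    return out
```

A loop like this gives up the throughput of a single batched call, which matches the performance trade-off noted in the summary.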
Test Plan:
- netrunner passes with ctr_instagram_5x
- bmm unit test reproduces the discrepancy that this change fixes
Reviewed By: amylittleyang
Differential Revision: D23320857
fbshipit-source-id: 7d5cfb23c1b0d684e1ef766f1c1cd47bb86c9757