Optimize matmul memory usage for certain cases (#23433)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23433
Differential Revision: D16524135
Pulled By: ailzhang
fbshipit-source-id: e7684fec60c9b9db9a09f8ac157b13c8dde1bdd2