[pytorch] reduce memory footprint in fused conv QAT ops (#35002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35002
I was running into some memory issues once I enabled QAT, and I found some opportunities to use in-place operations. In particular, it looks like we can do the ReLUs in-place, and the bias addition also works in-place. The multiplication operation right above the bias addition is *not* eligible because its input bifurcates to produce conv_orig.
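A minimal sketch of the aliasing rule at play (variable names are illustrative, not the actual op names in the fused module): an op may mutate its input in-place only if that tensor has no other consumers. The ReLU and bias add are the last consumers of their input, so they can mutate it; the multiply's input is also read by the branch that produces conv_orig, so it must allocate.

```python
import torch

conv = torch.randn(2, 3)          # stand-in for the conv output
scale = torch.tensor(1.5)
bias = torch.randn(3)

# The conv output is read by TWO consumers below, so the multiply
# cannot be done in-place without corrupting the conv_orig branch.
conv_scaled = conv * scale        # must allocate a new tensor
conv_orig = conv.clone()          # hypothetical second branch reading conv

# conv_scaled now has a single chain of consumers, so the bias add
# and the ReLU can both safely mutate it in place.
conv_scaled += bias               # in-place bias addition
out = torch.relu_(conv_scaled)    # in-place ReLU, returns the same tensor
```

In-place ops here avoid two intermediate allocations per forward pass, which is where the memory savings in this diff come from.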
Reviewed By: jerryzh168
Differential Revision: D20523080
fbshipit-source-id: 4a94047dee0136f4014a328374896b28f561e41f