[PyTorch] Allocate correctly-sized output tensor in addmm_cuda (#56033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56033
Allocate the addmm_cuda output tensor at its final size up front.
There doesn't seem to be any reason not to, and doing so avoids an
extra round of dispatch for resize.
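
As a rough illustration of what "correctly sized" means here: for 2-D
operands, the addmm result has shape (mat1.size(0), mat2.size(1)). The
sketch below is plain Python with a hypothetical helper name; it only
shows the shape rule and is not the C++ change made in this PR.

```python
def addmm_output_shape(mat1_shape, mat2_shape):
    """Hypothetical helper: result shape of addmm(input, mat1, mat2).

    Assumes both operands are 2-D, as addmm requires.
    """
    if len(mat1_shape) != 2 or len(mat2_shape) != 2:
        raise ValueError("addmm expects 2-D matrices")
    rows, inner1 = mat1_shape
    inner2, cols = mat2_shape
    if inner1 != inner2:
        raise ValueError(
            f"inner dimensions do not match: {inner1} vs {inner2}"
        )
    # Allocating the output with this shape up front means no later
    # resize of the output is needed.
    return (rows, cols)
```

For example, an nn.Linear(16, 32) applied to a batch of 8 multiplies an
(8, 16) matrix by a (16, 32) matrix, so the output would be (8, 32).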
ghstack-source-id: 127409715
Test Plan:
Inspected a GPU trace of a simple nn.Linear called in a loop; the
resize operator is no longer invoked.
Existing CI should catch it if this change is incorrect.
Reviewed By: ngimel
Differential Revision: D27768311
fbshipit-source-id: fb48ec50f3cffc1015ef03d528e9007274b4dd3a