Make Nd tensors hit fused addmm pass (#106911)
Replaces https://github.com/pytorch/pytorch/pull/106433, which had a bad CLA commit.
Speeds up eager convnext bfloat16 inference by 35%, and eager timm bfloat16 inference by `0.5%` on average.
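
For illustration only, a minimal sketch of the idea in the title: an Nd input to a linear layer can be flattened to 2D, passed through `torch.addmm` (so that bias add and matmul are a single call), and reshaped back. The helper name `linear_via_addmm` and the shapes are hypothetical; this is not the actual ATen change, only an assumption about the equivalence it relies on.

```python
import torch
import torch.nn.functional as F


def linear_via_addmm(x, weight, bias):
    # Hypothetical helper for illustration; not the code from this PR.
    out_features = weight.shape[0]
    # Collapse all leading dimensions so the input is 2D: (N, in_features).
    x2d = x.reshape(-1, x.shape[-1])
    # addmm computes bias + x2d @ weight.T in one call.
    out2d = torch.addmm(bias, x2d, weight.t())
    # Restore the original leading dimensions.
    return out2d.reshape(*x.shape[:-1], out_features)


torch.manual_seed(0)
x = torch.randn(2, 3, 4, 8)   # 4D input, assumed shape
weight = torch.randn(16, 8)   # (out_features, in_features)
bias = torch.randn(16)

out = linear_via_addmm(x, weight, bias)
ref = F.linear(x, weight, bias)
```

Under these assumptions `out` should match `ref` up to floating-point tolerance, which is why routing Nd inputs through the 2D path does not change results.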
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106911
Approved by: https://github.com/ezyang