[Inductor] Build FX Linear + Permute Vertical Fusion in Inductor (#88859)
Summary:
Build FX-based vertical fusion of linear/matmul/bmm with a following (or preceding) permute/transpose in Inductor.
For an internal Ads model: **1.15x -> 1.36x speedup**
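Here is a minimal pure-Python sketch of the idea. All helper names are hypothetical, and this is not the code added in this PR. It relies on the identity (x @ W^T + b)^T = W @ x^T + b[:, None]. Because of that identity, a linear followed by a 2-D transpose can be computed as a single matmul with swapped operands. That matmul produces the output directly in the transposed layout, so no intermediate result has to be materialized and then permuted by a separate kernel. The toy graph pass assumes a simplified dict-of-nodes graph in topological order rather than a real `torch.fx.Graph`. It rewrites `linear -> transpose` into one fused node only when the transpose is the linear's sole user.

```python
# Hypothetical illustration of linear + transpose vertical fusion; not Inductor's code.
from typing import Dict, List, Tuple

Matrix = List[List[float]]
# Toy graph: node name -> (op, input node names), in topological order.
Graph = Dict[str, Tuple[str, Tuple[str, ...]]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (m x k) @ (k x n) matrix product."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def linear(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """y = x @ w^T + b (same semantics as torch.nn.functional.linear for 2-D x)."""
    y = matmul(x, transpose(w))
    return [[v + b[j] for j, v in enumerate(row)] for row in y]


def unfused_linear_transpose(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    # Two steps: materialize the linear output, then transpose it.
    return transpose(linear(x, w, b))


def fused_linear_transpose(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    # One GEMM with swapped operands: (x @ w^T + b)^T == w @ x^T + b[:, None].
    # transpose(x) stands in for reading x with swapped strides (no copy in a real kernel).
    y_t = matmul(w, transpose(x))
    return [[v + b[i] for v in row] for i, row in enumerate(y_t)]


def fuse_linear_transpose_nodes(nodes: Graph) -> Graph:
    """Replace linear -> transpose with a hypothetical 'linear_transpose' node."""
    # Count how many nodes consume each node's output.
    users = {name: 0 for name in nodes}
    for _, args in nodes.values():
        for arg in args:
            users[arg] += 1
    out: Graph = {}
    for name, (op, args) in nodes.items():
        src = args[0] if args else None
        if op == "transpose" and src in nodes and nodes[src][0] == "linear" and users[src] == 1:
            # Reuse the linear's inputs and drop the now-dead linear node.
            out[name] = ("linear_transpose", nodes[src][1])
            out.pop(src, None)
        else:
            out[name] = (op, args)
    return out
```

The single-user check keeps the rewrite safe. If the linear output is also consumed elsewhere, the non-transposed result is still needed, so the sketch leaves such graphs unchanged.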
Differential Revision: D41071665
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88859
Approved by: https://github.com/jianyuh, https://github.com/jansel