[Inductor] Build FX Linear + Permute Vertical Fusion in Inductor (#89118)
Summary:
Add FX-based vertical fusion of linear/matmul/bmm with permute/transpose ops in Inductor.
On an internal Ads model, the speedup improves from **1.15x to 1.36x**.
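
To illustrate the idea behind this kind of vertical fusion, here is a minimal sketch of one assumed case: a 2-D linear followed by a transpose. Plain Python lists stand in for tensors, and all function names are hypothetical; none of this is the Inductor implementation or its API. The fused version produces the same values as the two-step version but writes each result directly at its transposed position, so no intermediate is materialized.

```python
# Illustrative sketch only: plain Python lists stand in for tensors.
# All function names here are hypothetical, not Inductor APIs.

def linear(x, w, b):
    """y[i][j] = sum_k x[i][k] * w[j][k] + b[j] (2-D linear with bias)."""
    return [[sum(xi[k] * wj[k] for k in range(len(xi))) + bj
             for wj, bj in zip(w, b)]
            for xi in x]

def transpose(y):
    """Swap the two dimensions, materializing a second result."""
    return [list(col) for col in zip(*y)]

def linear_transpose_fused(x, w, b):
    """Compute transpose(linear(x, w, b)) in a single pass.

    The outer loop runs over output features and the inner loop over
    input rows, so each value lands at its transposed position directly.
    """
    return [[sum(xi[k] * wj[k] for k in range(len(xi))) + bj
             for xi in x]
            for wj, bj in zip(w, b)]

x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                # shape [3, 2]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]   # shape [4, 2]
b = [0.5, -0.5, 0.0, 1.0]                               # shape [4]

unfused = transpose(linear(x, w, b))      # two steps, one intermediate
fused = linear_transpose_fused(x, w, b)   # one step, no intermediate
```

Both results have shape [4, 3] and hold identical values; the benefit of fusing is avoiding the extra pass over memory, not changing the math.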
Test Plan: CI
Reviewed By: bertmaher, jansel, jianyuh
Differential Revision: D41071665
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89118
Approved by: https://github.com/jianyuh