[Inductor][FX passes] Pre grad batch linear LHS fusion (#106497)
This is a popular pattern in many internal use cases. We have two versions of this fusion (pre grad and post grad) and found that the pre grad version gives a larger perf gain. This makes sense in theory, since the corresponding backward graph doesn't have this pattern.
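
For illustration, here is a minimal sketch of the idea, assuming "batch linear LHS" refers to several linear ops that share the same input and differ only in their weights and biases. It uses plain Python lists rather than tensors, and the helper names `linear` and `fused_linear_lhs` are hypothetical; this is not the code of the pass itself.

```python
# Illustrative only: plain-Python stand-ins, not the Inductor pass itself.

def linear(x, w, b):
    """y = x @ w.T + b for x: [batch][in], w: [out][in], b: [out]."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b_j
         for w_row, b_j in zip(w, b)]
        for row in x
    ]

def fused_linear_lhs(x, weights, biases):
    """Run several linears that share the input x as one wider linear."""
    # Concatenate the weights and biases along the output dimension.
    w_cat = [w_row for w in weights for w_row in w]
    b_cat = [b_j for b in biases for b_j in b]
    y = linear(x, w_cat, b_cat)
    # Split the wide output back into one result per original linear.
    outs, start = [], 0
    for w in weights:
        width = len(w)
        outs.append([row[start:start + width] for row in y])
        start += width
    return outs

x = [[1.0, 2.0], [3.0, 4.0]]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]
w2, b2 = [[2.0, 1.0]], [1.0]

unfused = [linear(x, w1, b1), linear(x, w2, b2)]
fused = fused_linear_lhs(x, [w1, w2], [b1, b2])
assert fused == unfused
```

Under this assumption, the fused form replaces several small matrix multiplies with one larger one followed by a split, which is where the perf gain would come from.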
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106497
Approved by: https://github.com/jackiexu1992, https://github.com/jansel