Port remove_split_ops to PT2 pre-grad passes (#121674)
Summary: For OEMAE, the remove_split_ops pass contributes 14% of the total DPER pass perf gain.
Test Plan:
Run test cases
Run the OEMAE lower benchmark with and without this fix. FLOP/s: 29 -> 34.
Reviewed By: frank-wei
Differential Revision: D54711064
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121674
Approved by: https://github.com/frank-wei