Port fuse_parallel_linear (without changing weights) to PT2 pre-grad (#121617)
Summary: The ported pass does not change the structure of the weights, so it stays compatible with constant folding and real-time weight updates.
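
A minimal sketch of why leaving the weights untouched matters. It assumes the fusion concatenates the existing per-layer weights at call time instead of storing a pre-fused copy; that is an illustrative assumption, not something confirmed by this PR. Plain Python lists stand in for tensors, and every name below is hypothetical.

```python
# Hypothetical, dependency-free illustration; lists stand in for tensors.

def linear(x, weight, bias):
    """Compute y = x @ weight.T + bias for a single input vector x."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weight, bias)]


def parallel_linears(x, params):
    """Unfused form: one linear per (weight, bias) pair, all sharing x."""
    return [linear(x, w, b) for w, b in params]


def fused_parallel_linears(x, params):
    """Fused form: concatenate at call time, run one linear, split the result.

    The per-layer weights and biases in `params` are only read, never
    replaced, so an update to any of them is visible on the next call.
    """
    fused_weight = [row for w, _ in params for row in w]
    fused_bias = [v for _, b in params for v in b]
    out = linear(x, fused_weight, fused_bias)

    # Split the fused output back into one chunk per original layer.
    results, start = [], 0
    for w, _ in params:
        results.append(out[start:start + len(w)])
        start += len(w)
    return results


x = [1.0, 2.0]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]
w2, b2 = [[2.0, 3.0]], [1.0]
params = [(w1, b1), (w2, b2)]

# Fused and unfused forms agree.
assert fused_parallel_linears(x, params) == parallel_linears(x, params)

# Simulate a real-time weight update: the fused path picks it up
# because the original weight objects are still the source of truth.
w2[0][0] = 10.0
assert fused_parallel_linears(x, params) == parallel_linears(x, params)
```

Under the same assumption, the concatenation of constant weights is the kind of subexpression a constant-folding step could precompute when the weights are frozen, while leaving it in the graph keeps in-place updates visible.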
Test Plan: run added test cases
Reviewed By: frank-wei
Differential Revision: D53843428
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121617
Approved by: https://github.com/frank-wei