[MPS] Improve the performance of torch.linear() (#91114)
* Clean up redundant headers and namespaces from Linear.mm
* This should speed up the BERT sample in #77799 by ~3x
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91114
Approved by: https://github.com/DenisVieriu97, https://github.com/malfet, https://github.com/kulinseth