[tp] fix torch compile regression (#111521)
The most recent TP refactor
(https://github.com/pytorch/pytorch/pull/111160) broke the torch.compile
path. This change reverts to the previous behavior:
1. use the old default prepare_input/output
2. add the colwise/rowwise parallel test instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111521
Approved by: https://github.com/fduwjj