[dtensor] skip pytree when not necessary (#110132)
pytree is a great tool for implementing a tensor subclass quickly, but
it can be harmful for tensor subclasses:
* it adds non-trivial CPU overhead
* many ops don't need pytree; only ops that take list/dict arguments
  need it
* blindly using pytree to re-wrap has semantic issues for inplace/out
  ops
This PR avoids using pytree for most ops during torch_dispatch and only
enables it for certain ops.
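
The idea can be illustrated with a minimal, torch-free sketch. It is not
the code in this PR: `Wrapped`, `OPS_NEEDING_TREE`, `unwrap_tree`,
`unwrap_flat`, and `unwrap_args` are hypothetical names, and the
recursive helper only stands in for a pytree-style traversal. It shows a
dispatch path that looks at top-level arguments by default and falls
back to a full tree walk only for ops known to take nested containers.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Wrapped:
    """Hypothetical stand-in for a tensor subclass holding a local value."""
    local: Any


# Assumption: an allow-list of ops whose arguments contain lists/dicts.
OPS_NEEDING_TREE = {"cat", "stack"}


def unwrap_tree(obj):
    """Slow path: recursive walk, standing in for a pytree tree_map."""
    if isinstance(obj, Wrapped):
        return obj.local
    if isinstance(obj, (list, tuple)):
        return type(obj)(unwrap_tree(x) for x in obj)
    if isinstance(obj, dict):
        return {k: unwrap_tree(v) for k, v in obj.items()}
    return obj


def unwrap_flat(args):
    """Fast path: inspect only top-level arguments, no recursion."""
    return tuple(a.local if isinstance(a, Wrapped) else a for a in args)


def unwrap_args(op_name, args):
    """Use the tree walk only for ops that are known to need it."""
    if op_name in OPS_NEEDING_TREE:
        return unwrap_tree(args)
    return unwrap_flat(args)
```

With this sketch, `unwrap_args("add", (Wrapped(1), 2))` takes the flat
path, while `unwrap_args("cat", ([Wrapped(1), Wrapped(2)], 0))` takes
the recursive path because its first argument is a list.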
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110132
Approved by: https://github.com/fduwjj