Feat: save_pretrained for tensor parallel (and other parallelisms) models (#37919)
* Tmp: initial save_pretrained with DTensors
* Feat: add correctness tests
* Refactor: version checks
* Temp: 1:1 checkpoint llama4
* Refactor
* Tests
* Feat: works
* Style
* Feat: version checks + minor fixes
* Style
* Fix: version checks in tests
* Feat: move more stuff into tensor_parallel.py