Update form pretrained to make TP a first class citizen (#36335)
* clean code
* oups
* fix merge
* yups
* fix if
* now you can play
* fix shape issue
* try non blocking
* fix
* updates
* up
* updates
* fix most of thetests
* update
* update
* small updates
* up
* fix the remaining bug?
* update
* rename when you read from the file
* buffer issues
* current status
* cleanup
* properly allocate dumb memory
* update a small bug
* fix colwise rep issue
* fix keep in float 32 that was keeping everything in float 32
* typo
* more fixes with keep_in_fp32_modules as we use to serach on it
* fix ROPE dtype for TP
* remove what's breaking the tests
* updates
* update and fixes
* small cleanup after merging
* allocate 2x to be safe
* style, auto
* update
* yup nit
* fix
* remove slow as fuck torch api :(
* work
* fixup
* update
* brting the fix back
* fix and update
* fixes
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* updates because some suggestions were wrong :eyes:
* update?
* fuck this bloated function
* typo
* fix the dumb prefix thing once and forall
* fixes here and there
* updates
* remove prints
* fix strict cases
* styel
* properly fix keys on load!
* update
* fix base model prefix issue
* style
* update
* fix all?
* remoce 1 print
* fix the final etsts
* fixup
* last nits
* fix the detach issue which cause a 2x slowdown
* fixup
* small fixes
* ultra nit
* fix
* fix
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>