(Part 1) fix: make TP training compatible with new transformers (#3457)
* feat: support new tp refactor for training
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
* fix: address @S1ro1 review comment
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
* fix: address @S1ro1 review comment on tp_plan flag docstring
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
* fix: address @SunMarc review comment on unused flag
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
* fix: pick approach 3 as discussed in the PR
see https://github.com/huggingface/accelerate/pull/3457#discussion_r2037909077 for more details
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
* fix: styling errors
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
* fix: bump transformers version for tp_size feature
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
---------
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>