Improve overflow handling in ZeRO (#6976)
Fix #5241: Improve overflow handling
- [x] ZeRO 1
- [x] ZeRO 2
- [ ] ZeRO 3
- [ ] BF16Optimizer
Enable Pydantic configuration for mixed precision
- [x] bf16
- [x] fp16
---------
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Xinyu Lian <lian7@illinois.edu>
Co-authored-by: loadams <loadams@users.noreply.github.com>
Co-authored-by: Omar Elayan <142979319+oelayan7@users.noreply.github.com>
Co-authored-by: Fabio Geraci <118277438+fabiosanger@users.noreply.github.com>
Co-authored-by: Sam Foreman <saforem2@gmail.com>
Co-authored-by: Fabien Dupont <fabiendupont@fabiendupont.fr>
Co-authored-by: Liangliang Ma <1906710196@qq.com>
Co-authored-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>