DeepSpeed
1393f75d - Fix issue with BF16 optimizer selection (#7788)

Fix issue with BF16 optimizer selection (#7788)

**Note:** Updated based on change 64b10739a66704afc4112d10ab2d70f2b3a2266c for #7790. With the fix, `BF16_Optimizer` now requires ZeRO stage 1 to be explicitly enabled.

The test `test_bf16_optimizer_fragments` fails with an `AssertionError` because `BF16_Optimizer` is not instantiated when expected. The test checks for the `_hp_mapping` attribute on parameters, which is only set by `BF16_Optimizer`. Specifically, the test fails because:

1. The test config (`bf16=True` without `grad_accum_dtype`) **correctly** uses `FP16_Optimizer`, but the test expects `BF16_Optimizer` (which sets `_hp_mapping`).
2. `BFLOAT16` and `DDP_BFLOAT16` have the same value `"bf16"`, preventing proper optimizer selection.
3. `BF16_Optimizer` is missing attributes required by the base class API.

This PR addresses these issues. Optimizer selection summary:

| ZeRO Stage | Config | Optimizer | Gradient Accumulation |
|------------|--------|-----------|-----------------------|
| 0 | `bf16=True` (default) | `FP16_Optimizer` | bf16 |
| 0 | `bf16=True` + `grad_accum_dtype=fp32` | `NotImplementedError` | - |
| 1 | `bf16=True` + `grad_accum_dtype=fp32` | `BF16_Optimizer` | fp32 |

This is confusing (e.g., `FP16_Optimizer` handles both fp16 and bf16); we should simplify the code paths and clarify the behavior in the future.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
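The selection rule in the summary table can be sketched as a small Python function. This is an illustrative sketch only, not DeepSpeed's actual code: `select_optimizer` and the two placeholder classes are hypothetical names, and the sketch covers only the three configurations listed in the table.

```python
# Hypothetical sketch of the optimizer-selection rule from the summary
# table above. The function name and placeholder classes are illustrative;
# they do not mirror DeepSpeed's real implementation.

class FP16_Optimizer:
    """Placeholder: mixed-precision wrapper (handles fp16 and bf16)."""

class BF16_Optimizer:
    """Placeholder: bf16 wrapper that accumulates gradients in fp32
    and sets `_hp_mapping` on parameters."""

def select_optimizer(zero_stage, grad_accum_dtype=None):
    # Row 1: bf16=True, default grad_accum_dtype -> FP16_Optimizer,
    # accumulating gradients in bf16.
    if zero_stage == 0 and grad_accum_dtype is None:
        return FP16_Optimizer
    # Row 2: fp32 accumulation without ZeRO stage 1 is unsupported.
    if zero_stage == 0 and grad_accum_dtype == "fp32":
        raise NotImplementedError(
            "grad_accum_dtype=fp32 with bf16 requires ZeRO stage 1"
        )
    # Row 3: ZeRO stage 1 + fp32 accumulation -> BF16_Optimizer.
    if zero_stage == 1 and grad_accum_dtype == "fp32":
        return BF16_Optimizer
    raise ValueError("configuration not covered by the summary table")
```

For example, `select_optimizer(1, "fp32")` yields `BF16_Optimizer`, which is the only path that sets `_hp_mapping`, matching what `test_bf16_optimizer_fragments` checks for.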