fix: BF16_Optimizer selection and compatibility issues

Several bugs were causing test_bf16_optimizer_fragments to fail:
1. DDP_BFLOAT16 constant collision with BFLOAT16
- Both were set to "bf16", causing BF16_Optimizer to never be selected
- Changed DDP_BFLOAT16 to "ddp_bf16" to differentiate
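   A minimal sketch of the collision (constant names and values are from this
   commit; the dispatch function is a simplified illustration, not DeepSpeed's
   actual selection code):

   ```python
   BFLOAT16 = "bf16"
   DDP_BFLOAT16 = "ddp_bf16"  # previously also "bf16", colliding with BFLOAT16

   def select_optimizer(mode):
       # Illustrative dispatch: with the old value DDP_BFLOAT16 == BFLOAT16,
       # the first branch always matched first, so the BF16_Optimizer branch
       # was unreachable. The distinct value makes it reachable.
       if mode == BFLOAT16:
           return "other_optimizer"
       if mode == DDP_BFLOAT16:
           return "BF16_Optimizer"
       return "default"
   ```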
2. Missing attributes in BF16_Optimizer
- Added custom_loss_scaler, external_loss_scale, torch_autocast_gradscaler
- These are required by base_optimizer.py's needs_scaler() and scale_if_loss()
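   A sketch of the added attributes (names are from this commit; the default
   values and the rest of the class are assumptions, shown only to illustrate
   what the shared base-optimizer code paths read):

   ```python
   class BF16_Optimizer:
       """Sketch: only the attributes added by this fix are shown."""

       def __init__(self):
           # Attributes read by base_optimizer.py's needs_scaler() and
           # scale_if_loss(); without them, attribute access on a
           # BF16_Optimizer instance raised AttributeError.
           self.custom_loss_scaler = False       # default value assumed
           self.external_loss_scale = None       # default value assumed
           self.torch_autocast_gradscaler = None  # default value assumed
   ```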
3. scale_if_loss() assumed loss_scaler always exists
- Added hasattr check before calling loss_scaler.scale_loss()
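   The guard can be sketched as follows (the function shape is assumed; only
   the hasattr check and the scale_loss() call are from this commit):

   ```python
   def scale_if_loss(optimizer, loss):
       # Only scale when the optimizer actually defines a loss_scaler;
       # BF16_Optimizer may not have one, so calling
       # optimizer.loss_scaler.scale_loss() unconditionally crashed.
       if hasattr(optimizer, "loss_scaler") and optimizer.loss_scaler is not None:
           return optimizer.loss_scaler.scale_loss(loss)
       return loss
   ```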
4. Test config missing grad_accum_dtype
- Added data_types.grad_accum_dtype=fp32 to ensure BF16_Optimizer is used
   - Without it, FP16_Optimizer is selected instead, which doesn't support the tensor fragment APIs
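   The relevant config fragment, with all unrelated fields omitted:

   ```json
   {
     "data_types": {
       "grad_accum_dtype": "fp32"
     }
   }
   ```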
5. Added DS_DISABLE_REUSE_DIST_ENV support in tests/unit/common.py
- Allows disabling reuse_dist_env via environment variable for CI
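   A hedged sketch of the env-var gate (the helper name and accepted values
   are assumptions; only the DS_DISABLE_REUSE_DIST_ENV variable and its
   purpose are from this commit):

   ```python
   import os

   def reuse_dist_env_enabled(default=True):
       # CI can export DS_DISABLE_REUSE_DIST_ENV=1 to force each test to set
       # up a fresh distributed environment instead of reusing one.
       if os.environ.get("DS_DISABLE_REUSE_DIST_ENV", "0").lower() in ("1", "true", "yes"):
           return False
       return default
   ```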

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>