Squashed commit of the following:
commit 2f8fd72e5112beb24082c252f8aa5e621bb10129
Author: Simon <80467011+sorgfresser@users.noreply.github.com>
Date: Tue Jun 10 13:50:34 2025 +0100
Remove device_count (#3587)
commit d2e6b0313d696be62fe69d19f15bf3098effbad2
Author: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Date: Tue Jun 10 05:26:48 2025 -0700
[FSDP2] Refactor + FP8 (#3585)
* Fix double wrap
* Clocking off, ~equal to torch baseline
* works?
* Working version
* Partial rewrite
* FSDP2 path works
* Fix back prepare
* Almost done, proper AC left
* Feat: should work, cleanup + test more benchmarks left
* Style+quality
* Feat: fp8 example
* Feat: better example
* Feat: add readme
* Docs + should be done
* Fix: typos
* Fix: protect imports
* Feat: address comments
* Feat: add flops image
commit b9fee48c85dc8b3c4db1e97258925660cdc6ee36
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date: Tue Jun 10 13:24:43 2025 +0100
better handle FP8 with and without deepspeed (#3611)
* use the state mixed precision which has undergone all preprocessing
* Update src/accelerate/accelerator.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/accelerate/accelerator.py
* accelerator state sets the mixed precision for deepspeed and fp8_enabled
* fix
* fix
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
commit 3a82b056cf85b16976ca2760615897fe65ae5e64
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date: Tue Jun 10 11:29:59 2025 +0200
Fix bf16 training with TP (#3610)
* fix
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
commit 6b61a373a2b4e72e3f003ba2277904ee31b9f7e0
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date: Fri Jun 6 13:48:43 2025 +0100
fix deepspeed regional compilation (#3609)
commit 682691deaca2637e0a2efeaa5ebb6dd8bade8c30
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date: Tue Jun 3 12:36:56 2025 +0200
Update Gaudi Runners (#3593)
* test
* fix
* push
* in the morning
* fix backend
* run first
* set habana modules
* dynamo backend
* trigger
* remove on pr
* remove on file change
commit 791055b4848d2c11d3dfcd47faba79b524973756
Author: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Date: Tue Jun 3 12:24:20 2025 +0200
Fix: list object has no attribute keys (#3603)
commit 16bf1d89016e03f5b0d8545e9883df7fe9ab9b5f
Author: Yao Matrix <matrix.yao@intel.com>
Date: Fri May 30 23:36:34 2025 +0800
enable torchao and pippy test cases on XPU (#3599)
* enable torchao and pippy test cases on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
commit ab3c604e48619f7cd08cfac46a7c542414b6661f
Author: Yao Matrix <matrix.yao@intel.com>
Date: Fri May 30 23:23:26 2025 +0800
enable big_model_inference on xpu (#3595)
* enable big_model_inference on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix quality
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
commit 273799c85d849a1954a4f2e65767216eb37fa089
Author: Yao Matrix <matrix.yao@intel.com>
Date: Tue May 27 20:08:59 2025 +0800
enable fsdp2 benchmark on XPU (#3590)
* enable fsdp2 benchmark on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* add deterministic
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
commit 43526c5c089cc831530f42bbbe66a0cb0b0ea461
Author: Yao Matrix <matrix.yao@intel.com>
Date: Tue May 27 17:44:50 2025 +0800
add device-agnostic GradScaler (#3588)
* add device-agnostic GradScaler
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix bug
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix review comments
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* format
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* Apply style fixes
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
commit 07f2392f40a92710b4fb7e51b2de1d40f24d44e2
Author: Yao Matrix <matrix.yao@intel.com>
Date: Tue May 27 17:17:18 2025 +0800
change to use torch.device (#3594)
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
commit ee2f48c2c3d393187408a0f2cce1ece973033809
Author: Fanli Lin <fanli.lin@intel.com>
Date: Tue May 27 17:16:42 2025 +0800
[docs] no hard-coded cuda in the ddp documentation (#3589)
* make device-agnostic
* refactor
commit 4f3abb73a722f6275197c060346dd2f385039afc
Author: jiqing-feng <jiqing.feng@intel.com>
Date: Mon May 26 21:55:10 2025 +0800
Set ccl and KMP param in simple launch (#3575)
* Even a single-CPU machine can run multiple processes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ccl and KMP param setting
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* set master addr only when processes > 1
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix num process check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ccl args check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
commit db536cbfeb61a92e642462a436b51104ab96bd2f
Author: Yuanzhou Cai <80858000+yuanjua@users.noreply.github.com>
Date: Mon May 26 21:08:13 2025 +0800
Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)
* Fix trackers initializing distributed state before InitProcessGroupKwargs
* Add test for bug #3550
* Improve test for #3550
* Remove redundant code
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
commit 4e9d0deba6fd759f5f503f9b1587e79c51032a68
Author: Yao Matrix <matrix.yao@intel.com>
Date: Mon May 26 21:05:42 2025 +0800
enable regional_compilation benchmark on xpu (#3592)
* enable regional_compilation benchmark on xpu
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* Apply style fixes
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
commit 8cb3ace89485af0488d93da6c080c36319cced9e
Author: Luiz F. G. dos Santos <luiz.fernando0992@gmail.com>
Date: Thu May 22 10:21:54 2025 -0500
Add kwargs to optimizer, scheduler and dataloader using function `accelerator().load_state()` (#3540)
* Added artifact and figure tracking to the MLflow tracker
* Added `log_artifact` to the MLFlowTracker
* Remove changes
* Added kwargs when loading state.
* added doc string
* Adjusted correct default types of kwargs
* Changed the load kwargs to a single one
* removed None value from kwargs
* fix kwargs for loading the model
* removed load_kwargs from optimizer state dict
* make load_kwargs a dictionary
* revert last changes
* reverted load_kwargs
* fix docstring
* added dict initialization
* Fix quality error during PR
commit b6d97cb856ae0c9daa310ab8305340950ea8763a
Author: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Date: Thu May 22 17:26:31 2025 +0300
Resolve logger warnings (#3582)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
commit 33967d4733ec5bf402d85462ec2bbbcd8e872ea9
Author: Francesco Laiti <25352428+laitifranz@users.noreply.github.com>
Date: Tue May 20 12:29:53 2025 +0200
Add support for standalone mode when default port is occupied on single node (#3576)
* add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection
* address review feedback: warn on port conflict only for single-node; raise error for multi-node
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
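The port-conflict behaviour described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `is_port_in_use` and `resolve_main_process_port` are hypothetical helper names, and returning `0` to mean "let the launcher pick a free port" is an assumption made for the example.

```python
import socket
import warnings


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return s.connect_ex((host, port)) == 0


def resolve_main_process_port(port: int, num_machines: int) -> int:
    """Hypothetical helper: decide which port to use for the main process.

    Returns the requested port when it is free. When it is occupied, a
    single-node run only warns and returns 0 (assumed here to mean
    "select a free port automatically"); a multi-node run raises instead.
    """
    if not is_port_in_use(port):
        return port
    if num_machines > 1:
        raise ConnectionError(
            f"Port {port} is already in use; choose another port for multi-node runs."
        )
    warnings.warn(
        f"Port {port} is already in use; falling back to automatic port selection."
    )
    return 0
```

The split mirrors the review feedback noted above: on a single node any free port works, while in a multi-node run every machine must agree on the same port, so silently switching is not safe.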
commit 5b1fcda371b049f76e1bd8536e114635d9eaf5b3
Author: Yao Matrix <matrix.yao@intel.com>
Date: Tue May 20 18:04:24 2025 +0800
enable test_cli & test_example cases on XPU (#3578)
* enable test_cli & test_example cases on XPU
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* remove print
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix ci issue
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
commit f55f0533b5726d85a62fb05760ec6a92d00e0099
Author: Yao Matrix <matrix.yao@intel.com>
Date: Tue May 20 18:02:14 2025 +0800
goodbye torch_ccl (#3580)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
commit 1ec99f0b5842f2f246b6481248099920e74f6384
Author: Yao Matrix <yaoweifeng0301@126.com>
Date: Mon May 19 17:27:40 2025 +0800
enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)
* enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* Update test_load_checkpoint_and_dispatch_with_broadcast.py
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>