DeepSpeed
Make op builder detection adapt to accelerator change
#5206
Merged

delock update nv-inference.yml for launch time op builder detection validation
238bd1cb
delock change accelerator detection logic
4382ebee
delock fallback to gloo when oneccl_binding_for_pytorch is not installed
cc21d463
delock add a workflow to test opbuilder-update
b746322d
delock remove triton from dependenc
a9aab4dd
delock remove opbuild-update and change nv-inference.yml only
53fc44a8
delock requested a review from mrwyattii 1 year ago
delock requested a review from loadams 1 year ago
delock make installed_ops check accelerator name consistency
e5533baa
delock requested a review from awan-10 1 year ago
delock requested a review from arashb 1 year ago
delock requested a review from tjruwase 1 year ago
tjruwase Merge branch 'master' into gma/launch_opbuilder_detection
ef01f0d3
tjruwase requested a review from umchand 1 year ago
tjruwase removed review request from arashb 1 year ago
tjruwase removed review request from awan-10 1 year ago
delock fix accelerator override name
22cc43cb
delock fix formatting check
1691ef3a
delock regenerate compatible ops every time
cf2ea666
delock remove ipex and oneccl_pt_binding installation in cpu-inference workflow
c040105b
delock fix cpu-inference and nv-inference workflow
82591223
delock add missing quotation mark
d13fe5c6
delock import ALL_OPS in git_version_info.py
ccaeb721
delock build oneCCL with parallel make -j
2b6707f8
delock adding missing package dependency
a3bc2f80
delock fix cpu-inference workflow
04bd0615
tjruwase Merge branch 'master' into gma/launch_opbuilder_detection
91163027
delock fix cpu-inference workflow and pre-compile workflow
1a1a71b7
delock remove py-cpuinfo and psutil preinstall
87367e16
tjruwase Merge branch 'master' into gma/launch_opbuilder_detection
52fc101c
loadams Merge branch 'master' into gma/launch_opbuilder_detection
8baa89e1
loadams Merge branch 'master' into gma/launch_opbuilder_detection
0a63463c
delock Skip test when its fp16
4bba5e13
delock fix elastic test
7cd08cd6
delock Better dequantization skipping
b28d81bc
delock fix format
2ea44ba5
delock add numactl into dependency
57bab576
loadams Merge branch 'master' into gma/launch_opbuilder_detection
f06f6b62
delock Use bf16 data type for test if accelerator does not support fp16
47888ec3
delock skip more tests requires bf16
a317fe8b
delock skip more UTs
a03cc567
delock skip more tests that CPU accelerator does not support
cd916f76
delock change skip reason
c943ec23
delock skip a time out test
30d3e695
delock fix test_zero
2658d415
delock Get around lazy init issue in test_ds_config_dict.py
cd8672d7
loadams Merge branch 'master' into gma/launch_opbuilder_detection
a1666ba8
delock fix more ut failures
ff5380f5
delock fix more UT failure
da808a2e
delock fix more UTs
45e146a4
delock fix more tests
2fd32b6d
delock better construct for preferred dtype
2e604626
delock fix import error
c754492d
delock remove scale for bf16 config
e024e6f3
delock pass more UTs
f55186d3
delock fix more tests
02cb9e35
loadams Merge branch 'master' into gma/launch_opbuilder_detection
ae544e1d
delock change preferred_dtype into a function
46236220
delock install pdsh in cpu-torch-latest.yml
43505ab1
tjruwase Merge branch 'master' into gma/launch_opbuilder_detection
79c4d6c8
delock better test_lr_scheduler skipping
2e59d927
delock skip multinmode test
ad351e4a
delock preferred_dtype ==> preferred_dtype()
b2673dfb
delock fix more tests
41ced030
delock skip some special case
ad191718
tjruwase commented on 2024-03-11
tjruwase commented on 2024-03-11
delock fix error in nv-torch-latest
f4fe02b9
delock fix error in test_zero_context
88567b33
delock remove "fp16" argument in checkpoint_correctness_verification
c94003b5
tjruwase commented on 2024-03-12
tjruwase approved these changes on 2024-03-12
tjruwase merged c08e69f2 into master 1 year ago
