DeepSpeed
[CPU] Support Intel CPU inference
#3041
Merged
tjruwase merged 179 commits into deepspeedai:master from delock:gma/cpu_support
add fallback path for kernels used in megatron
edf1c128
temporary numactl WA for SPR 56core
9a89405d
adapt core allocation according to number of ranks
d1b8f137
add switch to turn on numactl
e31439ed
detect number of cores on the system
c5828f75
allow selecting a subset of the cores on the system to bind
6b9dcd2f
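The two commits above describe splitting an allowed subset of cores evenly across the local ranks. A minimal sketch of that allocation, assuming a simple even split; the function name and signature are illustrative, not DeepSpeed's actual code:

```python
def cores_for_rank(cores, num_ranks, rank):
    """Return the slice of `cores` assigned to `rank` out of `num_ranks`.

    Assumes an even split; any leftover cores are simply left unbound.
    """
    per_rank = len(cores) // num_ranks
    start = rank * per_rank
    return cores[start:start + per_rank]

# Example: 8 allowed cores split across 2 ranks.
all_cores = list(range(8))
print(cores_for_rank(all_cores, 2, 0))  # [0, 1, 2, 3]
print(cores_for_rank(all_cores, 2, 1))  # [4, 5, 6, 7]
```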
Merge branch 'up-master' into gma/numactl
4031a6e7
remove unneeded changes
893c18de
Merge branch 'up-master' into gma/bf16_kernel
ad712333
Merge branch 'master' into gma/numactl
3551850d
Merge branch 'gma/numactl' into gma/cpu_support
04d17e85
add ccl backend
1369eda2
change nccl to ccl
3c927c7e
remove unused code
e1eecd2b
add comm/ccl to ops
c0053991
initial ccl comm support
b7d455e6
first broadcast case passed
6f2a73ea
add CCL_Backend to DeepSpeed
2c012fdd
support comm timer for CPU
3435185e
support barrier for comm backend
92cc50ec
support specifying master address from deepspeed command line
62c53f75
support pytorch 2.0
9ce6fce7
remove 'block' from api
f4e1d3cd
Tweak for debug
1e583faf
Remove unnecessary directory
a363f019
Add bf16 kernel support for inference
bb29b1ae
Add temporary torch implementation for cpu inference
5e071744
Add softmax ops cpu fallback for inference
076d6992
bind cores to numa domain as well
0a35d9c8
Merge branch 'up-master' into gma/cpu_support
c1d78b92
merge latest change in gma/numactl
620c81ca
initial bf16 kernel support with fallback path
1b138ba8
initial fallback path for bloom kernel injection
a3b663c6
Merge branch 'zhong_master' into gma/cpu_support
6ca0f0d9
fix softmax attn mask
4323a909
check KMP_AFFINITY to avoid conflict with numactl
cc6df65c
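The commit above checks KMP_AFFINITY so that the OpenMP runtime and numactl do not both try to pin threads. A hedged sketch of such a guard; the function name and the exact condition are assumptions, not the PR's actual logic:

```python
import os

def numactl_safe():
    """Return True if it looks safe to let numactl bind cores.

    If KMP_AFFINITY already pins threads (any value other than empty or
    'disabled'), skip numactl binding to avoid conflicting affinities.
    """
    kmp = os.environ.get("KMP_AFFINITY", "")
    return kmp == "" or "disabled" in kmp.lower()

os.environ.pop("KMP_AFFINITY", None)
print(numactl_safe())  # True when KMP_AFFINITY is unset
```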
New CCLBackend which utilizes TorchBackend for initialization
9e503a54
roll back last change because there is a result error
9a38b28e
fix issue where bloom injection policy TP could not work
a2d9c9e6
Merge pull request #3 from sywangyi/yi_bloom_dev
705de700
Use TorchBackend to initialize CCLBackend, make behavior consistent
1f218132
remove comm under deepspeed/ops
df4b0d77
Merge branch 'microsoft:master' into gma/cpu_support
ad842191
add license header
7702ba7e
code clean up
e5ff34c4
fix format issue
adb1adfc
remove magic number in main address
8ad2dbca
add caching support but do not turn it on by default
ce5830e5
change name of inference_cuda_module to inference_module
da9053fb
Merge branch 'master' into gma/cpu_support
626a9bfc
Check for is_synchronized_device in accelerator before getting Event
e79ae9fa
rogerxfeng8 commented on 2023-03-20
fix typo
9e41f21b
Fix fallback path of softmax kernel on CUDA device for BF16 data type…
c6755471
add cpu backend files
d42df022
change CPU_Accelerator op_builder_dir
9fca9bee
remove cpu_kernel_path
e797502b
using CPU_Accelerator on non-cuda device
1fddaf91
fix deepspeed.op_builder => deepspeed.ops.op_builder
43a7aaef
add alias for num_gpus: num_accelerators
ad4d39a3
Merge branch 'gma/cpu_support_add_backend' into gma/cpu_support
705c5196
allow loading cpu_builder in build stage
e685e98e
Assume cuda available if torch not installed
a4e76e7a
add oneccl_binding_pt to requirements
71a6f477
move oneccl-binding-pt to separate requirements-cpu.txt
8faae833
add missing file
7997887a
use dependency_links in setuptools.setup() call for additional depend…
a027af6a
Merge branch 'master' into gma/cpu_support
4742e23d
install oneccl_bind_pt in workflows
62772ead
change oneccl_bind_pt's version from 1.13 to 2.0
a4499d65
use intel_extension_for_pytorch as indicator that CPU_Accelerator shou…
fe63c7b0
Add indicator for Accelerator used
39052647
change foo.c to foo.cpp
f7ead2d9
exclude 'cpu' directory in CUDA op builder reflection
d4e63b9e
Merge branch 'master' into gma/cpu_support
a34b8863
add a cpu-inference workflow
d0e5b1cb
run cpu-inference workflow on self-hosted instance
221896c2
change cpu runs-on node to v100 node
a039688b
print out python version in workflow
5525ce0f
add verbose in pip command to understand oneccl_bind_pt install issue
3ddc8b2d
update cpu-inference workflow
fa9d345e
add a stage to detect instance instruction sets
978314ac
add back bf16 support for CPU inference
c5b11ad0
enable autoTP for bloom
312d85c6
update workflow to detect cpu instruction sets
a66d8b56
temporary WA for Intel Extension for PyTorch AVX2 instruction set de…
04478237
change cpu-inference workflow machine to ubuntu-20.04
a09c638d
add sharded checkpoint loading for AutoTP path to reduce the peak mem…
33803ee8
Merge pull request #6 from sywangyi/autoTP_reduce_peak
2deff633
enable policy for llama
755a47b2
Merge branch 'master' into gma/cpu_support
bd021b5e
use a special build ipex to test avx2 detection fix
21f9c286
Merge pull request #7 from jianan-gu/patch-1
c177c2e4
Merge branch 'up-master' into gma/cpu_support
4e4a367d
delock commented on 2023-03-28
fix format
db1f564b
Merge branch 'master' into gma/cpu_support
20c79e17
fix test fail issue
f66195ee
Merge pull request #8 from sywangyi/yi_dev
80908670
fix gptj sharded checkpoint loading problem
f65030b6
Merge pull request #9 from sywangyi/yi_dev_gptj_shard
08284cc7
Merge branch 'up-master' into gma/cpu_support
2c4f209f
return a not implemented build in get_op_builder in cpu_backend
f83a1303
Merge branch 'master' into gma/cpu_support
e59bf322
support cpu device in tests
302e5b0b
use cpuinfo to extract number of CPUs
27bcd042
use ~/tmp as transformer cache rather than /blob/
aafa40eb
delock marked this pull request as ready for review 2 years ago
delock requested reviews from RezaYazdaniAminabadi, jeffra, mrwyattii, awan-10, cmikeh2, arashb, tjruwase and loadams 2 years ago
Add support for mpich launcher with prefer_deepspeed_comm
8e902702
add missing modification in accelerator
d70448c9
delock commented on 2023-04-03
enable IMPI launcher
16e4504c
Merge branch 'up-master' into gma/cpu_support
f5fd3129
remove unused file and fix formatting
c85cbe4d
delock commented on 2023-04-04
Merge branch 'up-master' into gma/cpu_support
bc0b1522
clean up ccl.cpp
71ad82e5
Merge branch 'master' into gma/cpu_support
5e816921
Merge branch 'master' into gma/cpu_support
fc7a4fa5
Less confusing error message when certain op builders are not implemented
f04d83d3
Fix license header
fa83d5a6
Add license header
5b18bbc1
add license headers
900c0074
add license header
57d790d2
fix cuda specific code in test
d001568e
update CPU workflow
07ce43bc
Merge branch 'up-master' into gma/cpu_support
e21b0730
delock commented on 2023-04-13
use numactl to bind to core
cb6d6f5a
allow bind_cores_to_rank in multi-node impi runner
7a8a8efd
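The core- and NUMA-binding commits above wrap each rank's process in a numactl prefix. A sketch of building such a prefix, assuming numactl's real `-C` (physical CPU bind) and `-m` (memory bind) options; the wrapper function itself is hypothetical:

```python
def numactl_prefix(core_list, numa_node=None):
    """Build a numactl argv prefix binding a rank to `core_list`.

    If `numa_node` is given, also bind memory allocation to that NUMA
    domain, per the "bind cores to numa domain as well" commit.
    """
    cores = ",".join(str(c) for c in core_list)
    cmd = ["numactl", "-C", cores]
    if numa_node is not None:
        cmd += ["-m", str(numa_node)]
    return cmd

print(numactl_prefix([0, 1, 2, 3], numa_node=0))
# ['numactl', '-C', '0,1,2,3', '-m', '0']
```

In a launcher, this prefix would be prepended to the rank's command line before spawning the process.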
fix format error
c63bf454
Remove InferenceBuilder
8f5e51af
fix format error in numa.py
7fd07384
check whether op is in installed ops in ds_report.py
e21a04e7
Merge branch 'master' into gma/cpu_support
cd45a4f1
Merge branch 'master' into gma/cpu_support
a3c5da49
delock commented on 2023-04-19
allow override accelerator with DS_ACCELERATOR='cuda','cpu' or 'xpu'
9d0a47a2
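The commit above lets DS_ACCELERATOR force the accelerator choice instead of autodetection. A stub of that selection logic; the real implementation lives in DeepSpeed's real_accelerator.py, and the function here is an illustrative stand-in:

```python
import os

def pick_accelerator(detected="cuda"):
    """Return the accelerator name, honoring a DS_ACCELERATOR override.

    `detected` stands in for whatever autodetection would have chosen.
    """
    override = os.environ.get("DS_ACCELERATOR")
    if override is not None:
        if override not in ("cuda", "cpu", "xpu"):
            raise ValueError(f"unknown DS_ACCELERATOR: {override}")
        return override
    return detected

os.environ["DS_ACCELERATOR"] = "cpu"
print(pick_accelerator())  # cpu
```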
lazy init class_dict in CUDA_Accelerator to avoid cyclic initializati…
72dcfc6d
Merge branch 'master' into gma/cpu_support
a71a4620
put short path in the beginning in real_accelerator.py
1e0aeaab
delock commented on 2023-04-21
device_count return number of NUMA nodes
880c466b
Merge branch 'up-master' into gma/cpu_support
47da0791
fix typo
69ee9d28
install numactl in cpu workflow
cd8d810f
Merge branch 'up-master' into gma/cpu_support
9d1eec2d
tjruwase commented on 2023-04-25
jeffra commented on 2023-04-25
Merge branch 'master' into gma/cpu_support
0b0eadb6
Follow comments
a0ebaad7
Merge branch 'master' into gma/cpu_support
3c0ece38
Better implementation of device_count() and current_device()
248c47be
remove dependency_link for Intel Extension for DeepSpeed
02ce81f2
use check is_synchronized_device in timer only once
9babf032
remove env mapping WA in cpu_accelerator
59dca3f3
Merge branch 'master' into gma/cpu_support
0230c77f
Merge branch 'up-master' into gma/cpu_support
04d4a27a
fix duplicate definition
7aecab72
fix format error
5cc61fbf
delock commented on 2023-05-04
Merge branch 'master' into gma/cpu_support
1bd6458c
tjruwase commented on 2023-05-04
refine ccl backend selection
4027838e
move comments to the right place
e4e92e0a
tjruwase commented on 2023-05-05
Merge branch 'up-master' into gma/cpu_support
31e17853
remove prefer_deepspeed_comm, use CCLBackend by default
50faa03f
refactor fallback path
5d26c8e9
Fix execution failure in kernel injection path
5957f56e
do not refactor kernel injection fallback path in residual_add bec…
f7dd9402
guard residual_add fallback path with environ DS_KI_FALLBACK=True
816e3548
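The commit above gates the residual_add fallback behind DS_KI_FALLBACK=True. A minimal sketch of such an env-guarded fallback, assuming the guard only fires on the exact string "True"; function names and the toy element-wise add are illustrative, not the PR's kernels:

```python
import os

def use_ki_fallback():
    """True when the environment explicitly opts into the fallback path."""
    return os.environ.get("DS_KI_FALLBACK", "") == "True"

def residual_add(x, residual, fused_kernel=None):
    """Use the fused kernel unless missing or the fallback is forced."""
    if fused_kernel is None or use_ki_fallback():
        return [a + b for a, b in zip(x, residual)]  # plain Python path
    return fused_kernel(x, residual)

os.environ["DS_KI_FALLBACK"] = "True"
print(residual_add([1, 2], [3, 4]))  # [4, 6]
```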
Merge branch 'master' into gma/cpu_support
803b17f7
Merge branch 'master' into gma/cpu_support
2b59496c
fix format error
764d027d
add test for allreduce on CPU workflow
869724c4
fix format error
f678b901
Merge branch 'master' into gma/cpu_support
828dce07
tjruwase approved these changes on 2023-05-10
Merge branch 'up-master' into gma/cpu_support
8c3444b2
Fallback to TorchBackend if CCLBackend kernels are not implemented
1b9cd906
Merge branch 'master' into gma/cpu_support
bf5657f8
Update Intel Extension for PyTorch installation link
c6c6eb2c
Don't specify version number of Intel Extension for PyTorch
e7fba930
install oneCCL for CCLBackend
b2c7cb7f
fix link path for CPU comm kernels
87493669
fix source oneCCL environment
782a624b
source oneCCL env before run UT
06f6815d
Merge branch 'master' into gma/cpu_support
24a5a526
Give more specific instruction when CCL_ROOT is not defined
f63cbbda
Merge branch 'master' into gma/cpu_support
06b55a03
Merge branch 'master' into gma/cpu_support
1fc50be4
Merge branch 'master' into gma/cpu_support
5f496860
tjruwase merged 1f72082f into master 2 years ago
Reviewers
tjruwase
jeffra
sywangyi
rogerxfeng8
RezaYazdaniAminabadi
mrwyattii
awan-10
cmikeh2
arashb
loadams
Assignees
No one assigned
Labels
None yet
Milestone
No milestone