DeepSpeed
[CPU] Support Intel CPU inference
#3041
Merged


tjruwase merged 179 commits into deepspeedai:master from delock:gma/cpu_support
delock add fallback path for kernels used in megatron
edf1c128
delock temporary numactl WA for SPR 56core
9a89405d
delock adapt core allocation according to number of ranks
d1b8f137
delock add switch to turn on numactl
e31439ed
delock detect number of cores on the system
c5828f75
allow select a subset of the cores on the system to bind
6b9dcd2f
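The commits above (detect the core count, adapt allocation to the number of ranks, allow binding to a subset of cores) describe a core-binding scheme. As a minimal sketch of one way such a split could work, assuming cores are divided evenly and remainder cores are left unused: `cores_for_rank` and its parameters are hypothetical names for illustration, not code from this PR.

```python
def cores_for_rank(core_list, num_ranks, rank):
    """Return the slice of core_list assigned to `rank`.

    Hypothetical helper (not DeepSpeed's implementation): cores are
    split evenly across ranks; leftover cores are not assigned.
    """
    if num_ranks <= 0 or not 0 <= rank < num_ranks:
        raise ValueError("rank must satisfy 0 <= rank < num_ranks")
    per_rank = len(core_list) // num_ranks
    if per_rank == 0:
        raise ValueError("fewer cores than ranks")
    start = rank * per_rank
    return core_list[start:start + per_rank]


# Example: two ranks sharing a user-selected subset of eight cores.
subset = list(range(8))
for r in range(2):
    print(r, cores_for_rank(subset, 2, r))
```

Each rank's core list could then be handed to a binding tool such as numactl; how the PR actually does this is not shown in the commit messages.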
delock Merge branch 'up-master' into gma/numactl
4031a6e7
delock remove unneeded changes
893c18de
delock Merge branch 'up-master' into gma/bf16_kernel
ad712333
delock Merge branch 'master' into gma/numactl
3551850d
delock Merge branch 'gma/numactl' into gma/cpu_support
04d17e85
delock add ccl backend
1369eda2
delock change nccl to ccl
3c927c7e
delock remove unused code
e1eecd2b
delock add comm/ccl to ops
c0053991
delock initial ccl comm support
b7d455e6
delock first broadcast case passed
6f2a73ea
delock add CCL_Backend to DeepSpeed
2c012fdd
delock support comm timer for CPU
3435185e
delock support barrier for comm backend
92cc50ec
delock support specify master address from deepspeed command line
62c53f75
delock support pytorch 2.0
9ce6fce7
delock remove 'block' from api
f4e1d3cd
CaoZhongZ Tweak for debug
1e583faf
CaoZhongZ Remove unecessary directory
a363f019
Add bf16 kernel support for inference
bb29b1ae
Add temporary torch implement for cpu inference
5e071744
baodii Add softmax ops cpu fallback for inference
076d6992
delock bind cores to numa domain as well
0a35d9c8
delock Merge branch 'up-master' into gma/cpu_support
c1d78b92
delock merge latest change in gma/numactl
620c81ca
delock initial bf16 kernel support with fallback path
1b138ba8
delock initial fallback path for bloom kernel injection
a3b663c6
delock Merge branch 'zhong_master' into gma/cpu_support
6ca0f0d9
delock fix softmax attn mask
4323a909
delock check KMP_AFFINITY to avoid conflict with numactl
cc6df65c
delock New CCLBackend which utilize TorchBackend for initialization
9e503a54
delock rollback last change because there is result error
9a38b28e
sywangyi fix bloom injection policy TP could not work issue.
a2d9c9e6
delock Merge pull request #3 from sywangyi/yi_bloom_dev
705de700
delock Use TorchBackend to initialize CCLBackend, make behavior consistent
1f218132
delock remove comm under deepspeed/ops
df4b0d77
delock Merge branch 'microsoft:master' into gma/cpu_support
ad842191
delock add license header
7702ba7e
delock code clean up
e5ff34c4
delock fix format issue
adb1adfc
delock remove magic number in main address
8ad2dbca
delock add caching support but not turn on by default
ce5830e5
delock change name of inference_cuda_module to inference_module
da9053fb
delock Merge branch 'master' into gma/cpu_support
626a9bfc
delock Check for is_synchronized_device in accelerator before get Event
e79ae9fa
rogerxfeng8 commented on 2023-03-20
delock fix typo
9e41f21b
delock Fix fallback path of softmax kernel on CUDA device for BF16 data type…
c6755471
delock add cpu backend files
d42df022
delock change CPU_Accelerator op_builder_dir
9fca9bee
delock remove cpu_kernel_path
e797502b
delock using CPU_Accelerator on non-cuda device
1fddaf91
delock fix deepspeed.op_builder => deepspeed.ops.op_builder
43a7aaef
delock add alias for num_gpus: num_accelerators
ad4d39a3
delock Merge branch 'gma/cpu_support_add_backend' into gma/cpu_support
705c5196
delock allow loading cpu_builder in build stage
e685e98e
delock Assume cuda available if torch not installed
a4e76e7a
delock add oneccl_binding_pt to requirements
71a6f477
delock move oneccl-binding-pt to seperate requiremetns-cpu.txt
8faae833
delock add missing file
7997887a
delock use dependency_links in setuptools.setup() call for additional depend…
a027af6a
delock Merge branch 'master' into gma/cpu_support
4742e23d
delock install oneccl_bind_pt in workflows
62772ead
delock change oneccl_bind_pt's version from 1.13 to 2.0
a4499d65
delock use intel_exention_for_pytorch as indicator that CPU_Accelerator shou…
fe63c7b0
delock Add indicator for Accelerator used
39052647
delock change foo.c to foo.cpp
f7ead2d9
delock exclude 'cpu' directory in CUDA op builder reflection
d4e63b9e
delock Merge branch 'master' into gma/cpu_support
a34b8863
delock add a cpu-inference workflow
d0e5b1cb
delock run cpu-inference workflow on self-hosted instance
221896c2
delock change cpu runs-on node to v100 node
a039688b
delock print out python version in workflow
5525ce0f
delock add verbose in pip command to understand oneccl_bind_pt install issue
3ddc8b2d
delock update cpu-inference workflow
fa9d345e
delock add a stage to detect instance instruction sets
978314ac
delock add back bf16 support for CPU inference
c5b11ad0
sywangyi enable autoTP for bloom
312d85c6
delock update workflow to detect cpu instruction sets
a66d8b56
delock temporary WA for Intel Extension for PyTorch AVX2 instructioon set de…
04478237
delock change cpu-inference workflow machine to ubuntu-20.04
a09c638d
sywangyi add sharded checkpoint loading for AutoTP path to reduce the peak mem…
33803ee8
delock Merge pull request #6 from sywangyi/autoTP_reduce_peak
2deff633
jianan-gu enable policy for llama
755a47b2
delock Merge branch 'master' into gma/cpu_support
bd021b5e
delock use a special build ipex to test avx2 detection fix
21f9c286
delock Merge pull request #7 from jianan-gu/patch-1
c177c2e4
delock Merge branch 'up-master' into gma/cpu_support
4e4a367d
delock commented on 2023-03-28
delock fix format
db1f564b
delock Merge branch 'master' into gma/cpu_support
20c79e17
sywangyi fix test fail issue
f66195ee
delock Merge pull request #8 from sywangyi/yi_dev
80908670
sywangyi fix gptj sharded checkpoint loading problem
f65030b6
delock Merge pull request #9 from sywangyi/yi_dev_gptj_shard
08284cc7
delock Merge branch 'up-master' into gma/cpu_support
2c4f209f
delock return a not implemented build in get_op_builder in cpu_backend
f83a1303
delock Merge branch 'master' into gma/cpu_support
e59bf322
delock support cpu device in tests
302e5b0b
delock use cpuinfo to extract number of CPUs
27bcd042
delock use ~/tmp as transfomer cache rather than /blob/
aafa40eb
delock marked this pull request as ready for review 2 years ago
delock requested a review from RezaYazdaniAminabadi 2 years ago
delock requested a review from jeffra 2 years ago
delock requested a review from mrwyattii 2 years ago
delock requested a review from awan-10 2 years ago
delock requested a review from cmikeh2 2 years ago
delock requested a review from arashb 2 years ago
delock requested a review from tjruwase 2 years ago
delock requested a review from loadams 2 years ago
delock Add support for mpich launcher with prefer_deepspeed_comm
8e902702
delock add missing modification in accelerator
d70448c9
delock commented on 2023-04-03 (3 comments)
delock enable IMPI launcher
16e4504c
delock Merge branch 'up-master' into gma/cpu_support
f5fd3129
delock remove unused file and fix formatting
c85cbe4d
delock commented on 2023-04-04 (6 comments)
delock Merge branch 'up-master' into gma/cpu_support
bc0b1522
delock clean up ccl.cpp
71ad82e5
delock Merge branch 'master' into gma/cpu_support
5e816921
delock Merge branch 'master' into gma/cpu_support
fc7a4fa5
delock Less confusing error message when certin op builder are not implemented
f04d83d3
delock Fix license header
fa83d5a6
delock Add license header
5b18bbc1
delock add license headers
900c0074
delock add license header
57d790d2
delock fix cuda specific code in test
d001568e
delock update CPU workflow
07ce43bc
delock Merge branch 'up-master' into gma/cpu_support
e21b0730
delock commented on 2023-04-13
delock use numactl to bind to core
cb6d6f5a
delock allow bind_cores_to_rank in multi-node impi runner
7a8a8efd
delock fix format error
c63bf454
delock Remove InferenceBuilder
8f5e51af
delock fix format error in numa.py
7fd07384
delock check whether op is in installed ops in ds_report.py
e21a04e7
delock Merge branch 'master' into gma/cpu_support
cd45a4f1
tjruwase Merge branch 'master' into gma/cpu_support
a3c5da49
delock commented on 2023-04-19
delock allow override accelerator with DS_ACCELERATOR='cuda','cpu' or 'xpu'
9d0a47a2
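The commit above adds an environment-variable override for accelerator selection. A minimal sketch of how such an override might be read and validated follows. Only the variable name `DS_ACCELERATOR` and the three values come from the commit message; the function `accelerator_override` and its error handling are assumptions for illustration, not DeepSpeed's actual code.

```python
import os

# Values named in the commit message.
SUPPORTED = ("cuda", "cpu", "xpu")


def accelerator_override(environ=None):
    """Return the requested accelerator name, or None if no override is set.

    Hypothetical helper: `environ` defaults to os.environ and can be
    replaced with a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("DS_ACCELERATOR")
    if name is None:
        return None  # caller falls back to automatic detection
    if name not in SUPPORTED:
        raise ValueError(
            f"DS_ACCELERATOR must be one of {SUPPORTED}, got {name!r}"
        )
    return name


print(accelerator_override({"DS_ACCELERATOR": "cpu"}))
```

Under this sketch, a launch such as `DS_ACCELERATOR=cpu python script.py` would skip auto-detection; rejecting unknown values early is a design choice of the sketch, not something the commit message states.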
delock lazy init class_dict in CUDA_Accelerator to avoid cyclic initializati…
72dcfc6d
loadams Merge branch 'master' into gma/cpu_support
a71a4620
delock put short path in the beginning in real_accelerator.py
1e0aeaab
delock commented on 2023-04-21
delock device_count return number of NUMA nodes
880c466b
delock Merge branch 'up-master' into gma/cpu_support
47da0791
delock fix typo
69ee9d28
delock install numactl in cpu workflow
cd8d810f
delock Merge branch 'up-master' into gma/cpu_support
9d1eec2d
tjruwase commented on 2023-04-25 (8 comments)
jeffra commented on 2023-04-25
tjruwase commented on 2023-04-25 (7 comments)
tjruwase Merge branch 'master' into gma/cpu_support
0b0eadb6
delock Follow comments
a0ebaad7
delock Merge branch 'master' into gma/cpu_support
3c0ece38
delock Better implementation of device_count() and current_device()
248c47be
delock remove dependency_link for Intel Extension for DeepSpeed
02ce81f2
delock use check is_synchronized_device in timer only once
9babf032
delock remove env mapping WA in cpu_accelerator
59dca3f3
delock Merge branch 'master' into gma/cpu_support
0230c77f
delock Merge branch 'up-master' into gma/cpu_support
04d4a27a
delock fix duplicate definition
7aecab72
delock fix format error
5cc61fbf
delock commented on 2023-05-04
tjruwase Merge branch 'master' into gma/cpu_support
1bd6458c
tjruwase commented on 2023-05-04
delock refine ccl backend selection
4027838e
delock move comments to the right place
e4e92e0a
tjruwase commented on 2023-05-05
delock Merge branch 'up-master' into gma/cpu_support
31e17853
delock remove prefer_deepspeed_comm, use CCLBackend by default
50faa03f
delock refractor fallback path
5d26c8e9
delock Fix execution failure in kernel injection path
5957f56e
delock do not refractory kernel injection fallback path in residual_add bec…
f7dd9402
delock guard residual_add fallback path with environ DS_KI_FALLBACK=True
816e3548
delock Merge branch 'master' into gma/cpu_support
803b17f7
tjruwase Merge branch 'master' into gma/cpu_support
2b59496c
delock fix format error
764d027d
delock add test for allreduce on CPU workflow
869724c4
delock fix format error
f678b901
tjruwase Merge branch 'master' into gma/cpu_support
828dce07
tjruwase approved these changes on 2023-05-10
delock Merge branch 'up-master' into gma/cpu_support
8c3444b2
delock Fallback to TorchBackend if CCLBackend kernel are not implemented
1b9cd906
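The commit above describes falling back to TorchBackend when a CCLBackend kernel is not implemented. The pattern can be sketched generically as below. `FallbackBackend`, `Primary`, and `Secondary` are hypothetical stand-ins, not the DeepSpeed classes, and the sketch assumes a missing kernel shows up either as an absent method or as a `NotImplementedError`.

```python
class FallbackBackend:
    """Dispatch each op to `primary`; use `fallback` when primary lacks it.

    Hypothetical wrapper for illustration only.
    """

    def __init__(self, primary, fallback):
        self._primary = primary
        self._fallback = fallback

    def __getattr__(self, name):
        # Only reached for attributes not set in __init__, i.e. backend ops.
        op = getattr(self._primary, name, None)
        if op is None:
            return getattr(self._fallback, name)

        def call(*args, **kwargs):
            try:
                return op(*args, **kwargs)
            except NotImplementedError:
                return getattr(self._fallback, name)(*args, **kwargs)

        return call


class Primary:  # stand-in for a backend with partial op coverage
    def all_reduce(self, x):
        return ("primary", x)

    def broadcast(self, x):
        raise NotImplementedError


class Secondary:  # stand-in for a backend that implements everything
    def all_reduce(self, x):
        return ("fallback", x)

    def broadcast(self, x):
        return ("fallback", x)

    def barrier(self):
        return ("fallback", "barrier")


backend = FallbackBackend(Primary(), Secondary())
print(backend.all_reduce(1), backend.broadcast(2), backend.barrier())
```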
delock Merge branch 'master' into gma/cpu_support
bf5657f8
delock Update Intel Extension for Pytorch installation link
c6c6eb2c
delock Don't specify version number of Intel Extension for PyTorch
e7fba930
delock install oneCCL for CCLBackend
b2c7cb7f
delock fix link path for CPU comm kernels
87493669
delock fix source oneCCL environment
782a624b
delock source oneCCL env before run UT
06f6815d
tjruwase Merge branch 'master' into gma/cpu_support
24a5a526
delock Give more specific instruction when CCL_ROOT not defined
f63cbbda
delock Merge branch 'master' into gma/cpu_support
06b55a03
tjruwase Merge branch 'master' into gma/cpu_support
1fc50be4
tjruwase Merge branch 'master' into gma/cpu_support
5f496860
tjruwase merged 1f72082f into master 2 years ago
