PR #3041 [CPU] Support Intel CPU inference

add fallback path for kernels used in megatron

edf1c128

temporary numactl WA for SPR 56core

9a89405d

adapt core allocation according to number of ranks

d1b8f137

add switch to turn on numactl

e31439ed

detect number of cores on the system

c5828f75

allow selecting a subset of the cores on the system to bind

6b9dcd2f
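The core-binding commits above (per-rank core allocation, binding to a subset of the system's cores) can be pictured with a minimal sketch. This is an illustration, not the code from this PR: the helper names parse_core_list, cores_for_rank and numactl_prefix are hypothetical, and the even split of cores across ranks is an assumption. Only the numactl -C (--physcpubind) option is a real CLI flag.

```python
def parse_core_list(spec):
    """Expand a spec such as "0-3,8,10-11" into a sorted list of core ids."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def cores_for_rank(cores, num_ranks, rank):
    """Give each rank an equal, contiguous slice of the selected cores.

    Assumption: cores left over after the even split stay unused.
    """
    per_rank = len(cores) // num_ranks
    if per_rank == 0:
        raise ValueError("fewer cores than ranks")
    return cores[rank * per_rank:(rank + 1) * per_rank]


def numactl_prefix(cores):
    """Build the command prefix that pins a process to the given cores."""
    return ["numactl", "-C", ",".join(str(c) for c in cores)]


if __name__ == "__main__":
    selected = parse_core_list("0-55")  # e.g. one 56-core socket
    for rank in range(2):
        print(rank, numactl_prefix(cores_for_rank(selected, 2, rank)))
```

With 56 selected cores and 2 ranks, rank 0 would be pinned to cores 0-27 and rank 1 to cores 28-55 under these assumptions.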

Merge branch 'up-master' into gma/numactl

4031a6e7

remove unneeded changes

893c18de

Merge branch 'up-master' into gma/bf16_kernel

ad712333

Merge branch 'master' into gma/numactl

3551850d

Merge branch 'gma/numactl' into gma/cpu_support

04d17e85

add ccl backend

1369eda2

change nccl to ccl

3c927c7e

remove unused code

e1eecd2b

add comm/ccl to ops

c0053991

initial ccl comm support

b7d455e6

first broadcast case passed

6f2a73ea

add CCL_Backend to DeepSpeed

2c012fdd

support comm timer for CPU

3435185e

support barrier for comm backend

92cc50ec

support specifying the master address from the deepspeed command line

62c53f75

support pytorch 2.0

9ce6fce7

remove 'block' from api

f4e1d3cd

Tweak for debug

1e583faf

Remove unnecessary directory

a363f019

Add bf16 kernel support for inference

bb29b1ae

Add temporary torch implementation for cpu inference

5e071744

Add softmax ops cpu fallback for inference

076d6992

bind cores to numa domain as well

0a35d9c8

Merge branch 'up-master' into gma/cpu_support

c1d78b92

merge latest change in gma/numactl

620c81ca

initial bf16 kernel support with fallback path

1b138ba8

initial fallback path for bloom kernel injection

a3b663c6

Merge branch 'zhong_master' into gma/cpu_support

6ca0f0d9

fix softmax attn mask

4323a909

check KMP_AFFINITY to avoid conflict with numactl

cc6df65c

New CCLBackend which utilizes TorchBackend for initialization

9e503a54

roll back last change because it produces a result error

9a38b28e

fix issue where TP could not work with the bloom injection policy

a2d9c9e6

Merge pull request #3 from sywangyi/yi_bloom_dev

705de700

Use TorchBackend to initialize CCLBackend, make behavior consistent

1f218132

remove comm under deepspeed/ops

df4b0d77

Merge branch 'microsoft:master' into gma/cpu_support

ad842191

add license header

7702ba7e

code clean up

e5ff34c4

fix format issue

adb1adfc

remove magic number in main address

8ad2dbca

add caching support, not turned on by default

ce5830e5

change name of inference_cuda_module to inference_module

da9053fb

Merge branch 'master' into gma/cpu_support

626a9bfc

Check for is_synchronized_device in accelerator before get Event

e79ae9fa

rogerxfeng8 commented on 2023-03-20

fix typo

9e41f21b

Fix fallback path of softmax kernel on CUDA device for BF16 data type…

c6755471

add cpu backend files

d42df022

change CPU_Accelerator op_builder_dir

9fca9bee

remove cpu_kernel_path

e797502b

using CPU_Accelerator on non-cuda device

1fddaf91

fix deepspeed.op_builder => deepspeed.ops.op_builder

43a7aaef

add alias for num_gpus: num_accelerators

ad4d39a3

Merge branch 'gma/cpu_support_add_backend' into gma/cpu_support

705c5196

allow loading cpu_builder in build stage

e685e98e

Assume cuda available if torch not installed

a4e76e7a

add oneccl_binding_pt to requirements

71a6f477

move oneccl-binding-pt to separate requirements-cpu.txt

8faae833

add missing file

7997887a

use dependency_links in setuptools.setup() call for additional depend…

a027af6a

Merge branch 'master' into gma/cpu_support

4742e23d

install oneccl_bind_pt in workflows

62772ead

change oneccl_bind_pt's version from 1.13 to 2.0

a4499d65

use intel_extension_for_pytorch as indicator that CPU_Accelerator shou…

fe63c7b0

Add indicator for Accelerator used

39052647

change foo.c to foo.cpp

f7ead2d9

exclude 'cpu' directory in CUDA op builder reflection

d4e63b9e

Merge branch 'master' into gma/cpu_support

a34b8863

add a cpu-inference workflow

d0e5b1cb

run cpu-inference workflow on self-hosted instance

221896c2

change cpu runs-on node to v100 node

a039688b

print out python version in workflow

5525ce0f

add verbose in pip command to understand oneccl_bind_pt install issue

3ddc8b2d

update cpu-inference workflow

fa9d345e

add a stage to detect instance instruction sets

978314ac

add back bf16 support for CPU inference

c5b11ad0

enable autoTP for bloom

312d85c6

update workflow to detect cpu instruction sets

a66d8b56

temporary WA for Intel Extension for PyTorch AVX2 instruction set de…

04478237

change cpu-inference workflow machine to ubuntu-20.04

a09c638d

add sharded checkpoint loading for AutoTP path to reduce the peak mem…

33803ee8

Merge pull request #6 from sywangyi/autoTP_reduce_peak

2deff633

enable policy for llama

755a47b2

Merge branch 'master' into gma/cpu_support

bd021b5e

use a special build ipex to test avx2 detection fix

21f9c286

Merge pull request #7 from jianan-gu/patch-1

c177c2e4

Merge branch 'up-master' into gma/cpu_support

4e4a367d

delock commented on 2023-03-28

fix format

db1f564b

Merge branch 'master' into gma/cpu_support

20c79e17

fix test fail issue

f66195ee

Merge pull request #8 from sywangyi/yi_dev

80908670

fix gptj sharded checkpoint loading problem

f65030b6

Merge pull request #9 from sywangyi/yi_dev_gptj_shard

08284cc7

Merge branch 'up-master' into gma/cpu_support

2c4f209f

return a not implemented build in get_op_builder in cpu_backend

f83a1303

Merge branch 'master' into gma/cpu_support

e59bf322

support cpu device in tests

302e5b0b

use cpuinfo to extract number of CPUs

27bcd042

use ~/tmp as transformer cache rather than /blob/

aafa40eb

delock marked this pull request as ready for review 2 years ago

delock requested a review from RezaYazdaniAminabadi 2 years ago

delock requested a review from jeffra 2 years ago

delock requested a review from mrwyattii 2 years ago

delock requested a review from awan-10 2 years ago

delock requested a review from cmikeh2 2 years ago

delock requested a review from arashb 2 years ago

delock requested a review from tjruwase 2 years ago

delock requested a review from loadams 2 years ago

Add support for mpich launcher with prefer_deepspeed_comm

8e902702

add missing modification in accelerator

d70448c9

delock commented on 2023-04-03

enable IMPI launcher

16e4504c

Merge branch 'up-master' into gma/cpu_support

f5fd3129

remove unused file and fix formatting

c85cbe4d

delock commented on 2023-04-04

Merge branch 'up-master' into gma/cpu_support

bc0b1522

clean up ccl.cpp

71ad82e5

Merge branch 'master' into gma/cpu_support

5e816921

Merge branch 'master' into gma/cpu_support

fc7a4fa5

Less confusing error message when certain op builders are not implemented

f04d83d3

Fix license header

fa83d5a6

Add license header

5b18bbc1

add license headers

900c0074

add license header

57d790d2

fix cuda specific code in test

d001568e

update CPU workflow

07ce43bc

Merge branch 'up-master' into gma/cpu_support

e21b0730

delock commented on 2023-04-13

use numactl to bind to core

cb6d6f5a

allow bind_cores_to_rank in multi-node impi runner

7a8a8efd

fix format error

c63bf454

Remove InferenceBuilder

8f5e51af

fix format error in numa.py

7fd07384

check whether op is in installed ops in ds_report.py

e21a04e7

Merge branch 'master' into gma/cpu_support

cd45a4f1

Merge branch 'master' into gma/cpu_support

a3c5da49

delock commented on 2023-04-19

allow override accelerator with DS_ACCELERATOR='cuda','cpu' or 'xpu'

9d0a47a2
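As a rough illustration of the DS_ACCELERATOR override described in the commit above, the selection logic might look like the sketch below. The function name select_accelerator_name and the "detected" default are hypothetical and do not come from this PR. Only the environment variable name and its three values ('cuda', 'cpu', 'xpu') are taken from the commit message.

```python
import os

# Values named in the commit message for DS_ACCELERATOR.
SUPPORTED_ACCELERATORS = ("cuda", "cpu", "xpu")


def select_accelerator_name(environ=None, detected="cuda"):
    """Return the accelerator name, letting DS_ACCELERATOR override detection.

    `detected` stands in for whatever automatic detection would pick
    (hypothetical parameter, used here to keep the sketch self-contained).
    """
    environ = os.environ if environ is None else environ
    override = environ.get("DS_ACCELERATOR")
    if override is None:
        return detected
    if override not in SUPPORTED_ACCELERATORS:
        raise ValueError(
            f"DS_ACCELERATOR must be one of {SUPPORTED_ACCELERATORS}, "
            f"got {override!r}")
    return override


if __name__ == "__main__":
    print(select_accelerator_name())
```

A run such as DS_ACCELERATOR=cpu python script.py would then report cpu regardless of what detection would have chosen.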

lazy init class_dict in CUDA_Accelerator to avoid cyclic initializati…

72dcfc6d

Merge branch 'master' into gma/cpu_support

a71a4620

put short path in the beginning in real_accelerator.py

1e0aeaab

delock commented on 2023-04-21

device_count return number of NUMA nodes

880c466b

Merge branch 'up-master' into gma/cpu_support

47da0791

fix typo

69ee9d28

install numactl in cpu workflow

cd8d810f

Merge branch 'up-master' into gma/cpu_support

9d1eec2d

tjruwase commented on 2023-04-25

jeffra commented on 2023-04-25

tjruwase commented on 2023-04-25

Merge branch 'master' into gma/cpu_support

0b0eadb6

Follow comments

a0ebaad7

Merge branch 'master' into gma/cpu_support

3c0ece38

Better implementation of device_count() and current_device()

248c47be

remove dependency_link for Intel Extension for DeepSpeed

02ce81f2

use check is_synchronized_device in timer only once

9babf032

remove env mapping WA in cpu_accelerator

59dca3f3

Merge branch 'master' into gma/cpu_support

0230c77f

Merge branch 'up-master' into gma/cpu_support

04d4a27a

fix duplicate definition

7aecab72

fix format error

5cc61fbf

delock commented on 2023-05-04

Merge branch 'master' into gma/cpu_support

1bd6458c

tjruwase commented on 2023-05-04

refine ccl backend selection

4027838e

move comments to the right place

e4e92e0a

tjruwase commented on 2023-05-05

Merge branch 'up-master' into gma/cpu_support

31e17853

remove prefer_deepspeed_comm, use CCLBackend by default

50faa03f

refactor fallback path

5d26c8e9

Fix execution failure in kernel injection path

5957f56e

do not refactor kernel injection fallback path in residual_add bec…

f7dd9402

guard residual_add fallback path with environ DS_KI_FALLBACK=True

816e3548
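The environment guard mentioned in the commit above could be sketched as follows. This is an assumption-laden illustration rather than the PR's code: the helper names ki_fallback_enabled and pick_residual_add are hypothetical, and the exact string comparison against "True" is a guess based on the commit message.

```python
import os


def ki_fallback_enabled(environ=None):
    """True only when DS_KI_FALLBACK is set to the literal string "True".

    Assumption: the commit message's DS_KI_FALLBACK=True is compared as a
    plain string; the real check may be more lenient.
    """
    environ = os.environ if environ is None else environ
    return environ.get("DS_KI_FALLBACK") == "True"


def pick_residual_add(native_impl, fallback_impl, environ=None):
    """Choose the fallback implementation only when the guard is enabled."""
    return fallback_impl if ki_fallback_enabled(environ) else native_impl


if __name__ == "__main__":
    def native(a, b):
        return a + b      # stand-in for the injected kernel

    def fallback(a, b):
        return a + b      # stand-in for the pure-torch path

    chosen = pick_residual_add(native, fallback)
    print(chosen.__name__)
```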

Merge branch 'master' into gma/cpu_support

803b17f7

Merge branch 'master' into gma/cpu_support

2b59496c

fix format error

764d027d

add test for allreduce on CPU workflow

869724c4

fix format error

f678b901

Merge branch 'master' into gma/cpu_support

828dce07

tjruwase approved these changes on 2023-05-10

Merge branch 'up-master' into gma/cpu_support

8c3444b2

Fall back to TorchBackend if CCLBackend kernels are not implemented

1b9cd906

Merge branch 'master' into gma/cpu_support

bf5657f8

Update Intel Extension for PyTorch installation link

c6c6eb2c

Don't specify version number of Intel Extension for PyTorch

e7fba930

install oneCCL for CCLBackend

b2c7cb7f

fix link path for CPU comm kernels

87493669

fix source oneCCL environment

782a624b

source oneCCL env before run UT

06f6815d

Merge branch 'master' into gma/cpu_support

24a5a526

Give more specific instruction when CCL_ROOT not defined

f63cbbda

Merge branch 'master' into gma/cpu_support

06b55a03

Merge branch 'master' into gma/cpu_support

1fc50be4

Merge branch 'master' into gma/cpu_support

5f496860

tjruwase merged 1f72082f into master 2 years ago

DeepSpeed
[CPU] Support Intel CPU inference
#3041

Merged
