PR #321 Generation server using HF accelerate and DS inference

fix timing (#31)

19cde92b

Update gpt2_tokenization.py

a0bccfee

Revert "Update gpt2_tokenization.py"

ac227a19

use pp engine even for pp=1 (#6) (#34)

2a7ee91c

Revert "use pp engine even for pp=1 (#6) (#34)"

e7bc518b

Revert "Revert "use pp engine even for pp=1 (#6) (#34)""

2a64b196

Create README.md

c3399a72

Faster preprocessing (#18)

3caf203a

add a section on how we use deepspeed with Meg

a7856ca4

fix the deepspeed example

03665872

add .bs to the version to help check we are on the right repo/branch

b3aa039a

fix attn_mask (#50)

e7ac5fd0

chore: update gitignore (#45)

358eac68

Group tensorboard metrics (#39)

b6a2c9e3

rm `(s)` that slipped through

e0c62368

Update requirements.txt (#46)

72f40805

Add LRU cache, add faster tokenization (#37)

84f8d510

Update README.md (#51)

5521f383

chore: add deepspeed as comment

c43e2075

Fix pretrain_gpt_single_node example script to have only one occurenc…

6e5f752e

better comment on TB writer (`is_last_rank`)

190565df

Add GLU variants (#47)

50cb9dac

[microsoft/Megatron-DeepSpeed sync] Commits including 2021-08-09 (#58)

febe21de

use HuggingFace Datasets as source to build Megatron data files (#48)

b11b2be4

Add test suite (#64)

3c6460d3

fix arg help (#65)

29f01503

add testing and contribute info

128013d5

fix header

60e82e3e

fix: doc_idx offset when merging indexed dataset files (#66)

ccab4056

shuffle index list with numpy, scatter list, use file for large lists…

3343c777

fix: exclusive scan computing pointers list (#68)

8f754d3c

- Recompute bin/idx using microsoft/Megatron-DeepSpeed (no changes)

3ab2b3dc

Add openwebtext1000.jsonl to .gitignore

34d10765

[testing] fixes for pt-1.10 (#71)

76429033

Expose GLU activations as arguments (#69)

07a2ba56

fix circular import (#72)

20131063

[codecarbon] integration (#15)

ed786b50

Check carbon directory is not None (#74)

6a48314b

[CI] start workflow (#75)

2f27e558

[CI] wip (#76)

2595bd99

distributed merge of per-rank Megatron data files (#55)

aa319b13

fix test; skip broken test (#79)

937135a5

Add step to download dataset before running the preprocess_data_dist …

6d656ec6

dynamically use as many 3d dimensions as possible (#83)

8cb2d18a

add missing dependencies (#88)

d7556478

[CI] setting up a CI with EC2 backend (#78)

41fdd464

[requirements] fix format (#94)

6be85cae

[WIP] [codecarbon] sorting out CC warnings + logger preamble (#80)

c7d8e947

Floating-point ops counting and reloading (#40)

30e0b8d6

added comment

b8b47973

check whether python3-config is available (#98)

5162fd24

Prefix lm (#52)

55e7332c

simplify the CI trigger (#102)

e28f84ce

Fix model tests (#103)

f822ef05

[tensor comparisons] support pt-1.8, add torch_assert_close (#106)

709f1afb

Checkpoint conversion tools (#14) (#109)

39c3d708

add direct meg-ds to hf format script (#110)

7dd3a6bd

add direct meg-ds to hf format script (part2) (#111)

846cc324

training with dummy data to verify sampling (#36)

2495bd84

update merge_preprocessed_data to use distributed merge (#82)

d168c1b9

make scripts executable

74b8166a

add shebang

1ac6a70e

ALiBi Implementation (#101)

87f05985

[tests] flush std streams (#120)

8eb00297

chore: update `.gitignore`

0ec02575

[Feature] Implement sample-ids-to-text extractor (#116)

a0e6b68a

[testing] ensure no lock file is dropped (#122)

202fd3ef

Save tokenizer in conversion script (#128)

c146dced

fix: only trigger ci on .py file changes (#131)

3586830c

Curriculum learning support (#132)

a319a6ca

[CL] fix default placement (#133)

97bdf317

Fix deepspeed prefix-lm (#107)

63539b11

[codecarbon] switch to master (#135)

5f3c08bb

run on pull_request branch (#141)

bbe4dea5

print number of params only on rank 0 (#140)

34140e76

Configure code style formatters (#130)

04c6da3e

[Feature] Porting bitsandbytes to meg-deepspeed (#144)

0f7a2bc7

backward compatibility for new chkpt keys (#147)

c1b09d4a

fused softmax layer bug fix sync (#151)

ce20a7d6

Fix curriculum learning support (#134)

959a876f

disable codecarbon as it's very unstable (#152)

3e761953

Fix glu activation (#148)

7813714c

[Logging] Improve logging mechanism (#154)

4fc9ab5f

Bump minimum version for torch (#156)

cbbfd7a2

[tests] fix requirements (#158)

85e3c1fb

don't save latest_checkpointed_iteration.txt w/ deepspeed (#159)

1821201f

[testing] fix bnb test skipping (#160)

6a9d73b0

remove useless log line (#161)

087a7e1e

Fix curriculum learning doc (#162)

a55c0071

[checkpoint] only one latest file (#164)

224d7c14

[CI] fix ci / update packages (#170)

10b4d42b

Update main.yml (#172)

0f725015

Fix prefix lm offsets (#167)

7364280d

Adding language specific validation sets for Multilingual model train…

54bb7a3e

Fixed merge oversight in tensorboard logs

ed812bdc

simplifying tests

afb3778b

Fixed TP > 1 issue with new validation scheme

2a967d58

Alternative fix to TP > 1 (#178)

04e28560

[CI] improvements (#185)

11a2a369

[PrefixLM] Figuring out why prefix lm is doing poorly on short contex…

5b34a6b9

[BNB] integrate `StableEmbeding` into `VocabParallelEmbedding` logic …

0635ea2c

Full seqlen eval for CL+PP (#187)

1f678bc1

Support skip iteration flag (#177)

0f425a23

add layernorm in Embedding (#191)

c73e784d

removed regular package for megatron model (#192)

d13cbeba

Add eval-only arg (#188)

b1246143

Delete unnecessary brackets (#197)

20e9afc5

[CI] fix which tests get run (#199)

28dd4e75

add missing space (#200)

7dd85a62

elastic launcher compatible init_process_group (#201)

767eccbf

[WIP] dealing with multi-process noise (#193)

d443ec71

param size printing revamp (#202)

d1713bed

[test] `--partition-activations` (#184)

f2a54020

chore: add tmp directory to `.gitignore` (#205)

a100a757

Reweighting strat for prefix lm (#190)

a3904851

Checking we use fused kernels to compute scaled masked softmax on pre…

cd06bafa

Revert "Checking we use fused kernels to compute scaled masked softma…

cafe8cc8

Fix consumed_valid_samples counting for several valid dataloaders

8e928e99

[TB] add throughput graphs (#210)

8532df61

replay layer_norm_cuda_kernel.cu fixes (#216)

30436f9c

improve build (#207)

94421bfa

[logging] synced print (#217)

96dbac8c

fix tflops calculation (#223)

3cbdb38b

tflops for CL (#224)

5ccf8b68

Revert "tflops for CL (#224)" (#225)

b7f8a628

Fix alibi (#222)

2b26dcaf

save args to txt file (#218)

90138b19

implement missing --no-load-optim support for deepspeed path (#231)

f36a0ffc

fix tests (#232)

a1b688ee

[ds report] less noise (#215)

9ad0d973

enable new_style for add_scalar function for faster data format (#237)

8c2e1da1

fix add_scalar for pt<1.9 (#240)

e4fc19c2

Fix throughput unit (#241)

f0a57f0d

[TB] log restarts (#234)

5812e4e3

Alibi Tensor Parallel Fix (#244)

d0a047df

implement kill switch (#245)

c3e4230f

--abort-on-unmet-fused-kernel-constraints (#247)

0cbd3990

[apex FusedAdam] crash workaround (#249)

24d72c6f

Replace approximate formula with exact one for throughput (#251)

14e1e3b9

Fix preprocess_data_many_cores to use dtype

ee49e636

Fix preprocess_data_many_cores to use dtype

77fcc4e2

Use padded vocab size in preprocessing scripts (#253)

a5b28a7d

Try to read the data path arguments directly from a file (#254)

2d10187a

[sync] bf16 (#250)

3b227b8a

make partition_method configurable (#256)

65e96a2a

add `pad-vocab-size-to` argument and tests (#255)

8f5a5179

deploy elastic error handler (#258)

4b3a4474

sync the whole Meg-LM fused_kernels sub-tree (#260)

e6598b72

allocate embed norm only on pp0 (#261)

7a67ce28

switch to MixedFusedLayerNorm (#262)

59f9a3f7

preprocessing from arrow file to load an HF dataset

decebdc2

Sorry, last change was meant to be a PR. This reverts commit d0fcf4170de…

eab76a6e

[kill switch] correct sys.exit (#266)

543992e3

disable samples-per-dataset, steps-per-dataset, tokens-per-dataset (#…

f2e1f03a

[kill switch] fix test (#268)

43aea861

[tensorboard] add rename and remove event tools (#269)

315e21f0

`torch.testing.assert_equal` didn't make it (#273)

d8ba7a20

add stop alarm instructions

c16a81e6

add start-fast doc (#278)

cf81e3dd

tweak the doc

9214b77b

Create CODEOWNERS

fcdb5273

Update CODEOWNERS

d2363766

Update CODEOWNERS

fa8e9b97

Update CODEOWNERS

20a02019

Fix mixed fused layer norm to mimic nn.LayerNorm for torch>1.11 (#281)

4e16e4a4

[valid] deadlock workaround (#282)

56333df5

Fix tflops glu computation (#283)

4cf7e648

Fix DS init (#285)

dd53f9d4

Mlm adaptation (#287)

c384a450

Fixed MLM dataset arguments (#290)

0f294062

Eval harness (#212)

7cf64694

Merge MLM too fast 2 (#294)

3d260477

MTF dataset and packing (#293)

5d051532

CI fixes (#302)

d59ed79d

sync layer norms (#272)

22a31f04

MTF train script (#295)

3a5b327f

Add support for weighted train (#299)

464b45f0

Combine Specs (#304)

70e221cd

Add bias a weight we need to sync as well (#307)

75a2c911

Fix causal attention mask (#306)

5302902a

Create README.md

cf56a8b2

not yet working script

d677a6df

hardcode the dtype depending on the model

fcd61be6

change the mp based on the world_size

3aa84d6b

remove hardcoded world_size

3b583439

add bigscience/bigscience-small-testing

b694a4f4

Merge branch 'bloom-inference' of https://github.com/bigscience-works…

f2ed3a13

fixes

0079ff74

add zero-inference script

3dfc0898

fixes

bf447807

fix

afe90272

working script

8b9fe595

renames

09544e4a

fixes

f2a5520c

fix for offline use

0394c070

add benchmark

0d8a99d8

add benchmark

c5d82a98

update

dd78ac3f

cleanup

f3548a27

update

3c0cc4e6

msecs

d7cbbe12

cleanup

c1bac358

improve

be61e59e

fix benchmark, add warmup

62b141fc

update

541590ec

fix; thanks Michael Wyatt

aad51bc5

clarify

fd10718d

Merge branch 'bloom-inference' of https://github.com/bigscience-works…

32dd5ca8

add bloom batch-inference script

1127770e

removed the names :-)

595d746c

fold the bs functionality from the other script

ac9b849c

fix

d7251534

restore do_sample

b1458589

dump generate args

bd8971fb

fix

4ceaa570

fix

5eee60cb

support any batchsize

70ed13d5

div by bs

256860df

mul by bs

5fdd563e

add cpu_offload; sync scripts

fb5c95c7

wip

fdb42c2a

improvements

d7e661e6

fixes

71b86751

fixes

781993b8

add accelerate script

6fa61299

fix

29b6b302

wip

cc69be66

wip

e447bab5

stats

c2175e47

add OnDevice and remove zero-inference (#316)

ce0c9758

wip

f12a3d0a

rework generate + benchmark

ac3d7cb3

figure out the memory map dynamically

fd28fc0e

bug fix

0e6562e8

fix ds-zero-inference wrt device

7768629e

bug fix

97e6b53e

update

092a1fde

update

118b4ab6

mayank31398 changed the title ~~Generation server [DO NOT MERGE, Work in Progress]~~ Generation server using HF accelerate and DS inference 3 years ago

mayank31398 changed the base branch from main to bloom-inference 3 years ago

mayank31398 changed the base branch from bloom-inference to main 3 years ago

add server scripts

49fe618f

mayank31398 changed the base branch from main to bloom-inference 3 years ago

fix bug

b83b39d0

new code

031c0ee7

working code

450eb1fc

fix bug

fc5d3839

update readme

39bab5c1

mayank31398 marked this pull request as ready for review 3 years ago

increase batch size for HF accelerate

303a6cfd

increase batch size

69d5cf29

support dynamic batch size with deepspeed

d816f593

drop num tokens

73a79d2b

drop return type

89dfe231

oom

03abce7e

mayank31398 closed this 3 years ago

mayank31398 deleted the generation-server branch 3 years ago

Megatron-DeepSpeed
Generation server using HF accelerate and DS inference
#321

Closed