Megatron-DeepSpeed
Generation server using HF accelerate and DS inference
#321
Closed

mayank31398 wants to merge 1165 commits into bigscience-workshop:bloom-inference from generation-server
stas00 fix timing (#31)
19cde92b
huu4ontocord Update gpt2_tokenization.py
a0bccfee
Revert "Update gpt2_tokenization.py"
ac227a19
stas00 use pp engine even for pp=1 (#6) (#34)
2a7ee91c
stas00 Revert "use pp engine even for pp=1 (#6) (#34)"
e7bc518b
stas00 Revert "Revert "use pp engine even for pp=1 (#6) (#34)""
2a64b196
stas00 Create README.md
c3399a72
thomasw21 Faster preprocessing (#18)
3caf203a
stas00 add a section on how we use deepspeed with Meg
a7856ca4
stas00 fix the deepspeed example
03665872
stas00 add .bs to the version to help check we are on the right repo/branch
b3aa039a
stas00 fix attn_mask (#50)
e7ac5fd0
jaketae chore: update gitignore (#45)
358eac68
VictorSanh Group tensorboard metrics (#39)
b6a2c9e3
VictorSanh rm `(s)` that slipped through
e0c62368
jaketae Update requirements.txt (#46)
72f40805
huu4ontocord Add LRU cache, add faster tokenization (#37)
84f8d510
lintangsutawika Update README.md (#51)
5521f383
jaketae chore: add deepspeed as comment
c43e2075
thomasw21 Fix pretrain_gpt_single_node example script to have only one occurenc…
6e5f752e
VictorSanh better comment on TB writer (`is_last_rank`)
190565df
jaketae Add GLU variants (#47)
50cb9dac
stas00 [microsoft/Megatron-DeepSpeed sync] Commits including 2021-08-09 (#58)
febe21de
adammoody use HuggingFace Datasets as source to build Megatron data files (#48)
b11b2be4
stas00 Add test suite (#64)
3c6460d3
stas00 fix arg help (#65)
29f01503
stas00 add testing and contribute info
128013d5
stas00 fix header
60e82e3e
adammoody fix: doc_idx offset when merging indexed dataset files (#66)
ccab4056
adammoody shuffle index list with numpy, scatter list, use file for large lists…
3343c777
adammoody fix: exclusive scan computing pointers list (#68)
8f754d3c
thomasw21 - Recompute bin/idx using microsoft/Megatron-DeepSpeed (Not changes)
3ab2b3dc
thomasw21 Add openwebtext1000.jsonl to .gitignore
34d10765
stas00 [testing] fixes for pt-1.10 (#71)
76429033
jaketae Expose GLU activations as arguments (#69)
07a2ba56
stas00 fix circular import (#72)
20131063
stas00 [codecarbon] integration (#15)
ed786b50
thomasw21 Check cardon directory is not None (#74)
6a48314b
stas00 [CI] start workflow (#75)
2f27e558
stas00 [CI] wip (#76)
2595bd99
adammoody distributed merge of per-rank Megatron data files (#55)
aa319b13
stas00 fix test; skip broken test (#79)
937135a5
adammoody Add step to download dataset before running the preprocess_data_dist …
6d656ec6
stas00 dynamically use as many 3d dimensions as possible (#83)
8cb2d18a
stas00 add missing dependencies (#88)
d7556478
stas00 [CI] setting up a CI with EC2 backend (#78)
41fdd464
stas00 [requirements] fix format (#94)
6be85cae
stas00 [WIP] [codecarbon] sorting out CC warnings + logger preamble (#80)
c7d8e947
TevenLeScao Floating-point ops counting and reloading (#40)
30e0b8d6
TevenLeScao added comment
b8b47973
stas00 check whether python3-config is available (#98)
5162fd24
thomasw21 Prefix lm (#52)
55e7332c
stas00 simplify the CI trigger (#102)
e28f84ce
thomasw21 Fix model tests (#103)
f822ef05
stas00 [tensor comparisons] support pt-1.8, add torch_assert_close (#106)
709f1afb
stas00 Checkpoint conversion tools (#14) (#109)
39c3d708
stas00 add direct meg-ds to hf format script (#110)
7dd3a6bd
stas00 add direct meg-ds to hf format script (part2) (#111)
846cc324
lintangsutawika training with dummy data to verify sampling (#36)
2495bd84
adammoody update merge_preprocessed_data to use distributed merge (#82)
d168c1b9
stas00 make scripts executable
74b8166a
stas00 add shebang
1ac6a70e
ofirpress ALiBi Implementation (#101)
87f05985
stas00 [tests] flush std streams (#120)
8eb00297
jaketae chore: update `.gitignore`
0ec02575
wade3han [Feature] Implement sample-ids-to-text extractor (#116)
a0e6b68a
stas00 [testing] ensure no lock file is dropped (#122)
202fd3ef
jaketae Save tokenizer in conversion script (#128)
c146dced
jaketae fix: only trigger ci on .py file changes (#131)
3586830c
conglongli Curriculum learning support (#132)
a319a6ca
stas00 [CL] fix default placement (#133)
97bdf317
thomasw21 Fix deepspeed prefix-lm (#107)
63539b11
stas00 [codecarbon] switch to master (#135)
5f3c08bb
stas00 run on pull_request branch (#141)
bbe4dea5
stas00 print number of params only on rank 0 (#140)
34140e76
jaketae Configure code style formatters (#130)
04c6da3e
wade3han [Feature] Porting bitsandbytes to meg-deepspeed (#144)
0f7a2bc7
stas00 backward compatibility for new chkpt keys (#147)
c1b09d4a
stas00 fused softmax layer bug fix sync (#151)
ce20a7d6
conglongli Fix curriculum learning support (#134)
959a876f
stas00 disable codecarbon as it's very unstable (#152)
3e761953
thomasw21 Fix glu activation (#148)
7813714c
thomasw21 [Logging] Improve logging mechanism (#154)
4fc9ab5f
thomasw21 Bump minimum version for torch (#156)
cbbfd7a2
stas00 [tests] fix requirements (#158)
85e3c1fb
stas00 don't save latest_checkpointed_iteration.txt w/ deepspeed (#159)
1821201f
stas00 [testing] fix bnb test skipping (#160)
6a9d73b0
stas00 remove useless log line (#161)
087a7e1e
conglongli Fix curriculum learning doc (#162)
a55c0071
stas00 [checkpoint] only one latest file (#164)
224d7c14
stas00 [CI] fix ci / update packages (#170)
10b4d42b
stas00 Update main.yml (#172)
0f725015
thomasw21 Fix prefix lm offsets (#167)
7364280d
hadyelsahar Adding language specific validation sets for Multilingual model train…
54bb7a3e
TevenLeScao Fixed merge oversight in tensorboard logs
ed812bdc
TevenLeScao simplifying tests
afb3778b
TevenLeScao Fixed TP > 1 issue with new validation scheme
2a967d58
thomasw21 Alternative fix to TP > 1 (#178)
04e28560
stas00 [CI] improvements (#185)
11a2a369
thomasw21 [PrefixLM] Figuring out why prefix lm is doing poorly on short contex…
5b34a6b9
stas00 [BNB] integrate `StableEmbeding` into `VocabParallelEmbedding` logic …
0635ea2c
conglongli Full seqlen eval for CL+PP (#187)
1f678bc1
jaketae Support skip iteration flag (#177)
0f425a23
stas00 add layernorm in Embedding (#191)
c73e784d
stas00 removed regular package for megatron model (#192)
d13cbeba
SaulLu Add eval-only arg (#188)
b1246143
SaulLu Delete unnecessary brackets (#197)
20e9afc5
stas00 [CI] fix which tests get run (#199)
28dd4e75
SaulLu add missing space (#200)
7dd85a62
stas00 elastic launcher compatible init_process_group (#201)
767eccbf
stas00 [WIP] dealing with multi-process noise (#193)
d443ec71
stas00 param size printing revamp (#202)
d1713bed
stas00 [test] `--partition-activations` (#184)
f2a54020
jaketae chore: add tmp directory to `.gitignore` (#205)
a100a757
thomasw21 Reweighting strat for prefix lm (#190)
a3904851
thomasw21 Checking we use fused kernels to compute scaled masked softmax on pre…
cd06bafa
thomasw21 Revert "Checking we use fused kernels to compute scaled masked softma…
cafe8cc8
TevenLeScao Fix consumed_valid_samples counting for several valid dataloaders
8e928e99
stas00 [TB] add throughput graphs (#210)
8532df61
stas00 replay layer_norm_cuda_kernel.cu fixes (#216)
30436f9c
stas00 improve build (#207)
94421bfa
stas00 [logging] synced print (#217)
96dbac8c
stas00 fix tflops calculation (#223)
3cbdb38b
stas00 tflops for CL (#224)
5ccf8b68
stas00 Revert "tflops for CL (#224)" (#225)
b7f8a628
thomasw21 Fix alibi (#222)
2b26dcaf
bhavitvyamalik save args to txt file (#218)
90138b19
stas00 implement missing --no-load-optim support for deepspeed path (#231)
f36a0ffc
stas00 fix tests (#232)
a1b688ee
stas00 [ds report] less noise (#215)
9ad0d973
abodacs enable new_style for add_scalar function for faster data format (#237)
8c2e1da1
stas00 fix add_scalar for pt<1.9 (#240)
e4fc19c2
janEbert Fix throughput unit (#241)
f0a57f0d
stas00 [TB] log restarts (#234)
5812e4e3
DanielHesslow Alibi Tensor Parallel Fix (#244)
d0a047df
stas00 implement kill switch (#245)
c3e4230f
stas00 --abort-on-unmet-fused-kernel-constraints (#247)
0cbd3990
stas00 [apex FusedAdam] crash workaround (#249)
24d72c6f
deepakn94 Replace approximate formula with exact one for throughput (#251)
14e1e3b9
thomasw21 Fix preprocess_data_many_cores to use dtype
ee49e636
thomasw21 Fix preprocess_data_many_cores to use dtype
77fcc4e2
thomasw21 Use padded vocab size in preprocessing scripts (#253)
a5b28a7d
thomasw21 Try to read the data path arguments directly from a file (#254)
2d10187a
stas00 [sync] bf16 (#250)
3b227b8a
stas00 make partition_method configurable (#256)
65e96a2a
SaulLu add `pad-vocab-size-to` argument and tests (#255)
8f5a5179
stas00 deploy elastic error handler (#258)
4b3a4474
stas00 sync the whole Meg-LM fused_kernels sub-tree (#260)
e6598b72
stas00 allocate embed norm only on pp0 (#261)
7a67ce28
stas00 switch to MixedFusedLayerNorm (#262)
59f9a3f7
TevenLeScao preprocessing from arrow file to load an HF dataset
decebdc2
TevenLeScao Sorry, last change was meant to a PR. This reverts commit d0fcf4170de…
eab76a6e
stas00 [kill switch] correct sys.exit (#266)
543992e3
stas00 disable samples-per-dataset, steps-per-dataset, tokens-per-dataset (#…
f2e1f03a
stas00 [kill switch] fix test (#268)
43aea861
stas00 [tensorboard] add rename and remove event tools (#269)
315e21f0
stas00 `torch.testing.assert_equal` didn't make it (#273)
d8ba7a20
stas00 add stop alarm instructions
c16a81e6
stas00 add start-fast doc (#278)
cf81e3dd
stas00 tweak the doc
9214b77b
TevenLeScao Create CODEOWNERS
fcdb5273
TevenLeScao Update CODEOWNERS
d2363766
TevenLeScao Update CODEOWNERS
fa8e9b97
TevenLeScao Update CODEOWNERS
20a02019
thomasw21 Fix mixed fused layer norm to mimick nn.LayerNorm for torch>1.11 (#281)
4e16e4a4
stas00 [valid] deadlock workaround (#282)
56333df5
Muennighoff Fix tflops glu computation (#283)
4cf7e648
Quentin-Anthony Fix DS init (#285)
dd53f9d4
Mlm adaptation (#287)
c384a450
thomasw21 Fixed MLM dataset arguments(#290)
0f294062
DanielHesslow Eval harness (#212)
7cf64694
thomasw21 Merge MLM too fast 2 (#294)
3d260477
thomasw21 MTF dataset and packing (#293)
5d051532
stas00 CI fixes (#302)
d59ed79d
stas00 sync layer norms (#272)
22a31f04
thomasw21 MTF train script (#295)
3a5b327f
thomasw21 Add support for weighted train (#299)
464b45f0
Muennighoff Combine Specs (#304)
70e221cd
thomasw21 Add bias a weight we need to sync as well (#307)
75a2c911
thomasw21 Fix causal attention mask (#306)
5302902a
stas00 Create README.md
cf56a8b2
stas00 not yet working script
d677a6df
stas00 hardcode the dtype depending on the model
fcd61be6
change the mp based on the world_size
3aa84d6b
stas00 remove hardcoded world_size
3b583439
stas00 add bigscience/bigscience-small-testing
b694a4f4
Merge branch 'bloom-inference' of https://github.com/bigscience-works…
f2ed3a13
stas00 fixes
0079ff74
stas00 add zero-inference script
3dfc0898
stas00 fixes
bf447807
stas00 fix
afe90272
stas00 working script
8b9fe595
stas00 renames
09544e4a
stas00 fixes
f2a5520c
stas00 fix for offline use
0394c070
stas00 add benchmark
0d8a99d8
stas00 add benchmark
c5d82a98
stas00 update
dd78ac3f
stas00 cleanup
f3548a27
stas00 update
3c0cc4e6
stas00 msecs
d7cbbe12
stas00 cleanup
c1bac358
stas00 improve
be61e59e
stas00 fix benchmark, add warmup
62b141fc
stas00 update
541590ec
stas00 fix; thanks Michael Wyatt
aad51bc5
stas00 clarify
fd10718d
Merge branch 'bloom-inference' of https://github.com/bigscience-works…
32dd5ca8
add bloom batch-inference script
1127770e
removed the names :-)
595d746c
stas00 fold the bs functionality from the other script
ac9b849c
stas00 fix
d7251534
stas00 restore do_sample
b1458589
stas00 dump generate args
bd8971fb
stas00 fix
4ceaa570
stas00 fix
5eee60cb
stas00 support any batchsize
70ed13d5
stas00 div by bs
256860df
stas00 mul by bs
5fdd563e
stas00 add cpu_offload; sync scripts
fb5c95c7
stas00 wip
fdb42c2a
stas00 improvements
d7e661e6
stas00 fixes
71b86751
stas00 fixes
781993b8
stas00 add accelerate script
6fa61299
stas00 fix
29b6b302
stas00 wip
cc69be66
stas00 wip
e447bab5
stas00 stats
c2175e47
jeffra add OnDevice and remove zero-inference (#316)
ce0c9758
stas00 wip
f12a3d0a
stas00 rework generate + benchmark
ac3d7cb3
stas00 figure out the memory map dynamically
fd28fc0e
stas00 bug fix
0e6562e8
stas00 fix ds-zero-inference wrt device
7768629e
stas00 bug fix
97e6b53e
stas00 update
092a1fde
stas00 update
118b4ab6
mayank31398
mayank31398 changed the title from "Generation server [DO NOT MERGE, Work in Progress]" to "Generation server using HF accelerate and DS inference" 3 years ago
pai4451
mayank31398 changed the base branch from main to bloom-inference 3 years ago
mayank31398
mayank31398 changed the base branch from bloom-inference to main 3 years ago
add server scripts
49fe618f
mayank31398 changed the base branch from main to bloom-inference 3 years ago
mayank31398
mayank31398
fix bug
b83b39d0
pai4451
new code
031c0ee7
mayank31398
mayank31398
pai4451
working code
450eb1fc
fix bug
fc5d3839
update readme
39bab5c1
mayank31398
mayank31398 marked this pull request as ready for review 3 years ago
mayank31398
pai4451
mayank31398
increase batch size for HF accelerate
303a6cfd
increase batch size
69d5cf29
support dynamic batch size with deepspeed
d816f593
drop num tokens
73a79d2b
drop return type
89dfe231
oom
03abce7e
mayank31398
mayank31398 closed this 3 years ago
mayank31398 deleted the generation-server branch 3 years ago