Megatron-DeepSpeed
Generation server using HF accelerate and DS inference
#321
Closed
mayank31398 wants to merge 1165 commits into bigscience-workshop:bloom-inference from generation-server
fix timing (#31)
19cde92b
Update gpt2_tokenization.py
a0bccfee
Revert "Update gpt2_tokenization.py"
ac227a19
use pp engine even for pp=1 (#6) (#34)
2a7ee91c
Revert "use pp engine even for pp=1 (#6) (#34)"
e7bc518b
Revert "Revert "use pp engine even for pp=1 (#6) (#34)""
2a64b196
Create README.md
c3399a72
Faster preprocessing (#18)
3caf203a
add a section on how we use deepspeed with Meg
a7856ca4
fix the deepspeed example
03665872
add .bs to the version to help check we are on the right repo/branch
b3aa039a
fix attn_mask (#50)
e7ac5fd0
chore: update gitignore (#45)
358eac68
Group tensorboard metrics (#39)
b6a2c9e3
rm `(s)` that slipped through
e0c62368
Update requirements.txt (#46)
72f40805
Add LRU cache, add faster tokenization (#37)
84f8d510
Update README.md (#51)
5521f383
chore: add deepspeed as comment
c43e2075
Fix pretrain_gpt_single_node example script to have only one occurenc…
6e5f752e
better comment on TB writer (`is_last_rank`)
190565df
Add GLU variants (#47)
50cb9dac
[microsoft/Megatron-DeepSpeed sync] Commits including 2021-08-09 (#58)
febe21de
use HuggingFace Datasets as source to build Megatron data files (#48)
b11b2be4
Add test suite (#64)
3c6460d3
fix arg help (#65)
29f01503
add testing and contribute info
128013d5
fix header
60e82e3e
fix: doc_idx offset when merging indexed dataset files (#66)
ccab4056
shuffle index list with numpy, scatter list, use file for large lists…
3343c777
fix: exclusive scan computing pointers list (#68)
8f754d3c
- Recompute bin/idx using microsoft/Megatron-DeepSpeed (Not changes)
3ab2b3dc
Add openwebtext1000.jsonl to .gitignore
34d10765
[testing] fixes for pt-1.10 (#71)
76429033
Expose GLU activations as arguments (#69)
07a2ba56
fix circular import (#72)
20131063
[codecarbon] integration (#15)
ed786b50
Check cardon directory is not None (#74)
6a48314b
[CI] start workflow (#75)
2f27e558
[CI] wip (#76)
2595bd99
distributed merge of per-rank Megatron data files (#55)
aa319b13
fix test; skip broken test (#79)
937135a5
Add step to download dataset before running the preprocess_data_dist …
6d656ec6
dynamically use as many 3d dimensions as possible (#83)
8cb2d18a
add missing dependencies (#88)
d7556478
[CI] setting up a CI with EC2 backend (#78)
41fdd464
[requirements] fix format (#94)
6be85cae
[WIP] [codecarbon] sorting out CC warnings + logger preamble (#80)
c7d8e947
Floating-point ops counting and reloading (#40)
30e0b8d6
added comment
b8b47973
check whether python3-config is available (#98)
5162fd24
Prefix lm (#52)
55e7332c
simplify the CI trigger (#102)
e28f84ce
Fix model tests (#103)
f822ef05
[tensor comparisons] support pt-1.8, add torch_assert_close (#106)
709f1afb
Checkpoint conversion tools (#14) (#109)
39c3d708
add direct meg-ds to hf format script (#110)
7dd3a6bd
add direct meg-ds to hf format script (part2) (#111)
846cc324
training with dummy data to verify sampling (#36)
2495bd84
update merge_preprocessed_data to use distributed merge (#82)
d168c1b9
make scripts executable
74b8166a
add shebang
1ac6a70e
ALiBi Implementation (#101)
87f05985
[tests] flush std streams (#120)
8eb00297
chore: update `.gitignore`
0ec02575
[Feature] Implement sample-ids-to-text extractor (#116)
a0e6b68a
[testing] ensure no lock file is dropped (#122)
202fd3ef
Save tokenizer in conversion script (#128)
c146dced
fix: only trigger ci on .py file changes (#131)
3586830c
Curriculum learning support (#132)
a319a6ca
[CL] fix default placement (#133)
97bdf317
Fix deepspeed prefix-lm (#107)
63539b11
[codecarbon] switch to master (#135)
5f3c08bb
run on pull_request branch (#141)
bbe4dea5
print number of params only on rank 0 (#140)
34140e76
Configure code style formatters (#130)
04c6da3e
[Feature] Porting bitsandbytes to meg-deepspeed (#144)
0f7a2bc7
backward compatibility for new chkpt keys (#147)
c1b09d4a
fused softmax layer bug fix sync (#151)
ce20a7d6
Fix curriculum learning support (#134)
959a876f
disable codecarbon as it's very unstable (#152)
3e761953
Fix glu activation (#148)
7813714c
[Logging] Improve logging mechanism (#154)
4fc9ab5f
Bump minimum version for torch (#156)
cbbfd7a2
[tests] fix requirements (#158)
85e3c1fb
don't save latest_checkpointed_iteration.txt w/ deepspeed (#159)
1821201f
[testing] fix bnb test skipping (#160)
6a9d73b0
remove useless log line (#161)
087a7e1e
Fix curriculum learning doc (#162)
a55c0071
[checkpoint] only one latest file (#164)
224d7c14
[CI] fix ci / update packages (#170)
10b4d42b
Update main.yml (#172)
0f725015
Fix prefix lm offsets (#167)
7364280d
Adding language specific validation sets for Multilingual model train…
54bb7a3e
Fixed merge oversight in tensorboard logs
ed812bdc
simplifying tests
afb3778b
Fixed TP > 1 issue with new validation scheme
2a967d58
Alternative fix to TP > 1 (#178)
04e28560
[CI] improvements (#185)
11a2a369
[PrefixLM] Figuring out why prefix lm is doing poorly on short contex…
5b34a6b9
[BNB] integrate `StableEmbeding` into `VocabParallelEmbedding` logic …
0635ea2c
Full seqlen eval for CL+PP (#187)
1f678bc1
Support skip iteration flag (#177)
0f425a23
add layernorm in Embedding (#191)
c73e784d
removed regular package for megatron model (#192)
d13cbeba
Add eval-only arg (#188)
b1246143
Delete unnecessary brackets (#197)
20e9afc5
[CI] fix which tests get run (#199)
28dd4e75
add missing space (#200)
7dd85a62
elastic launcher compatible init_process_group (#201)
767eccbf
[WIP] dealing with multi-process noise (#193)
d443ec71
param size printing revamp (#202)
d1713bed
[test] `--partition-activations` (#184)
f2a54020
chore: add tmp directory to `.gitignore` (#205)
a100a757
Reweighting strat for prefix lm (#190)
a3904851
Checking we use fused kernels to compute scaled masked softmax on pre…
cd06bafa
Revert "Checking we use fused kernels to compute scaled masked softma…
cafe8cc8
Fix consumed_valid_samples counting for several valid dataloaders
8e928e99
[TB] add throughput graphs (#210)
8532df61
replay layer_norm_cuda_kernel.cu fixes (#216)
30436f9c
improve build (#207)
94421bfa
[logging] synced print (#217)
96dbac8c
fix tflops calculation (#223)
3cbdb38b
tflops for CL (#224)
5ccf8b68
Revert "tflops for CL (#224)" (#225)
b7f8a628
Fix alibi (#222)
2b26dcaf
save args to txt file (#218)
90138b19
implement missing --no-load-optim support for deepspeed path (#231)
f36a0ffc
fix tests (#232)
a1b688ee
[ds report] less noise (#215)
9ad0d973
enable new_style for add_scalar function for faster data format (#237)
8c2e1da1
fix add_scalar for pt<1.9 (#240)
e4fc19c2
Fix throughput unit (#241)
f0a57f0d
[TB] log restarts (#234)
5812e4e3
Alibi Tensor Parallel Fix (#244)
d0a047df
implement kill switch (#245)
c3e4230f
--abort-on-unmet-fused-kernel-constraints (#247)
0cbd3990
[apex FusedAdam] crash workaround (#249)
24d72c6f
Replace approximate formula with exact one for throughput (#251)
14e1e3b9
Fix preprocess_data_many_cores to use dtype
ee49e636
Fix preprocess_data_many_cores to use dtype
77fcc4e2
Use padded vocab size in preprocessing scripts (#253)
a5b28a7d
Try to read the data path arguments directly from a file (#254)
2d10187a
[sync] bf16 (#250)
3b227b8a
make partition_method configurable (#256)
65e96a2a
add `pad-vocab-size-to` argument and tests (#255)
8f5a5179
deploy elastic error handler (#258)
4b3a4474
sync the whole Meg-LM fused_kernels sub-tree (#260)
e6598b72
allocate embed norm only on pp0 (#261)
7a67ce28
switch to MixedFusedLayerNorm (#262)
59f9a3f7
preprocessing from arrow file to load an HF dataset
decebdc2
Sorry, last change was meant to a PR. This reverts commit d0fcf4170de…
eab76a6e
[kill switch] correct sys.exit (#266)
543992e3
disable samples-per-dataset, steps-per-dataset, tokens-per-dataset (#…
f2e1f03a
[kill switch] fix test (#268)
43aea861
[tensorboard] add rename and remove event tools (#269)
315e21f0
`torch.testing.assert_equal` didn't make it (#273)
d8ba7a20
add stop alarm instructions
c16a81e6
add start-fast doc (#278)
cf81e3dd
tweak the doc
9214b77b
Create CODEOWNERS
fcdb5273
Update CODEOWNERS
d2363766
Update CODEOWNERS
fa8e9b97
Update CODEOWNERS
20a02019
Fix mixed fused layer norm to mimick nn.LayerNorm for torch>1.11 (#281)
4e16e4a4
[valid] deadlock workaround (#282)
56333df5
Fix tflops glu computation (#283)
4cf7e648
Fix DS init (#285)
dd53f9d4
Mlm adaptation (#287)
c384a450
Fixed MLM dataset arguments(#290)
0f294062
Eval harness (#212)
7cf64694
Merge MLM too fast 2 (#294)
3d260477
MTF dataset and packing (#293)
5d051532
CI fixes (#302)
d59ed79d
sync layer norms (#272)
22a31f04
MTF train script (#295)
3a5b327f
Add support for weighted train (#299)
464b45f0
Combine Specs (#304)
70e221cd
Add bias a weight we need to sync as well (#307)
75a2c911
Fix causal attention mask (#306)
5302902a
Create README.md
cf56a8b2
not yet working script
d677a6df
hardcode the dtype depending on the model
fcd61be6
change the mp based on the world_size
3aa84d6b
remove hardcoded world_size
3b583439
add bigscience/bigscience-small-testing
b694a4f4
Merge branch 'bloom-inference' of https://github.com/bigscience-works…
f2ed3a13
fixes
0079ff74
add zero-inference script
3dfc0898
fixes
bf447807
fix
afe90272
working script
8b9fe595
renames
09544e4a
fixes
f2a5520c
fix for offline use
0394c070
add benchmark
0d8a99d8
add benchmark
c5d82a98
update
dd78ac3f
cleanup
f3548a27
update
3c0cc4e6
msecs
d7cbbe12
cleanup
c1bac358
improve
be61e59e
fix benchmark, add warmup
62b141fc
update
541590ec
fix; thanks Michael Wyatt
aad51bc5
clarify
fd10718d
Merge branch 'bloom-inference' of https://github.com/bigscience-works…
32dd5ca8
add bloom batch-inference script
1127770e
removed the names :-)
595d746c
fold the bs functionality from the other script
ac9b849c
fix
d7251534
restore do_sample
b1458589
dump generate args
bd8971fb
fix
4ceaa570
fix
5eee60cb
support any batchsize
70ed13d5
div by bs
256860df
mul by bs
5fdd563e
add cpu_offload; sync scripts
fb5c95c7
wip
fdb42c2a
improvements
d7e661e6
fixes
71b86751
fixes
781993b8
add accelerate script
6fa61299
fix
29b6b302
wip
cc69be66
wip
e447bab5
stats
c2175e47
add OnDevice and remove zero-inference (#316)
ce0c9758
wip
f12a3d0a
rework generate + benchmark
ac3d7cb3
figure out the memory map dynamically
fd28fc0e
bug fix
0e6562e8
fix ds-zero-inference wrt device
7768629e
bug fix
97e6b53e
update
092a1fde
update
118b4ab6
mayank31398 changed the title from "Generation server [DO NOT MERGE, Work in Progress]" to "Generation server using HF accelerate and DS inference" 3 years ago
mayank31398 changed the base branch from main to bloom-inference 3 years ago
mayank31398 changed the base branch from bloom-inference to main 3 years ago
add server scripts
49fe618f
mayank31398 changed the base branch from main to bloom-inference 3 years ago
fix bug
b83b39d0
new code
031c0ee7
working code
450eb1fc
fix bug
fc5d3839
update readme
39bab5c1
mayank31398 marked this pull request as ready for review 3 years ago
increase batch size for HF accelerate
303a6cfd
increase batch size
69d5cf29
support dynamic batch size with deepspeed
d816f593
drop num tokens
73a79d2b
drop return type
89dfe231
oom
03abce7e
mayank31398 closed this 3 years ago
mayank31398 deleted the generation-server branch 3 years ago
Reviewers: No reviews
Assignees: No one assigned
Labels: None yet
Milestone: No milestone