bigscience-workshop/Megatron-DeepSpeed
Pull Requests
Open
Checking we use fused kernels to compute scaled masked softmax on prefix lm
#213 opened 2021-11-29 13:24 by thomasw21
Tweaks for lm-eval-harness
#208 opened 2021-11-26 04:47 by zphang
Compute model param count once
#204 opened 2021-11-24 06:27 by jaketae
[bnb] resume with more replicas test
#198 opened 2021-11-19 20:24 by stas00
Add skip iterations to `sample_idxs_to_text.py`
#194 opened 2021-11-19 06:02 by jaketae
[wip] debug with new data
#165 opened 2021-10-27 05:03 by stas00
[debug] ModelInspector
#155 opened 2021-10-24 23:45 by jaketae
[chkpt conversion] handle the case where tp=0, should be 1
#146 opened 2021-10-20 15:41 by stas00
adding scalenorm, attention_init_method and relu^2
#139 opened 2021-10-17 01:56 by huu4ontocord
Test: Add checkpoint conversion test code
test
#121 opened 2021-09-30 01:23 by jaketae
Add valid data
#113 opened 2021-09-22 01:07 by sbmaruf
[WIP] [fp32 checkpoint] very early experiments with extracting fp32 params
#112 opened 2021-09-21 21:35 by stas00
[requirements] check we test against the correct deepspeed branch
#108 opened 2021-09-18 00:18 by stas00
WIP: distributed terashuf
#92 opened 2021-09-02 20:35 by adammoody
wip [CI] dealing with concurrency
#89 opened 2021-09-01 17:19 by stas00
extend preprocess_data_dist to handle jsonl files
#60 opened 2021-08-11 19:45 by adammoody
C4-mC4 pre processing
#9 opened 2021-07-23 08:31 by sbmaruf