bigscience-workshop/Megatron-DeepSpeed
Pull Requests
Open
Closed
Startup: add argument-consistency checks & summary table (Fixes #124)
#409 opened 2025-06-20 12:51 by
MagellaX
fix(training): correct rank-zero log messages, Print total model size once at startup (rank-0) – Fixes #123
#408 opened 2025-06-20 12:11 by
MagellaX
Bump black from 21.4b0 to 24.3.0
dependencies
#402 opened 2024-03-20 16:10 by
dependabot[bot]
Add xPos embeddings
#370 opened 2023-03-07 09:50 by
janEbert
Fix various small problems
#367 opened 2023-02-28 16:38 by
janEbert
Bloom model training with AML
#365 opened 2023-02-21 19:47 by
savitamittal1
Add UL2 data sampling and pretraining
#358 opened 2022-12-13 14:24 by
janEbert
Add FlashAttention
#357 opened 2022-12-12 19:03 by
NouamaneTazi
Distill BLOOM - tentative 2
#354 opened 2022-10-11 09:27 by
younesbelkada
Enable rocm-support
#353 opened 2022-10-07 08:03 by
luukkonenr
Encoding checkpoint reshaping guide
#349 opened 2022-09-20 03:33 by
tjruwase
Add multiple evaluation compat
#336 opened 2022-08-30 17:25 by
Muennighoff
[checkpoints] replace bf16 with fp32 checkpoint weights
#327 opened 2022-08-10 03:49 by
stas00
Prefix LM Eval
#313 opened 2022-07-16 14:12 by
Muennighoff
Add Bitfit
#311 opened 2022-07-10 18:00 by
Muennighoff
Tool for CKPT averaging
#310 opened 2022-07-10 17:39 by
Muennighoff
Enable loading ckpt for t0 finetuning
#309 opened 2022-07-10 11:15 by
Muennighoff
[WIP] Hack my way to get OPT running
#301 opened 2022-07-04 19:55 by
thomasw21
[MLM] Train script for non causal decoder
#300 opened 2022-07-04 12:25 by
thomasw21
a branch combining layer-norm-auto-sync and ds_ckpt_reshape
#292 opened 2022-06-29 19:16 by
stas00
BigScience Eval Harness
#291 opened 2022-06-29 11:09 by
Muennighoff
No-ZeRO reshaping
#289 opened 2022-06-23 12:02 by
Muennighoff
WIP: Shared t5 code
#286 opened 2022-06-21 15:44 by
thomasw21
[WIP] add debug utils
#275 opened 2022-03-28 20:11 by
stas00
Sync 4 layer norms - bf16, fp32, optimizer states on restart
#274 opened 2022-03-28 19:11 by
tjruwase
Sync layer norm
#271 opened 2022-03-24 22:38 by
thomasw21
Test different layer norm
#270 opened 2022-03-24 13:26 by
thomasw21
Preprocessing from arrow file to load an HF dataset
#264 opened 2022-03-11 05:49 by
TevenLeScao
launch debug code
#263 opened 2022-03-11 04:12 by
stas00
Add `_validate_args`
#233 opened 2022-01-18 19:34 by
bhavitvyamalik