DeepSpeed
Elastic training support
#602
Merged
Commits (34)
Starting to add config modifications. Currently in incomplete state
jeffra
committed
5 years ago
Adding the core elasticity compatible gpu count generation logic
jeffra
committed
5 years ago
Reverting some of the unfinished modifications to get the file working as standalone
jeffra
committed
5 years ago
formatting and fix build error
jeffra
committed
5 years ago
add np req and move elasticity
jeffra
committed
5 years ago
update github actions to trigger on all branches
jeffra
committed
5 years ago
fix syntax error
jeffra
committed
5 years ago
exclude docs
jeffra
committed
5 years ago
formatting
jeffra
committed
5 years ago
config restructure, versioning, etc
jeffra
committed
5 years ago
config updates, sanity checks, etc.
jeffra
committed
5 years ago
fix version issue
jeffra
committed
5 years ago
choose best micro batch size for given world size
jeffra
committed
5 years ago
bug fixes
jeffra
committed
5 years ago
add unit test
jeffra
committed
5 years ago
add several unit tests and clean-up code
jeffra
committed
5 years ago
fix install issue when installing on non-gpu machines
jeffra
committed
5 years ago
Merge branch 'master' into jeffra/elastic
jeffra
committed
5 years ago
Merge branch 'master' into jeffra/elastic
jeffra
committed
5 years ago
Merge branch 'master' into jeffra/elastic
jeffra
committed
5 years ago
Merge branch 'master' into jeffra/elastic
jeffra
committed
5 years ago
add ds_elastic cli
jeffra
committed
5 years ago
clean-up
jeffra
committed
5 years ago
formatting
jeffra
committed
5 years ago
docstring
jeffra
committed
5 years ago
fix mbsize division issue
jeffra
committed
5 years ago
formatting
jeffra
committed
5 years ago
checkpoint load latest only if it exists
jeffra
committed
5 years ago
add get_batch_info to engine, assert non-elastic bsz config, fix test
jeffra
committed
5 years ago
fix tests
jeffra
committed
5 years ago
validate elastic config wrt scheduler config, add repr
jeffra
committed
5 years ago
add unit test and fixes
jeffra
committed
5 years ago
require max-batch and micro-batches for elastic training
jeffra
committed
5 years ago
fix test error
jeffra
committed
5 years ago
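
Several commits above mention generating elasticity-compatible GPU counts and choosing a micro batch size for a given world size. The commit messages do not show how the PR does this, so the following is only a minimal sketch of the general idea, not the code from this PR. It assumes a global batch size must factor as micro_batch * num_gpus * grad_accum_steps, and that the largest usable micro batch is preferred. The function names `compatible_gpu_counts` and `best_micro_batch` are hypothetical and are not DeepSpeed APIs.

```python
from typing import List, Optional


def compatible_gpu_counts(batch_size: int,
                          micro_batches: List[int],
                          min_gpus: int = 1,
                          max_gpus: int = 10_000) -> List[int]:
    """Hypothetical helper: GPU counts n such that, for some micro batch mb,
    batch_size == mb * n * k with an integer k >= 1 (k plays the role of
    gradient accumulation steps)."""
    counts = set()
    for mb in micro_batches:
        if batch_size % mb != 0:
            continue  # this micro batch cannot tile the global batch
        slots = batch_size // mb  # equals n * k
        # Every divisor of `slots` within the allowed range is a valid GPU count.
        for n in range(max(1, min_gpus), min(slots, max_gpus) + 1):
            if slots % n == 0:
                counts.add(n)
    return sorted(counts)


def best_micro_batch(batch_size: int,
                     micro_batches: List[int],
                     world_size: int) -> Optional[int]:
    """Hypothetical helper: largest micro batch that splits batch_size evenly
    across world_size GPUs, or None if none fits. Preferring the largest
    value is an assumption of this sketch."""
    for mb in sorted(micro_batches, reverse=True):
        if batch_size % (mb * world_size) == 0:
            return mb
    return None


if __name__ == "__main__":
    print(compatible_gpu_counts(24, [2, 3]))
    print(best_micro_batch(24, [2, 3], world_size=6))
```

With a global batch of 24 and candidate micro batches 2 and 3, a job can run on any of the listed GPU counts without changing the global batch size; only the micro batch and accumulation steps change as nodes join or leave.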