DeepSpeed
DeepCompile for enhanced compiler integration
#7154
Merged
tohtana merged 402 commits into master from tohtana/deepcompile
free output nodes early
bdec410a
remove counter for release
498ce9c3
remove init of reduce_counter in start_backward
96ddd57a
remove reset counter for wait
9cf9f885
remove pre/post process for profiling
b5fd3871
set tensor size to zero when no profiled value is available
0687b5ee
choose persistent params
215f28b6
display comm performance
84c3c323
selectively make params persistent
e476c8df
Merge branch 'tohtana/no_z3_hook_selective_persistent' of https://git…
0bbc0aaf
fix profiling time
925952ef
Merge branch 'tohtana/no_z3_hook' into tohtana/no_z3_hook_selective_p…
ee683246
add comm profiler
3843e115
add prefetch api
54237ed4
fix function names for consistency
71335c7b
Merge branch 'tohtana/no_z3_hook_selective_persistent' of github.com:…
f7fa49a4
add prefetch pass
fedb6d62
fix prefetch
3fe91f5c
fix input to mem profiler
5aadf96b
fix prefetch conditions
a5832228
split prefetch
e7563a3a
adjust size in comm profiler
22bf332d
fix memory estimation
c8d72e65
fix fusing
279ed29c
fix event order for prefetching
044cdb0f
use high priority stream for allgather
2a1e06ca
rename var
707d4dfe
run reduce with different stream
c0ee4b25
fix allgather
9e43ecfc
run copy on a dedicated stream
d6c6dde9
add option to set double buffer
3ef627f6
fix sync of copy for reduce
1b4ca632
keep ref to src of copy
7d70d6f2
fill missing mem value
b73c284c
fix data size for comm profiling
47e3d835
refactor structure for compile passes
6357208e
refactor packages
aef0a077
refactor fx related apis
93e02892
move functions in backend
1c9fbe27
refactor backend
7f4bcaba
remove deprecated files
cfcc2597
make copy stream high priority
d85f3b16
exclude persistent params from prefetch target
0f1ab894
fix prefetching
a4f62441
fix gathered params
31d915df
fix persistent param
d1edca3f
keep persistent param when clear_all_gathered_params is called
a9d5d161
update persistent parameters with z3 optimizer
98fce1d1
simplify args
dc7779bc
fix memory optimization
131cfe09
increase mem margin
e3ebec9d
allreduce the flag for cache
71637a3d
fix data size range of comm profiler
199483a2
fix for grad acc
67706727
fix args for opt pass
b55facee
fix guard for opt passes
55cf1935
add prediv mode
83efa8fe
add sync for double buffer
f16331af
fix lifetime of recv buffer
949a02af
add option for double buffering
82eada3c
call reset of nz3
0d5069ac
reload with out-of-place to
8e05828c
fix memory estimation
65161a8b
explicitly free mem profiler
574bd723
sync profile values
ecc3daca
free mem in backward
84c27e0e
add activation offloading
6738f5bb
fix freeing activation
e779af5e
handle symbolic shape for act offloading
8ac262d9
add license
8426e0a6
fix sync for offload
424e1e68
fix release of offload buffer
e74606c3
fix freeing activation for offloading
6dac191c
offload activation when starting bw compiler
0d3ac840
refactor offloading in bw compiler
0734911c
hook compiled function class to take bwd inputs
722d78f3
enable free activation
8d5e23c1
enable free activation in profiling
0ccda769
patch compiled function again for second pass
e6058e01
remove adjustment in mem profiler
08e3b3d8
remove memory adjustment
f873d561
add pass for offloading adam states
ad70e90a
free only cuda tensor activation
b369029a
Merge branch 'tohtana/no_z3_hook_offload_opt' of https://github.com/t…
b2de70e2
improve persistent param
e5b61099
create reduce buffer on reduce stream
d29fe7c5
fix sync of adam offloading
4380a06b
Merge branch 'tohtana/no_z3_hook_offload_opt' of github.com:tohtana/D…
3517136c
stop freeing activation when ac is enabled
f4cf6062
clear allgather valid flag
d7fcd910
use peak memory
3db1c011
use reset_peak_memory_stats instead of reset_max_memory_allocated
64fb7152
discard profiler's results
70a4cd8c
offload only base tensor
348ab752
fix cleaning of bwd inputs
72675ee6
keep reduce bucket when resetting
3e8f3df3
add opt state offloading
a38c8dd6
bcast max alloc
8120edc9
implement adam state offloading
66b9f4f3
add api to check profiling mode
7b93e2f8
fix node insertion
1c557b8c
add option to set passes
87466f87
add api to get compile time
01ca374c
fix offloading
d1c925a5
add memory profile nodes
d76e4508
fix offloading adam states
c2b031c0
fix memory release for offloading
e24e73e8
fix memory estimation
6e48a126
fix offloading
bfd05caa
Merge branch 'master' into tohtana/no_z3_hook
68f1629c
reduce min total memory
f9a0c792
allocate persistent mem buffer earlier
1245e9a9
fix arg of opt pass
14262935
fix for multiple graphs
33550885
fix to run multi allgather inputs
eda6d8e0
fix allgather sync
cd02b5e6
move compile package under deepspeed
08b06ded
move compile configs to ds_config
3ce29f66
refactor compile api in engine
89ef42f8
add no copy ops
11ae91e4
fix import
6d87c906
refactor to generalize z3 pass
67d785b4
add prefetch and selective gather
5617408a
refactor to select opt pass
b66b8d1c
refactor to add z1
23136b0c
add dummy op
1e52831e
rename namespace
d3817404
formatting
657bcece
refactor cpp files
6317c356
rename functions to avoid conflicts
354db6e9
split executor class
644598d1
refactor executor
8c5410b9
refactor downcast
88930df1
extract api registration
535e9141
add flag to show z3 or not
384edd49
refactor engine
be8e8f02
add z1 op and allreduce
2d570ccd
set gradients in optimizer (breaks z3)
cab0cb2a
fix for z3
e503286a
add assertion to check misconfiguration
a5f135b8
rename builder file
14a126ae
enable inductor for z1
eec3cf8b
update for inductor
930df9d0
register custom op to avoid reusing input/output
3b6b9a35
enable inductor
26c58ba9
fix trace input for z1
5b8f758a
fix gradient accumulation
5533e2bc
copy output to avoid issue with inductor
89603ee5
simplify wait allgather
8fe5e6cf
fix metadata of param inputs for inductor
8053b440
enable debug log
e6a3b698
add log after scheduling
5f160b31
add permute to no copy
f71189dc
fix partitioner for inductor
fad5b024
log bw graph
76d898ab
free grad tensor in dc
eee3a811
add free after reduce
cbb2fc18
show debug log only on rank0
c0256bf0
add to never_reuse list after value is realized
df37d374
clean debug output
806d6c74
enable debug option
b43cb2ad
add message and sync for debugging
41586dc6
free output of release_param
bf5d67ee
remove clone in release_param
6a291e4a
enable clone in release_param
9ea5a92a
remove sync at reduce
71d66840
remove logs
b616b7aa
Merge pull request #1 from tohtana/tohtana/no_z3_hook_fix_inductor_me…
46ad3cd5
Merge branch 'pub_master' into tohtana/no_z3_hook
6376ef8c
add debug flag
5a2c53c2
add clone after wait_allgather
aceca037
set multiple release nodes
77f2c15e
add debug options
7340b847
add workaround for symint inputs
fa98441d
add debugging options
f7f9c65e
offload parameter added
4a8faeb5
added parameter offload
40fa3714
pre-commit passed
d504ace0
refactored stage3
4ddf13a1
removed mutable
5669b067
added asynchronous offloading for DSParama
3165c251
Merge pull request #3 from tohtana/zafar/no_z3_hook/offload_parameter
fcda0c07
fix for pt26
9f95c5b2
fix import for pt2.7
14463bd9
remove hook when deepcompile
93dbe4e7
remove assertions
e603898b
refactor aot patch for pt26
6e787878
fix patch for inductor
4d906f07
fix patch for pt27
f791d701
Merge branch 'tohtana/no_z3_hook_dbg_large_mem' into tohtana/no_z3_hook
c6f79c50
Merge branch master into tohtana/deepcompile
cad8f335
revert formatting
1b7dc8de
tohtana requested a review from tjruwase 280 days ago
tohtana requested a review from GuanhuaWang 280 days ago
tohtana requested a review from loadams 280 days ago
tohtana requested a review from jomayeri 280 days ago
fix import in builder
405a82d0
add __init__ in compile
79f2fcf7
handle import error
fa68c49e
rename builder file
88b42245
change package name of builder
a5341026
avoid importing sym_size when deepcompile is not supported
5f19c553
fix import error in cpu environment
dd1bb757
catch error of importing functorch
6d70e0fb
fix import
472573ae
remove unused global reference to symint
eb6c0df9
fix import errors
4b1a5b58
catch import error
781e1169
fix default config for deepcompile
f6028677
fix import
6fc79198
fix import
7610a45a
allow overwriting compile pass
fbc020f5
suppress building deepcompile ops in test
584c801d
Merge branch 'master' into tohtana/deepcompile
514b6b74
revert change for offloading
82dca53b
Merge branch 'tohtana/deepcompile' of github.com:deepspeedai/DeepSpee…
d130b3d0
Merge branch 'master' into tohtana/deepcompile
e5bf1851
fix for rng input
d2eeff6d
Merge branch 'master' into tohtana/deepcompile
bccbd364
fix offloading pass
39bb1152
Merge branch 'master' into tohtana/deepcompile
af396e71
add sync offload func
f66102af
remove README
4a26c44d
add test
832a147f
Merge branch 'master' into tohtana/deepcompile
429a4231
Merge branch 'master' into tohtana/deepcompile
dd7dbfe4
loadams commented on 2025-03-27
suppress building deepcompile ops for Windows
721bc58f
add scipy to requirement
8af3f964
Merge branch 'tohtana/deepcompile' of github.com:deepspeedai/DeepSpee…
d15eb6ca
fix return value
9b23d6fc
free real input to forward
22b7bc29
Merge branch 'master' into tohtana/deepcompile
e639c85f
add deepcompile dependency
61395e77
Merge branch 'tohtana/deepcompile' of github.com:deepspeedai/DeepSpee…
9feaefc1
fix offload
23c7d2d7
remove debugging code
37134a3d
Merge branch 'tohtana/deepcompile' of github.com:deepspeedai/DeepSpee…
7c78179e
fix initial pass for z3
da1cccb6
add blog
66c0e942
update blog author list
608dabf0
fix figure caption
7bc9c22c
update images
30b89eb2
Merge branch 'master' into tohtana/deepcompile
31665250
add link to arxiv paper
525b64d2
Merge branch 'tohtana/deepcompile' of github.com:deepspeedai/DeepSpee…
ee2ae9dd
Writing fixes
25caf1d4
Merge branch 'tohtana/deepcompile' of github.com:deepspeedai/DeepSpee…
7479c76c
fixed chart caption
96c31372
More tweaks
b11f9cbb
Merge
d22d1a1b
update labels in blog
e59031aa
tjruwase commented on 2025-04-15
tjruwase commented on 2025-04-15
tjruwase commented on 2025-04-15
improve error message regarding symmetric memory support
abfce9df
tjruwase commented on 2025-04-16
tjruwase approved these changes on 2025-04-16
More tweaks
37328063
add link to blog on the top
38e8b178
tohtana enabled auto-merge 253 days ago
tohtana merged 227a60c0 into master 253 days ago
tohtana deleted the tohtana/deepcompile branch 253 days ago
bsochack commented on 2025-07-02
Reviewers: tjruwase, loadams, bsochack, AliZafar120, GuanhuaWang, jomayeri
Assignees: No one assigned
Labels: None yet
Milestone: No milestone