Add CB #38085

ArthurZucker
ArthurZucker stash for now
4d2c0f92
ArthurZucker initial commit
1dbce45e
ArthurZucker small updated
123ea1f0
ArthurZucker up
ae310b24
ArthurZucker up
b89d03b6
ArthurZucker works!
db898a2c
ArthurZucker nits and fixes
fe776594
ArthurZucker don't loop too much
1fbff280
ArthurZucker finish working example
70564019
ArthurZucker update
4761c8d9
ArthurZucker fix the small freeblocks issue
5639730d
McPatate feat: stream inputs to continuous batch
7aba0a05
McPatate fix: update attn from `eager` to `sdpa`
ade3159f
McPatate refactor: fmt
b18e8f7f
McPatate refactor: cleanup unnecessary code
fadfb647
McPatate feat: add `update` fn to `PagedAttentionCache`
9ef6e92b
McPatate feat: broken optimal block size computation
b0592cff
McPatate fix: debugging invalid cache logic
c7484cce
McPatate fix: attention mask
a89c534b
McPatate refactor: use custom prompts for example
09f415a3
McPatate feat: add streaming output
6c749b53
McPatate fix: prefill split
ef809bf2
McPatate fix: send decoded tokens when `prefilling_split` -> `decoding`
f4c76024
McPatate refactor: move logic to appropriate parent class
45857dee
McPatate fix: remove truncation as we split prefilling anyways
8629a5e2
McPatate feat: add paged attention forward
93a1016d
ArthurZucker push Ggraoh>
bf03fa32
ArthurZucker add paged sdpa
3d57cc37
ArthurZucker Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
0e8b1f33
ArthurZucker update
768788fd
ArthurZucker btter mps defaults
899e2c76
McPatate feat: add progress bar for `generate_batch`
4b6e9b3b
McPatate feat: add opentelemetry metrics (ttft + batch fill %age)
476621ef
McPatate feat: add tracing
8a201e2e
ArthurZucker Add cuda graphs (#38059)
5c859ad4
ArthurZucker revert llama changes
0fb48e84
ArthurZucker fix merge conflicts
e2b4a890
HuggingFaceDocBuilderDev
McPatate fix: tracing and metrics
967a0847
ArthurZucker my updates
30cf2f87
ArthurZucker update script default values
ffb7c41c
ArthurZucker fix block allocation issue
c52edd84
ArthurZucker Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
1f52f875
ArthurZucker fix prefill split attnetion mask
a535e537
ArthurZucker no bugs
b62086b3
ArthurZucker add paged eager
026b9ef9
ArthurZucker fix
917ca13d
ArthurZucker update
fa1bfa36
ArthurZucker style
a861b2de
McPatate feat: add pytorch traces
4010d07f
ArthurZucker fix
259c5428
ArthurZucker fix
685a4227
McPatate refactor: remove pytorch profiler data
3401b194
McPatate force pushed from 685a4227 to 3401b194 248 days ago
ArthurZucker style
497e057d
ArthurZucker Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
a0874b84
ArthurZucker nits
a2f8bbe5
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into feat/…
be9f683c
ArthurZucker cleanup
ee81e51c
ArthurZucker draft test file
fdf319a5
ArthurZucker fix
0c8868c7
ArthurZucker Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
6c2e01a3
ArthurZucker fix
e0c6113d
ArthurZucker fix paged and graphs
57a3ae75
ArthurZucker small renamings
1c67666c
ArthurZucker cleanups and push
0fa7bb00
McPatate refactor: move tracing and metrics logic to utils
54dd8b7f
McPatate refactor: trace more blocks of code
624e00ee
ArthurZucker nits
c6d8168e
ArthurZucker nits
562f2d30
ArthurZucker Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
69f307d1
ArthurZucker update
3cf8e08f
ArthurZucker to profile or not to profile
86762a61
McPatate refactor: create new output object
6ac60002
ArthurZucker causal by default
d47ab92b
ArthurZucker Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
eff9d66f
ArthurZucker cleanup but generations are still off for IDK what reason
294ed692
ArthurZucker simplifications but not running still
e4abe365
ArthurZucker this does work.
9d79be98
ArthurZucker small quality of life updates
c719293d
ArthurZucker nits
ad78b205
ArthurZucker updaet
b0802951
ArthurZucker fix the scheduler
3d9045e5
ArthurZucker fix warning
afbf7c84
ArthurZucker ol
7f80c03e
ArthurZucker fully fixed
9be54398
ArthurZucker nits
268fa52d
ArthurZucker different generation parameters
27b550c5
ArthurZucker nice
0b5c1e90
ArthurZucker just style
aba184ec
McPatate feat: add cache memory usage
de20a843
McPatate feat: add kv cache free memory
71616ebb
McPatate feat: add active/waiting count & req latency
3d1ed438
ArthurZucker do the sampling
1c0ef44d
ArthurZucker Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
938a0124
ArthurZucker fix: synchronize CUDA only if available and improve error handling in…
6dad2d3b
ArthurZucker fix on mps
b05c857f
McPatate feat: add dashboard & histogram buckets
ff5b08ac
McPatate perf: improve waiting reqs data structures
5f619da6
ArthurZucker attempt to compile, but we should only do it on mps AFAIK
0b503243
McPatate feat: decouple scheduling logic
6b9a1075
ArthurZucker just a draft
ab3d3484
ArthurZucker c;eanup and fixup
cca3009b
ArthurZucker optional
0039ba1e
gante commented on 2025-05-22
ArthurZucker style
2243fad1
ArthurZucker marked this pull request as ready for review 239 days ago
ArthurZucker update
206f1fa4
ArthurZucker update
c537d01a
ArthurZucker remove the draft documentation
3d4709c6
ArthurZucker fix import as well
0e34470e
McPatate approved these changes on 2025-05-22
ArthurZucker update
6f4ecd32
ArthurZucker fix the test
db22dd5b
ArthurZucker style doomed
5a76a277
ArthurZucker merged 211f2b08 into main 239 days ago
ArthurZucker deleted the feat/stream_inputs_to_continuous_batch branch 239 days ago
McPatate commented on 2025-05-23
