stash for now
4d2c0f92
initial commit
1dbce45e
small updated
123ea1f0
up
ae310b24
up
b89d03b6
works!
db898a2c
nits and fixes
fe776594
don't loop too much
1fbff280
finish working example
70564019
update
4761c8d9
fix the small freeblocks issue
5639730d
feat: stream inputs to continuous batch
7aba0a05
fix: update attn from `eager` to `sdpa`
ade3159f
refactor: fmt
b18e8f7f
refactor: cleanup unnecessary code
fadfb647
feat: add `update` fn to `PagedAttentionCache`
9ef6e92b
feat: broken optimal block size computation
b0592cff
fix: debugging invalid cache logic
c7484cce
fix: attention mask
a89c534b
refactor: use custom prompts for example
09f415a3
feat: add streaming output
6c749b53
fix: prefill split
ef809bf2
fix: send decoded tokens when `prefilling_split` -> `decoding`
f4c76024
refactor: move logic to appropriate parent class
45857dee
fix: remove truncation as we split prefilling anyways
8629a5e2
feat: add paged attention forward
93a1016d
push Ggraoh>
bf03fa32
add paged sdpa
3d57cc37
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
0e8b1f33
update
768788fd
better mps defaults
899e2c76
feat: add progress bar for `generate_batch`
4b6e9b3b
feat: add opentelemetry metrics (ttft + batch fill %age)
476621ef
feat: add tracing
8a201e2e
Add cuda graphs (#38059)
5c859ad4
revert llama changes
0fb48e84
fix merge conflicts
e2b4a890
fix: tracing and metrics
967a0847
my updates
30cf2f87
update script default values
ffb7c41c
fix block allocation issue
c52edd84
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
1f52f875
fix prefill split attention mask
a535e537
no bugs
b62086b3
add paged eager
026b9ef9
fix
917ca13d
update
fa1bfa36
style
a861b2de
feat: add pytorch traces
4010d07f
fix
259c5428
fix
685a4227
refactor: remove pytorch profiler data
3401b194
McPatate force pushed from 685a4227 to 3401b194 248 days ago
style
497e057d
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
a0874b84
nits
a2f8bbe5
Merge branch 'main' of github.com:huggingface/transformers into feat/…
be9f683c
cleanup
ee81e51c
draft test file
fdf319a5
fix
0c8868c7
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
6c2e01a3
fix
e0c6113d
fix paged and graphs
57a3ae75
small renamings
1c67666c
cleanups and push
0fa7bb00
refactor: move tracing and metrics logic to utils
54dd8b7f
refactor: trace more blocks of code
624e00ee
nits
c6d8168e
nits
562f2d30
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
69f307d1
update
3cf8e08f
to profile or not to profile
86762a61
refactor: create new output object
6ac60002
causal by default
d47ab92b
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
eff9d66f
cleanup but generations are still off for IDK what reason
294ed692
simplifications but not running still
e4abe365
this does work.
9d79be98
small quality of life updates
c719293d
nits
ad78b205
update
b0802951
fix the scheduler
3d9045e5
fix warning
afbf7c84
ol
7f80c03e
fully fixed
9be54398
nits
268fa52d
different generation parameters
27b550c5
nice
0b5c1e90
just style
aba184ec
feat: add cache memory usage
de20a843
feat: add kv cache free memory
71616ebb
feat: add active/waiting count & req latency
3d1ed438
do the sampling
1c0ef44d
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
938a0124
fix: synchronize CUDA only if available and improve error handling in…
6dad2d3b
fix on mps
b05c857f
feat: add dashboard & histogram buckets
ff5b08ac
perf: improve waiting reqs data structures
5f619da6
attempt to compile, but we should only do it on mps AFAIK
0b503243
feat: decouple scheduling logic
6b9a1075
just a draft
ab3d3484
cleanup and fixup
cca3009b
optional
0039ba1e
gante commented on 2025-05-22
style
2243fad1
ArthurZucker marked this pull request as ready for review 239 days ago
update
206f1fa4
update
c537d01a
remove the draft documentation
3d4709c6
fix import as well
0e34470e
McPatate approved these changes on 2025-05-22
update
6f4ecd32
fix the test
db22dd5b
style doomed
5a76a277
ArthurZucker deleted the feat/stream_inputs_to_continuous_batch branch 239 days ago
Assignees
No one assigned