the fix that did not get in (#37370)

Commit

306 days ago

the fix that did not get in (#37370)

* debugging improvements
* add debugging details
* add more debugging details
* debug more
* the fix that did not get in
* First fix flex
* fix query offset
* fix flex first
* fix device mask creation for speed
* small mask creation sdpa
* Update flex_attention.py
* remove chunked prefill from HybridChunkedCache
* never seen such a fucked up merged
* clean up layers + output
* add summary json file
* Efficient general cache
* Update cache_utils.py
* cleanup
* fix?
* fix!
* oups typo
* not everywhere
* more fixes
* revert unrelated changes
* Fix but ugly for now -> should use pad instead
* oups
* re-initialize the cache
* Use pad to simplify
* style
* correct slicing

---------

Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

References

#37370 - the fix that did not get in

Author

ArthurZucker

Parents

f834ca2c

transformers e032d12e - the fix that did not get in (#37370)