transformers
071e178b - enable cpu paged cache (#42869)

Commit
1 day ago
enable cpu paged cache (#42869)

* enable cpu paged cache
* enable cpu example
* fix device map
* update tests
* revert xpu deterministic
* fix format
* fix format
* update test_paged_attention for CPU
* update cpu ground truth for CI
* use accelerator
* fix typo
* fix tests
* fix example
* update tests
* update tests
* fix tests
* fix num_return_sequences
* fix num_return_sequence
* fix max_seqlen_q
* cpu does not support FA2 without paged
* add cpu expected outputs
* revert useless change
* revert wrong change
* fix format
* update comments
* add flex attn for CPU
* fix tests
* fix comment
* fix ground truth check
* fix graph check
* Simplify _graphs initialization for CUDA graphs

  Refactor the initialization of _graphs to simplify the condition for using CUDA graphs.
* Update src/transformers/generation/continuous_batching/requests.py
* Update src/transformers/generation/continuous_batching/continuous_api.py

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>