enable cpu paged cache (#42869)
* enable cpu paged cache
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable cpu example
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix device map
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert xpu deterministic
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update test_paged_attention for CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update cpu ground truth for CI
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* use accelerator
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix typo
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix example
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix num_return_sequences
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix num_return_sequences
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix max_seqlen_q
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* cpu does not support FA2 without paged
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* add cpu expected outputs
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert useless change
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert wrong change
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update comments
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* add flex attn for CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ground truth check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix graph check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Simplify _graphs initialization for CUDA graphs
Refactor the initialization of _graphs to simplify the condition for using CUDA graphs.
* Update src/transformers/generation/continuous_batching/requests.py
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
* Update src/transformers/generation/continuous_batching/continuous_api.py
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>