gemma2: add sliding window mask (#8227)
* gemma2: add sliding window mask
* fix uninitialized data_swa
* better naming
* add co-author
* replace list with single tensor
* update
* llama : minor styling
* convert : add sanity check for query_pre_attn_scalar
* fix small typo in README
---------
Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>