transformers
Fixes to alternating SWA layers in Gemma2
#31775
Merged

turboderp HybridCache: Flip order of alternating global-attn/sliding-attn layers
a1a3ccd6
turboderp HybridCache: Read sliding_window argument from cache_kwargs
5d9679d3
turboderp Gemma2Model: Flip order of alternating global-attn/sliding-attn layers
e45fb6e2
turboderp Code formatting
5bde080e
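The commit titles above describe flipping which decoder layers use sliding-window attention (SWA) and which use global attention. A minimal sketch of that alternation follows. The helper `layer_kinds` and its `sliding_first` flag are hypothetical names introduced only for illustration and are not part of transformers. The commit titles say only that the order was flipped, so treating even-indexed layers as the sliding ones after the fix is an assumption here.

```python
def layer_kinds(num_layers: int, sliding_first: bool) -> list[str]:
    """Label each layer 'sliding' or 'global', alternating by layer index.

    Hypothetical helper for illustration only; not a transformers API.
    """
    kinds = []
    for layer_idx in range(num_layers):
        is_even = layer_idx % 2 == 0
        # With sliding_first=True, even-indexed layers use sliding-window
        # attention; with False, the pattern is shifted by one layer.
        is_sliding = is_even if sliding_first else not is_even
        kinds.append("sliding" if is_sliding else "global")
    return kinds


# Assumption: the flip moved sliding-window attention onto even-indexed layers.
order_a = layer_kinds(4, sliding_first=False)
order_b = layer_kinds(4, sliding_first=True)
print(order_a)
print(order_b)
```

The cache and the model each need to agree on this parity, which is presumably why the flip appears in both a `HybridCache` commit and a `Gemma2Model` commit.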
LysandreJik
fizzAI
ArthurZucker approved these changes on 2024-07-10
ArthurZucker commented on 2024-07-10
ArthurZucker merged a695c186 into main 1 year ago

