transformers
Fixes to alternating SWA layers in Gemma2 (#31775)
Status: Merged
Commits
HybridCache: Flip order of alternating global-attn/sliding-attn layers
turboderp committed 1 year ago
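Gemma2 interleaves sliding-window attention (SWA) layers with global-attention layers, and this commit flips which layer parity gets which type. The sketch below assumes that after the flip, even-indexed layers (0, 2, 4, ...) use the sliding window and odd-indexed layers use global attention. The helper names `is_sliding_layer` and `layer_pattern` are hypothetical and are not part of the `HybridCache` API.

```python
from typing import List


def is_sliding_layer(layer_idx: int) -> bool:
    """Return True if the layer at `layer_idx` uses sliding-window attention.

    Assumption: after the flip, even-indexed layers are sliding-window and
    odd-indexed layers are global. Hypothetical helper, not the library API.
    """
    return layer_idx % 2 == 0


def layer_pattern(num_hidden_layers: int) -> List[str]:
    """Label every layer as 'sliding' or 'global' under the assumed convention."""
    return ["sliding" if is_sliding_layer(i) else "global" for i in range(num_hidden_layers)]


if __name__ == "__main__":
    print(layer_pattern(6))
```

Both the cache and the model encode this parity, so a mismatch between them would pair a layer with the wrong kind of cache storage; this is presumably why the PR flips the order in both places.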
HybridCache: Read sliding_window argument from cache_kwargs
turboderp committed 1 year ago
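Per the commit title, the cache decides on each update whether a layer is sliding by reading a `sliding_window` entry from `cache_kwargs`. The toy class below is a minimal sketch of that dispatch, assuming a missing or `None` value means global attention. `ToyHybridCache` and its methods are hypothetical: the real `HybridCache` stores key/value tensors in preallocated buffers, while this sketch keeps a plain Python list of keys per layer.

```python
from typing import Any, Dict, List, Optional


class ToyHybridCache:
    """Toy per-layer key cache illustrating dispatch on `sliding_window`.

    Hypothetical sketch, not the transformers implementation.
    """

    def __init__(self, num_layers: int, max_cache_len: int) -> None:
        self.max_cache_len = max_cache_len
        self.keys: List[List[Any]] = [[] for _ in range(num_layers)]

    def _sliding_update(self, layer_idx: int, key: Any, window: int) -> List[Any]:
        # Rolling buffer: keep only the most recent `window` entries.
        layer = self.keys[layer_idx]
        layer.append(key)
        if len(layer) > window:
            del layer[: len(layer) - window]
        return list(layer)

    def _static_update(self, layer_idx: int, key: Any) -> List[Any]:
        # Global layer: keep everything, up to the fixed cache length.
        layer = self.keys[layer_idx]
        if len(layer) >= self.max_cache_len:
            raise ValueError("cache is full")
        layer.append(key)
        return list(layer)

    def update(self, layer_idx: int, key: Any, cache_kwargs: Dict[str, Any]) -> List[Any]:
        # The attention layer reports its own window size; None or a missing
        # key is treated as global attention (assumption of this sketch).
        sliding_window: Optional[int] = cache_kwargs.get("sliding_window")
        if sliding_window:
            return self._sliding_update(layer_idx, key, sliding_window)
        return self._static_update(layer_idx, key)


if __name__ == "__main__":
    cache = ToyHybridCache(num_layers=2, max_cache_len=8)
    for t in range(5):
        cache.update(0, t, {"sliding_window": 3})  # sliding layer
        cache.update(1, t, {"sliding_window": None})  # global layer
    print(cache.keys[0])  # [2, 3, 4]
    print(cache.keys[1])  # [0, 1, 2, 3, 4]
```

In this sketch, letting the attention layer pass its window keeps the sliding/global decision in one place, so the cache cannot independently disagree with the model about which layers slide.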
Gemma2Model: Flip order of alternating global-attn/sliding-attn layers
turboderp committed 1 year ago
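On the model side, the same parity decides which attention mask each decoder layer receives. The sketch below builds plain boolean masks for a short sequence, again assuming that even layers use the sliding window and odd layers are global. `causal_mask` and `masks_per_layer` are hypothetical helpers (the real model builds its masks as tensors), and the boundary condition `i - j < sliding_window` is an assumption about how the window is counted.

```python
from typing import List, Optional

Mask = List[List[bool]]


def causal_mask(seq_len: int, sliding_window: Optional[int] = None) -> Mask:
    """Boolean mask where mask[i][j] is True if query i may attend to key j.

    Causal: j <= i. With a sliding window, additionally i - j < sliding_window.
    """
    return [
        [j <= i and (sliding_window is None or i - j < sliding_window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def masks_per_layer(num_layers: int, seq_len: int, sliding_window: int) -> List[Mask]:
    # Assumed post-fix convention: even layers slide, odd layers are global.
    return [
        causal_mask(seq_len, sliding_window if layer_idx % 2 == 0 else None)
        for layer_idx in range(num_layers)
    ]


if __name__ == "__main__":
    layer0, layer1 = masks_per_layer(num_layers=2, seq_len=4, sliding_window=2)
    print([int(x) for x in layer0[3]])  # [0, 0, 1, 1]
    print([int(x) for x in layer1[3]])  # [1, 1, 1, 1]
```

With a window of 2, the last query in the sliding layer sees only itself and its immediate predecessor, while the global layer sees the whole prefix; flipping the parity swaps which of the two masks each layer gets.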
Code formatting
turboderp committed 1 year ago