llama.cpp
Add support for QRWKV6 hybrid models & slight optimization for RWKV6
#11001

Merged


MollySophia merged 12 commits into ggml-org:master from MollySophia:rwkv6qwen2

github-actions added the labels Nvidia GPU, Vulkan, python, ggml, SYCL, testing

WIP: Add support for RWKV6Qwen2 (f298f039)
RWKV: Some graph simplification (385b611d)
Add support for RWKV6Qwen2 with cpu and cuda GLA (fab0aa7b)
RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead (bc930cd5)
Fix some typos (f2c1a5c9)
code format changes (aaa870e8)
Fix wkv test & add gla test (00930e6f)
Fix cuda warning (08cf5606)
Update README.md (331581b2)

MollySophia force-pushed to 331581b2

ggerganov approved these changes on 2025-01-07

ggerganov requested a review from compilade

Update ggml/src/ggml-cuda/gla.cu (aed0afb4)

Fix fused lerp weights loading with RWKV6 (d8a304c2)

compilade commented on 2025-01-07

better sanity check skipping for QRWKV6 in llama-quant (324afba5)

compilade approved these changes on 2025-01-10

MollySophia merged ee7136c6 into master

Reviewers: ggerganov, compilade

Assignees: none

Labels: testing, Nvidia GPU, Vulkan, python, ggml, SYCL

Milestone: none
