llama : support RWKV v6 models #8980
Ronsor
commented
on 2024-08-11
convert_hf_to_gguf: Add support for RWKV v6
8d2eca35
Add RWKV tokenization
dc0767f4
Fix build
865167d0
Do not use special tokens when matching in RWKV tokenizer
7cac72a8
Fix model loading
e92c74f4
Add (broken) placeholder graph builder for RWKV
a0aae8d6
Add workaround for kv cache
a8667896
Add logits conversion to rwkv5
4e23d971
Add rwkv5 layer norms
54795885
Add time mix KVRG & correct merge mistake
dd3aa3d4
Add remaining time mix parameters
b409fd8e
Add time mix output loading
3cbeffc5
Add placeholder llm_build_time_mix
b3b17e05
Fix build
700dad1b
Load more tensors for rwkv v6
a180b63b
Fix rwkv tokenizer
0e5ac349
ggml: Add unary operator Exp
5732de89
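For reference, a minimal sketch of how the new operator might be used (the ``ggml_exp`` signature is taken from this commit; RWKV's time-decay computation applies element-wise exp):

```cpp
// Minimal sketch, not PR code: building a graph node with the new unary Exp op.
#include "ggml.h"

static struct ggml_tensor * exp_example(struct ggml_context * ctx) {
    // 1-D f32 input; the caller fills in the values
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    // y[i] = exp(x[i])
    return ggml_exp(ctx, x);
}
```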
RWKV v6 graph building
0784a0cf
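For context, the wkv part of the graph implements roughly the following per-head recurrence (notation simplified; in v6 the decay $w_t$ is data-dependent):

$$
\mathrm{wkv}_t = \operatorname{diag}(u)\,k_t^{\top} v_t + S_{t-1}, \qquad
S_t = \operatorname{diag}(w_t)\,S_{t-1} + k_t^{\top} v_t, \qquad
y_t = r_t \cdot \mathrm{wkv}_t
$$

where $r_t$, $k_t$, $v_t$ are per-head row vectors, $S$ is the head_size × head_size recurrent state, and $u$ is the learned bonus for the current token.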
Add ``rescale_every_n_layers`` parameter
8d498c70
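This mirrors the reference RWKV implementation, which halves the residual stream every n layers to keep fp16 activations in range. A minimal sketch of the idea (assumed shape, not the exact PR code):

```cpp
// Sketch only: halve the residual every `rescale_every_n_layers` layers
// to avoid fp16 overflow; 0 disables rescaling.
#include "ggml.h"

static struct ggml_tensor * maybe_rescale(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,     // residual stream after layer `il`
        int il,                        // 0-based layer index
        int rescale_every_n_layers) {
    if (rescale_every_n_layers > 0 && (il + 1) % rescale_every_n_layers == 0) {
        cur = ggml_scale(ctx, cur, 0.5f);
    }
    return cur;
}
```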
Add ``wkv.head_size`` key for RWKV
903089b5
Fix offloading layers to CUDA
98ce5f43
Fix parallel inferencing for RWKV
01dcf4bb
Remove trailing whitespaces
6ae2f486
build_rwkv: Avoid using inplace operations
8bc1f9ae
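For context, a non-inplace op leaves its input tensor intact, which matters when the same tensor feeds more than one node in the graph. A sketch of the difference (illustration only, not PR code):

```cpp
#include "ggml.h"

// ggml_add_inplace() writes the result into a's buffer, so `a` can no longer
// be read by other nodes; ggml_add() allocates a new tensor and keeps `a` valid.
static struct ggml_tensor * add_out_of_place(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b) {
    return ggml_add(ctx, a, b);   // `a` remains usable elsewhere in the graph
}
```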
convert_hf_to_gguf: rwkv: Avoid using ``eval``
18decea3
convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually
7f2e370f
Update convert_hf_to_gguf.py
c6955525
ggml: Add backward computation for unary op ``exp``
8aa711ad
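The backward pass falls out of the derivative of the exponential, so the forward output can be reused as the local gradient:

$$
\frac{\partial}{\partial x} e^{x} = e^{x}
\quad\Longrightarrow\quad
\bar{x} = \bar{y}\cdot e^{x} = \bar{y}\cdot y
$$

where $\bar{y}$ is the incoming gradient and $y = e^{x}$ is the forward result.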
Update convert_hf_to_gguf.py
ae9936a8
Update convert_hf_to_gguf.py
5afa3eff
Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV
12fbe1ad
build_rwkv6: Simplify graph
276d53b1
llama: rwkv6: Detect model.type
b0f4fe52
llama: rwkv6: Fix tensor loading for 7B/14B models
683d70cb
llama: rwkv6: Fix group_norm assertion failure with Metal
ee1b78c0
llama: rwkv6: Clean up
c165e346
llama: rwkv6: Add quantization tensor exclusion
6da6aa48
llama: rwkv6: Use the new advanced batch splits
f5d955d2
Update src/llama.cpp
57decb4a
llama: rwkv6: Use ``ggml_norm`` instead of ``ggml_group_norm``
e94778ad
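A sketch of how per-head normalization can be expressed with ``ggml_norm`` (assumed, not the exact PR code): ``ggml_norm`` normalizes along the first dimension, so reshaping to one row per head makes each head its own group.

```cpp
#include "ggml.h"

// Sketch: group norm over n_head groups, written as a plain layer norm
// applied to a [head_size, n_head * n_tokens] view of the wkv output.
static struct ggml_tensor * per_head_norm(
        struct ggml_context * ctx,
        struct ggml_tensor  * x,        // [head_size * n_head, n_tokens]
        int64_t head_size,
        int64_t n_head,
        int64_t n_tokens,
        float   eps) {
    x = ggml_reshape_2d(ctx, x, head_size, n_head * n_tokens);
    x = ggml_norm(ctx, x, eps);                       // normalize each head
    return ggml_reshape_2d(ctx, x, head_size * n_head, n_tokens);
}
```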
llama: rwkv6: Apply code style and misc changes
7756afd8
converter: Use class name ``Rwkv6Model``
87a29014
llama: rwkv6: Make use of key ``feed_forward_length``
c414a24a
llama: rwkv6: Add kv ``time_mix_extra_dim`` and ``time_decay_extra_dim``
6d69fd77
converter: Match ``new_name`` instead of ``name`` for float32 explici…
601b5920
llama: rwkv6: Keep ``time_mix_w1/w2`` as F32
e0ea5114
llama: rwkv6: Remove unused nodes
5f00c52b
llama: rwkv6: Apply code format changes
7444046c
llama: rwkv6: Add lora for some supported tensors
7f2ef566
rwkv : speed-up tokenization using trie
7004323e
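The speed-up comes from greedy longest-match over a byte trie instead of rescanning the vocabulary at every position. A self-contained sketch of the idea (not the PR code; vocabulary handling and byte fallback are simplified):

```cpp
// Greedy longest-match tokenization over a byte trie, in the spirit of the
// RWKV "world" tokenizer. Placeholder types; the real code lives in llama.cpp.
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct trie_node {
    int32_t token_id = -1;   // -1 means no vocabulary entry ends at this node
    std::map<uint8_t, std::unique_ptr<trie_node>> next;
};

static void trie_insert(trie_node & root, const std::string & piece, int32_t id) {
    trie_node * node = &root;
    for (unsigned char c : piece) {
        auto & child = node->next[c];
        if (!child) {
            child = std::make_unique<trie_node>();
        }
        node = child.get();
    }
    node->token_id = id;
}

static std::vector<int32_t> tokenize(const trie_node & root, const std::string & text) {
    std::vector<int32_t> out;
    size_t pos = 0;
    while (pos < text.size()) {
        const trie_node * node = &root;
        int32_t best_id  = -1;
        size_t  best_len = 1;   // skip one byte if nothing matches (simplified fallback)
        for (size_t i = pos; i < text.size(); ++i) {
            auto it = node->next.find((unsigned char) text[i]);
            if (it == node->next.end()) {
                break;
            }
            node = it->second.get();
            if (node->token_id >= 0) {
                best_id  = node->token_id;
                best_len = i - pos + 1;
            }
        }
        if (best_id >= 0) {
            out.push_back(best_id);
        }
        pos += best_len;
    }
    return out;
}
```

Each step is then bounded by the longest vocabulary entry rather than the full vocabulary size.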
minor : style + indentation
59dc2e70
ggerganov
approved these changes
on 2024-08-30
compilade
approved these changes
on 2024-08-30
llama: rwkv6: Avoid division by zero
51753757
ggml: rwkv_wkv: Avoid copying the state
846358d3
ggerganov
merged
8f1d81a0
into master 1 year ago