llama : support RWKV v6 models #8980
Ronsor
commented
on 2024-08-11
convert_hf_to_gguf: Add support for RWKV v6
8d2eca35
Add RWKV tokenization
dc0767f4
Fix build
865167d0
Do not use special tokens when matching in RWKV tokenizer
7cac72a8
Fix model loading
e92c74f4
Add (broken) placeholder graph builder for RWKV
a0aae8d6
Add workaround for kv cache
a8667896
Add logits conversion to rwkv5
4e23d971
Add rwkv5 layer norms
54795885
Add time mix KVRG & correct merge mistake
dd3aa3d4
Add remaining time mix parameters
b409fd8e
Add time mix output loading
3cbeffc5
Add placeholder llm_build_time_mix
b3b17e05
Fix build
700dad1b
Load more tensors for rwkv v6
a180b63b
Fix rwkv tokenizer
0e5ac349
ggml: Add unary operator Exp
5732de89
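For reference, a minimal sketch of how the new operator might be used (the ``ggml_exp`` signature is taken from this commit; RWKV's time-decay computation applies element-wise exp):

```cpp
// Minimal sketch, not PR code: building a graph node with the new unary Exp op.
#include "ggml.h"

static struct ggml_tensor * exp_example(struct ggml_context * ctx) {
    // 1-D f32 input; the caller fills in the values
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    // y[i] = exp(x[i])
    return ggml_exp(ctx, x);
}
```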
RWKV v6 graph building
0784a0cf
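For context, the wkv part of the graph implements roughly the following per-head recurrence (notation simplified; in v6 the decay $w_t$ is data-dependent):

$$
\mathrm{wkv}_t = \operatorname{diag}(u)\,k_t^{\top} v_t + S_{t-1}, \qquad
S_t = \operatorname{diag}(w_t)\,S_{t-1} + k_t^{\top} v_t, \qquad
y_t = r_t \cdot \mathrm{wkv}_t
$$

where $r_t$, $k_t$, $v_t$ are per-head row vectors, $S$ is the head_size × head_size recurrent state, and $u$ is the learned bonus for the current token.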
Add ``rescale_every_n_layers`` parameter
8d498c70
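This mirrors the reference RWKV implementation, which halves the residual stream every n layers to keep fp16 activations in range. A minimal sketch of the idea (assumed shape, not the exact PR code):

```cpp
// Sketch only: halve the residual every `rescale_every_n_layers` layers
// to avoid fp16 overflow; 0 disables rescaling.
#include "ggml.h"

static struct ggml_tensor * maybe_rescale(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,     // residual stream after layer `il`
        int il,                        // 0-based layer index
        int rescale_every_n_layers) {
    if (rescale_every_n_layers > 0 && (il + 1) % rescale_every_n_layers == 0) {
        cur = ggml_scale(ctx, cur, 0.5f);
    }
    return cur;
}
```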
Add ``wkv.head_size`` key for RWKV
903089b5
Fix offloading layers to CUDA
98ce5f43
Fix parallel inferencing for RWKV
01dcf4bb
Remove trailing whitespaces
6ae2f486
build_rwkv: Avoid using inplace operations
8bc1f9ae
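For context, a non-inplace op leaves its input tensor intact, which matters when the same tensor feeds more than one node in the graph. A sketch of the difference (illustration only, not PR code):

```cpp
#include "ggml.h"

// ggml_add_inplace() writes the result into a's buffer, so `a` can no longer
// be read by other nodes; ggml_add() allocates a new tensor and keeps `a` valid.
static struct ggml_tensor * add_out_of_place(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        struct ggml_tensor  * b) {
    return ggml_add(ctx, a, b);   // `a` remains usable elsewhere in the graph
}
```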
convert_hf_to_gguf: rwkv: Avoid using ``eval``
18decea3
convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually
7f2e370f
Update convert_hf_to_gguf.py
c6955525
ggml: Add backward computation for unary op ``exp``
8aa711ad
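The backward pass falls out of the derivative of the exponential, so the forward output can be reused as the local gradient:

$$
\frac{\partial}{\partial x} e^{x} = e^{x}
\quad\Longrightarrow\quad
\bar{x} = \bar{y}\cdot e^{x} = \bar{y}\cdot y
$$

where $\bar{y}$ is the incoming gradient and $y = e^{x}$ is the forward result.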
Update convert_hf_to_gguf.py
ae9936a8
Update convert_hf_to_gguf.py
5afa3eff
Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV
12fbe1ad
build_rwkv6: Simplify graph
276d53b1
llama: rwkv6: Detect model.type
b0f4fe52
llama: rwkv6: Fix tensor loading for 7B/14B models
683d70cb
llama: rwkv6: Fix group_norm assertion failure with Metal
ee1b78c0
llama: rwkv6: Clean up
c165e346
llama: rwkv6: Add quantization tensor exclusion
6da6aa48
llama: rwkv6: Use the new advanced batch splits
f5d955d2
Update src/llama.cpp
57decb4a
llama: rwkv6: Use ``ggml_norm`` instead of ``ggml_group_norm``
e94778ad
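A sketch of how per-head normalization can be expressed with ``ggml_norm`` (assumed, not the exact PR code): ``ggml_norm`` normalizes along the first dimension, so reshaping to one row per head makes each head its own group.

```cpp
#include "ggml.h"

// Sketch: group norm over n_head groups, written as a plain layer norm
// applied to a [head_size, n_head * n_tokens] view of the wkv output.
static struct ggml_tensor * per_head_norm(
        struct ggml_context * ctx,
        struct ggml_tensor  * x,        // [head_size * n_head, n_tokens]
        int64_t head_size,
        int64_t n_head,
        int64_t n_tokens,
        float   eps) {
    x = ggml_reshape_2d(ctx, x, head_size, n_head * n_tokens);
    x = ggml_norm(ctx, x, eps);                       // normalize each head
    return ggml_reshape_2d(ctx, x, head_size * n_head, n_tokens);
}
```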
llama: rwkv6: Apply code style and misc changes
7756afd8
converter: Use class name ``Rwkv6Model``
87a29014
llama: rwkv6: Make use of key ``feed_forward_length``
c414a24a
llama: rwkv6: Add kv ``time_mix_extra_dim`` and ``time_decay_extra_dim``
6d69fd77
converter: Match ``new_name`` instead of ``name`` for float32 explici…
601b5920
llama: rwkv6: Keep ``time_mix_w1/w2`` as F32
e0ea5114
llama: rwkv6: Remove unused nodes
5f00c52b
llama: rwkv6: Apply code format changes
7444046c
llama: rwkv6: Add lora for some supported tensors
7f2ef566
rwkv : speed-up tokenization using trie
7004323e
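The speed-up comes from greedy longest-match over a byte trie instead of rescanning the vocabulary at every position. A self-contained sketch of the idea (not the PR code; vocabulary handling and byte fallback are simplified):

```cpp
// Greedy longest-match tokenization over a byte trie, in the spirit of the
// RWKV "world" tokenizer. Placeholder types; the real code lives in llama.cpp.
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct trie_node {
    int32_t token_id = -1;   // -1 means no vocabulary entry ends at this node
    std::map<uint8_t, std::unique_ptr<trie_node>> next;
};

static void trie_insert(trie_node & root, const std::string & piece, int32_t id) {
    trie_node * node = &root;
    for (unsigned char c : piece) {
        auto & child = node->next[c];
        if (!child) {
            child = std::make_unique<trie_node>();
        }
        node = child.get();
    }
    node->token_id = id;
}

static std::vector<int32_t> tokenize(const trie_node & root, const std::string & text) {
    std::vector<int32_t> out;
    size_t pos = 0;
    while (pos < text.size()) {
        const trie_node * node = &root;
        int32_t best_id  = -1;
        size_t  best_len = 1;   // skip one byte if nothing matches (simplified fallback)
        for (size_t i = pos; i < text.size(); ++i) {
            auto it = node->next.find((unsigned char) text[i]);
            if (it == node->next.end()) {
                break;
            }
            node = it->second.get();
            if (node->token_id >= 0) {
                best_id  = node->token_id;
                best_len = i - pos + 1;
            }
        }
        if (best_id >= 0) {
            out.push_back(best_id);
        }
        pos += best_len;
    }
    return out;
}
```

Each step is then bounded by the longest vocabulary entry rather than the full vocabulary size.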
minor : style + indentation
59dc2e70
ggerganov
approved these changes
on 2024-08-30
compilade
approved these changes
on 2024-08-30
llama: rwkv6: Avoid division by zero
51753757
ggml: rwkv_wkv: Avoid copying the state
846358d3
ggerganov
merged
8f1d81a0
into master 1 year ago