llama.cpp
llm : add Falcon support
#2717
Merged

ggerganov merged 38 commits into master from falcon
ggerganov llama : refactor GGUF constants into static maps
4ed3469c
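For context, a minimal sketch of the shape of this refactor, using illustrative enum and map names rather than the exact identifiers in llama.cpp: architecture strings become data in a static map instead of scattered string literals.

```cpp
#include <map>
#include <string>

enum llm_arch {
    LLM_ARCH_LLAMA,
    LLM_ARCH_FALCON,
    LLM_ARCH_UNKNOWN,
};

static const std::map<llm_arch, std::string> LLM_ARCH_NAMES = {
    { LLM_ARCH_LLAMA,  "llama"  },
    { LLM_ARCH_FALCON, "falcon" },
};

// map the general.architecture string from the GGUF header to the enum
// (helper name is an assumption, not the exact llama.cpp identifier)
static llm_arch llm_arch_from_string(const std::string & name) {
    for (const auto & kv : LLM_ARCH_NAMES) {
        if (kv.second == name) {
            return kv.first;
        }
    }
    return LLM_ARCH_UNKNOWN;
}
```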
ggerganov llama : check if model architecture is known
8bd7f06b
ggerganov llama : refactor llama_model_load_internal()
3057d6a6
ggerganov gguf : add KV constant maps
3c025a6d
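Same idea for the hyperparameter keys: a hedged sketch in which the KV names are stored as printf-style templates and instantiated with the architecture name, so Falcon and LLaMA models share one table. Enum and map names here are assumptions.

```cpp
#include <cstdio>
#include <map>
#include <string>

enum llm_kv {
    LLM_KV_CONTEXT_LENGTH,
    LLM_KV_EMBEDDING_LENGTH,
    LLM_KV_BLOCK_COUNT,
};

static const std::map<llm_kv, const char *> LLM_KV_NAMES = {
    { LLM_KV_CONTEXT_LENGTH,   "%s.context_length"   },
    { LLM_KV_EMBEDDING_LENGTH, "%s.embedding_length" },
    { LLM_KV_BLOCK_COUNT,      "%s.block_count"      },
};

// e.g. llm_kv_name(LLM_KV_CONTEXT_LENGTH, "falcon") -> "falcon.context_length"
static std::string llm_kv_name(llm_kv kv, const std::string & arch) {
    char buf[256];
    std::snprintf(buf, sizeof(buf), LLM_KV_NAMES.at(kv), arch.c_str());
    return buf;
}
```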
ggerganov Merge branch 'master' into falcon
b19c6e46
ggerganov llm : read arch-specific KVs
9f28f737
ggerganov convert : add dummy scores + types
d1b3b95d
ggerganov falcon : load tensor data (CPU only)
2f3c80a8
ggerganov llama : fix loading progress bar
5c5413dc
ggerganov llama : add arch member to llama_model
085228e1
ggerganov falcon : CPU inference working
3c7c325b
ggerganov falcon : support non-40B models
2d58444d
ggerganov falcon : minor
0ec27ad6
ggerganov llama : minor updates
7bbbf38c
klosax convert-falcon-hf-to-gguf.py : fix special token mapping
9853f2cf
ggerganov marked this pull request as ready for review 2 years ago
klosax llama.cpp : llama default UNK token = id 0
ffa5099c
klosax llama.cpp : fix bpe tokenizer
a95ae752
klosax llama.cpp : fix the fix of bpe tokenizer
d561b7f7
ggerganov ggml : pass eps to ggml_norm
e3c52bd9
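Falcon's LayerNorm uses a different epsilon than LLaMA's norm, so the previously hard-coded constant becomes a per-call parameter read from the model hyperparameters. Roughly the resulting signature (parameter names approximate):

```cpp
struct ggml_tensor * ggml_norm(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 eps);

// usage sketch, assuming the epsilon was loaded from the GGUF KVs:
// cur = ggml_norm(ctx0, inpL, hparams.f_norm_eps);
```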
ggerganov metal : implement RoPE (mode = 2) + avoid ggml_repeat
99bb2607
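Mode 2 is the GPT-NeoX flavor of RoPE that Falcon uses: instead of rotating interleaved pairs (x[2i], x[2i+1]) as in LLaMA's mode 0, it rotates pairs split across the two halves of the head dimension. A toy CPU reference of the NeoX layout, not the Metal kernel itself:

```cpp
#include <cmath>
#include <vector>

// rotate one head of size n at position p, NeoX-style: pair (x[i], x[i + n/2])
static void rope_neox_ref(std::vector<float> & x, int p, float theta_base = 10000.0f) {
    const int n = (int) x.size();
    for (int i = 0; i < n/2; ++i) {
        const float theta = p * std::pow(theta_base, -2.0f*i/n);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i];
        const float x1 = x[i + n/2];
        x[i]       = x0*c - x1*s;
        x[i + n/2] = x0*s + x1*c;
    }
}
```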
ggerganov ggml : ggml_repeat always creates new tensor
af4bbcc8
ggerganov falcon : copy-paste self-attention from LLaMA
b34ab740
ggerganov metal : print extra compute pipeline info
a0dc47a5
ggerganov falcon : minor changes (still chasing the Metal problem)
e2d23bed
klosax llama.cpp : fix linefeed token
b693000c
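The linefeed id differs per vocab: an SPM vocab stores it as the byte token for 0x0A, while a GPT-2-style BPE vocab remaps the byte, so the id has to be looked up rather than hard-coded. Approximately, using the llama_byte_to_token() helper referenced in a later commit (exact call shape is an assumption):

```cpp
// resolve the linefeed id through the byte-to-token mapping
// vocab.linefeed_id = llama_byte_to_token(vocab, '\n');
```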
ggerganov metal : fix GELU kernel numerical stability by using precise::tanh
0a85ae73
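The GELU kernel uses the usual tanh approximation; with Metal's default fast-math tanh the intermediate can lose precision or blow up for large |x|, and precise::tanh keeps it well-behaved. CPU-side reference form of the function, with the standard ggml constants:

```cpp
#include <cmath>

static const float GELU_COEF_A    = 0.044715f;
static const float SQRT_2_OVER_PI = 0.79788456080286535587989211986876f;

// tanh-approximated GELU; the Metal fix swaps tanh for precise::tanh here
static float gelu_ref(float x) {
    return 0.5f * x * (1.0f + std::tanh(SQRT_2_OVER_PI * x * (1.0f + GELU_COEF_A * x * x)));
}
```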
ggerganov metal : temporary workaround for the concurrency optimization bug
854ae5d0
slaren falcon : add CUDA offloading (#2739)
e7299656
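Conceptually, this kind of offloading places the top n_gpu_layers transformer blocks on the GPU backend and leaves the rest on the CPU. A hypothetical planning helper to show the split; names are assumed, not the actual llama.cpp code:

```cpp
#include <vector>

enum backend_t { BACKEND_CPU, BACKEND_GPU };

// decide per-layer placement: layers [n_layer - n_gpu_layers, n_layer) go to GPU
static std::vector<backend_t> plan_offload(int n_layer, int n_gpu_layers) {
    std::vector<backend_t> plan(n_layer, BACKEND_CPU);
    const int i_gpu_start = n_layer - n_gpu_layers;
    for (int il = 0; il < n_layer; ++il) {
        if (il >= i_gpu_start) {
            plan[il] = BACKEND_GPU;
        }
    }
    return plan;
}
```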
ggerganov llama : better model naming and size reporting
176ea716
ggerganov Merge branch 'master' into falcon
6938c5f4
ggerganov llama : prep new tokenizer support
c3f8a6e4
ggerganov llama : advanced BPE tokenizer based on ggllm.cpp implementation
3bfb7206
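The core of a merge-rank BPE tokenizer, compressed into a self-contained sketch: repeatedly apply the lowest-ranked adjacent merge until none applies. The real implementation also handles UTF-8, byte fallback, and special tokens, and loads the ranks from the GGUF merges list; names and structure here are illustrative only.

```cpp
#include <climits>
#include <map>
#include <string>
#include <utility>
#include <vector>

using bigram = std::pair<std::string, std::string>;

static std::vector<std::string> bpe_tokenize(
        const std::string & word,
        const std::map<bigram, int> & ranks) {
    // start from single characters
    std::vector<std::string> symbols;
    for (char c : word) {
        symbols.push_back(std::string(1, c));
    }
    // greedily apply the lowest-ranked merge until none applies
    while (symbols.size() > 1) {
        int    best_rank = INT_MAX;
        size_t best_i    = 0;
        for (size_t i = 0; i + 1 < symbols.size(); ++i) {
            auto it = ranks.find({symbols[i], symbols[i + 1]});
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i    = i;
            }
        }
        if (best_rank == INT_MAX) {
            break; // no applicable merge left
        }
        symbols[best_i] += symbols[best_i + 1];
        symbols.erase(symbols.begin() + best_i + 1);
    }
    return symbols;
}
```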
ggerganov llama : remove obsolete comment
2424e1d0
ggerganov common : remove obsolete BPE API + disable test-tokenizer-1
596e1094
ggerganov llama : revert BPE special-case in llama_byte_to_token()
f8ee54bd
ggerganov cuda : add TODOs for RoPE NeoX implementation
8c6d3939
ggerganov llama : default special tokens based on vocab type
630d8b40
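Shape of the change, with placeholder ids: the defaults are keyed on the vocab type, with SPM following the SentencePiece convention (UNK=0, BOS=1, EOS=2, cf. the earlier UNK commit) and BPE models like Falcon having no UNK. The BPE ids below are placeholders; the real defaults come from the GGUF KVs when present.

```cpp
enum llama_vocab_type { LLAMA_VOCAB_TYPE_SPM, LLAMA_VOCAB_TYPE_BPE };

struct special_tokens { int bos, eos, unk; };

static special_tokens default_special_tokens(llama_vocab_type type) {
    switch (type) {
        case LLAMA_VOCAB_TYPE_SPM:
            return { /*bos=*/ 1, /*eos=*/ 2, /*unk=*/ 0 };
        case LLAMA_VOCAB_TYPE_BPE:
            return { /*bos=*/ 0, /*eos=*/ 0, /*unk=*/ -1 }; // placeholder ids
    }
    return { -1, -1, -1 };
}
```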
ggerganov perplexity : add log for start of tokenization
fae8faa1
ggerganov merged cf658adc into master 2 years ago
ggerganov deleted the falcon branch 2 years ago
