llama.cpp
llama : add Mixtral support #4406 (Merged)

ggerganov merged 47 commits into master from mixtral
ggerganov convert : support Mixtral as LLAMA arch
dff8cbeb
ggerganov convert : fix n_ff typo
d38e41ee
ggerganov llama : model loading
a3eefe95
ggerganov ggml : sync latest ggml_mul_mat_id
861cd678
ggerganov llama : update graph to support MoE
aedfad12
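
A quick orientation for this commit, which is the core of the PR: the Mixtral-style MoE feed-forward scores all experts per token with a gating layer, keeps the top n_expert_used (2 of 8 for Mixtral), and sums the selected experts' FFN outputs weighted by the softmax of their gate logits. A minimal CPU sketch of that routing, in illustrative C++ rather than the actual ggml graph code (`expert_ffn` is a hypothetical stand-in for the per-expert SwiGLU feed-forward):

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// hypothetical stand-in for evaluating one expert's feed-forward on a token
using expert_ffn_t = std::vector<float> (*)(int expert_id, const std::vector<float> & x);

// x: token hidden state; logits: [n_expert] gate scores for this token
std::vector<float> moe_ffn_token(const std::vector<float> & x,
                                 const std::vector<float> & logits,
                                 int n_expert_used, expert_ffn_t expert_ffn) {
    const int n_expert = (int) logits.size();      // 8 for Mixtral 8x7B
    // indices of the n_expert_used largest gate logits
    std::vector<int> idx(n_expert);
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + n_expert_used, idx.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });
    // softmax over the selected logits only (max-subtracted for stability)
    std::vector<float> w(n_expert_used);
    float sum = 0.0f;
    for (int i = 0; i < n_expert_used; ++i) {
        w[i] = std::exp(logits[idx[i]] - logits[idx[0]]);
        sum += w[i];
    }
    // weighted sum of the selected experts' outputs
    std::vector<float> out(x.size(), 0.0f);
    for (int i = 0; i < n_expert_used; ++i) {
        const std::vector<float> y = expert_ffn(idx[i], x);
        for (size_t j = 0; j < y.size(); ++j) {
            out[j] += (w[i] / sum) * y[j];
        }
    }
    return out;
}
```
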
ggerganov llama : fix cur -> cur_expert
af1a096b
ggerganov llama : first working version
7ea36953
ggerganov llama : fix expert weighting in the FFN
8b185b70
ggerganov ggml : ggml_get_rows support 2D indexing [n_tokens, n_experts] (cpu o…
7372b622
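
Before this change ggml_get_rows took a flat 1D vector of row indices; MoE needs a separate set of expert rows per token, hence the [n_tokens, n_experts] indexing. A plain C++ stand-in for the generalized gather (not the ggml kernel itself):

```cpp
#include <cstdint>
#include <vector>

// src: table of rows; ids: for each token, the selected expert row indices.
// The result keeps the per-token grouping: dst[token][k] = src[ids[token][k]].
std::vector<std::vector<std::vector<float>>> get_rows_2d(
        const std::vector<std::vector<float>>   & src,
        const std::vector<std::vector<int32_t>> & ids) {
    std::vector<std::vector<std::vector<float>>> dst(ids.size());
    for (size_t t = 0; t < ids.size(); ++t) {  // token dimension
        for (int32_t row : ids[t]) {           // expert dimension
            dst[t].push_back(src[row]);
        }
    }
    return dst;
}
```
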
slaren ggml : add n_as argument to ggml_mul_mat_id
ee8fb399
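
ggml_mul_mat_id is the indirect matrix multiply at the heart of the MoE layer: each input picks one of several candidate weight matrices through an id tensor, and the new n_as argument tells the op how many candidates exist. A rough C++ illustration of the semantics only; the real op works on ggml tensors and batches this far more efficiently:

```cpp
#include <cstdint>
#include <vector>

using Mat = std::vector<std::vector<float>>; // [rows][cols]

// as: n_as candidate expert matrices; ids[i] selects which matrix multiplies
// input vector b[i], so only the routed experts' weights are ever touched.
std::vector<std::vector<float>> mul_mat_id(const std::vector<Mat> & as,
                                           const std::vector<int32_t> & ids,
                                           const std::vector<std::vector<float>> & b) {
    std::vector<std::vector<float>> out(b.size());
    for (size_t i = 0; i < b.size(); ++i) {
        const Mat & a = as[ids[i]]; // per-input expert selection
        out[i].assign(a.size(), 0.0f);
        for (size_t r = 0; r < a.size(); ++r) {
            for (size_t c = 0; c < a[r].size(); ++c) {
                out[i][r] += a[r][c] * b[i][c];
            }
        }
    }
    return out;
}
```
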
ggerganov ggml : fix ggml_get_rows to take into account ne02 / ne11
9064b1ca
ggerganov metal : add more general support for ggml_get_rows + tests
2cbcba82
slaren llama : add basic support for offloading moe with CUDA
06dfde3e
ggerganov metal : add/mul/div use general kernel when src1 not cont
7e2006b0
ggerganov metal : reduce the kernel launches for ggml_mul_mat_id
8c5b66ee
slaren ggml : get_rows : support non-contiguos tensors with gaps, generalize…
ac3f7d8e
slaren ggml : update get_rows f16 and q
2e4db482
slaren cuda : support non-contiguous src1 in get_rows
62b95f93
slaren llama : offload missing ffn_moe_silu
0710b0f7
ggerganov metal : fix ggml_get_rows to work with non-cont src1
016f9bb5
ggerganov metal : add indirect mat-vec kernels for all quantization types
6cfb31f9
ggerganov llama : do not quantize expert gating tensors
d1259b7b
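
The rationale: the gating (router) weights are tiny compared to the expert FFN weights, but routing decisions are sensitive to their precision, so quantizing them saves almost nothing and can degrade expert selection. Conceptually the quantizer just skips them by name; a sketch of the idea, where the exact name pattern is an assumption rather than a quote from the code:

```cpp
#include <string>

// keep MoE router tensors (e.g. the per-layer "ffn_gate_inp" weights) in a
// high-precision type during quantization; the pattern is illustrative
bool skip_quantization(const std::string & tensor_name) {
    return tensor_name.find("ffn_gate_inp") != std::string::npos;
}
```
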
ggerganov llama : add n_expert and n_expert_used to hparams + change quants
e640cbe0
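
The two new hyperparameters, excerpted below with field names taken from the commit message (the surrounding hparams struct in llama.cpp is much larger). For Mixtral 8x7B they are 8 and 2 respectively:

```cpp
#include <cstdint>

struct llama_hparams_moe_excerpt {
    uint32_t n_expert      = 0; // experts per MoE layer (0 means a dense model)
    uint32_t n_expert_used = 0; // experts actually evaluated per token (top-k)
};
```
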
slaren test-backend-ops : add moe test
cefebb36
slaren cuda : fix get_rows when ncols is odd
8614aa73
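
The class of bug being fixed, shown in plain C++ since the actual kernel is CUDA and its exact shape is an assumption here: a routine that handles two columns per step must guard the second access, otherwise an odd ncols reads and writes one element past the end of the row.

```cpp
#include <cstddef>

void copy_row_two_per_step(const float * src, float * dst, size_t ncols) {
    for (size_t i = 0; i < ncols; i += 2) {
        dst[i] = src[i];
        if (i + 1 < ncols) { // tail guard: required when ncols is odd
            dst[i + 1] = src[i + 1];
        }
    }
}
```
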
ggerganov convert : determine n_ctx correctly
65923a8e
ggerganov metal : fix ggml_mul_mat_id for F32
b0b83dd9
ggerganov test-backend-ops : make experts more evenly probable (test_moe)
54ba2634
slaren test-backend-ops : cleanup, add moe test for batches
54d254bb
slaren test-backend-ops : add cpy from f32 -> all types test
f1380d78
slaren test-backend-ops : fix dequantize block offset
b0029815
ggerganov llama : fix hard-coded number of experts
8cbaed1d
slaren test-backend-ops : simplify and disable slow tests to avoid CI timeout
ffda94c8
slaren test-backend-ops : disable MOE test with thread sanitizer
33e50f1b
deniaud approved these changes on 2023-12-11
slaren cuda : fix mul_mat_id with multi gpu
296c945d
slaren convert : use 1e6 rope_freq_base for mixtral
7dc75e39
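
What this changes: RoPE rotates each pair of embedding dimensions i at frequency base^(-2i/d), so a larger base slows the rotation and supports longer contexts. Mixtral ships with base 1e6 where the original Llama models use 10000, and the converter must record that in the model metadata. A small sketch of the resulting frequency table:

```cpp
#include <cmath>
#include <vector>

// one inverse frequency per dimension pair; freq_base = 1e6f for Mixtral,
// 10000.0f for the original Llama models
std::vector<float> rope_inv_freq(int head_dim, float freq_base) {
    std::vector<float> inv_freq(head_dim / 2);
    for (int i = 0; i < head_dim / 2; ++i) {
        inv_freq[i] = std::pow(freq_base, -2.0f * i / head_dim);
    }
    return inv_freq;
}
```
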
slaren convert : fix style
f1cbfabd
ggerganov convert : support safetensors format
6a419f4d
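
For reference, the safetensors container the converter now reads is simple: an 8-byte little-endian header length N, then N bytes of JSON describing each tensor's dtype, shape, and byte offsets, then the raw tensor data. A minimal C++ sketch of pulling out the header (assumes a little-endian host; error handling omitted):

```cpp
#include <cstdint>
#include <fstream>
#include <string>

std::string read_safetensors_header(const std::string & path) {
    std::ifstream f(path, std::ios::binary);
    uint64_t n = 0;
    f.read(reinterpret_cast<char *>(&n), sizeof(n)); // little-endian length prefix
    std::string json(n, '\0');
    f.read(&json[0], (std::streamsize) n);           // JSON tensor index
    return json; // parse with any JSON library to locate each tensor's data
}
```
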
slaren gguf-py : bump version
a742d9f9
ggerganov metal : add cpy f16 -> f32 kernel
08eb9917
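
Per element, a cpy f16 -> f32 kernel performs the standard IEEE half-to-single widening. ggml has its own conversion helpers; the standalone version below is only a reference for what the kernel computes, covering zeros, subnormals, and inf/NaN:

```cpp
#include <cstdint>
#include <cstring>

float fp16_to_fp32(uint16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    const uint32_t exp  = (h >> 10) & 0x1F;
    const uint32_t mant =  h        & 0x3FF;
    uint32_t bits;
    if (exp == 0x1F) {                     // inf / NaN
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {                 // normal: rebias exponent 15 -> 127
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    } else if (mant == 0) {                // signed zero
        bits = sign;
    } else {                               // subnormal half: renormalize
        uint32_t m = mant;
        int e = -1;
        do { m <<= 1; ++e; } while (!(m & 0x400));
        bits = sign | ((uint32_t)(112 - e) << 23) | ((m & 0x3FF) << 13);
    }
    float out;
    std::memcpy(&out, &bits, sizeof(out)); // bit-exact reinterpretation
    return out;
}
```
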
ggerganov metal : fix binary ops for ne10 % 4 != 0
a51bc0c1
ggerganov test-backend-ops : add one more sum_rows test
ea4402bb
ggerganov ggml : do not use BLAS with ggml_mul_mat_id
90c12e6b
Mrkvak convert-hf : support for mixtral-instruct (#4428)
82e4f645
ggerganov metal : fix soft_max kernels
ab558ac2
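
For context, the reference behavior a soft_max kernel must reproduce is the max-subtracted softmax: subtracting the row maximum before exponentiating keeps exp() from overflowing on large logits without changing the result. CPU form:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

void soft_max_row(std::vector<float> & x) {
    const float mx = *std::max_element(x.begin(), x.end());
    float sum = 0.0f;
    for (float & v : x) { v = std::exp(v - mx); sum += v; }
    for (float & v : x) { v /= sum; }
}
```
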
ggerganov metal : limit kernels to not use more than the allowed threads
109e7aa8
ggerganov metal : switch to execution barriers + fix one of the barriers
e1241d9b
ggerganov approved these changes on 2023-12-13
ggerganov merged 799a1cb1 into master on 2023-12-13