llama : add Mixtral support #4406
convert : support Mixtral as LLAMA arch
dff8cbeb
convert : fix n_ff typo
d38e41ee
llama : model loading
a3eefe95
ggml : sync latest ggml_mul_mat_id
861cd678
llama : update graph to support MoE
aedfad12
llama : fix cur -> cur_expert
af1a096b
llama : first working version
7ea36953
llama : fix expert weighting in the FFN
8b185b70
ggml : ggml_get_rows support 2D indexing [n_tokens, n_experts] (cpu o…
7372b622
ggml : add n_as argument to ggml_mul_mat_id
ee8fb399
ggml : fix ggml_get_rows to take into account ne02 / ne11
9064b1ca
metal : add more general support for ggml_get_rows + tests
2cbcba82
llama : add basic support for offloading moe with CUDA
06dfde3e
metal : add/mul/div use general kernel when src1 not cont
7e2006b0
metal : reduce the kernel launches for ggml_mul_mat_id
8c5b66ee
ggml : get_rows : support non-contiguous tensors with gaps, generalize…
ac3f7d8e
ggml : update get_rows f16 and q
2e4db482
cuda : support non-contiguous src1 in get_rows
62b95f93
llama : offload missing ffn_moe_silu
0710b0f7
metal : fix ggml_get_rows to work with non-cont src1
016f9bb5
metal : add indirect mat-vec kernels for all quantization types
6cfb31f9
llama : do not quantize expert gating tensors
d1259b7b
llama : add n_expert and n_expert_used to hparams + change quants
e640cbe0
test-backend-ops : add moe test
cefebb36
cuda : fix get_rows when ncols is odd
8614aa73
convert : determine n_ctx correctly
65923a8e
metal : fix ggml_mul_mat_id for F32
b0b83dd9
test-backend-ops : make experts more evenly probable (test_moe)
54ba2634
test-backend-ops : cleanup, add moe test for batches
54d254bb
test-backend-ops : add cpy from f32 -> all types test
f1380d78
test-backend-ops : fix dequantize block offset
b0029815
llama : fix hard-coded number of experts
8cbaed1d
test-backend-ops : simplify and disable slow tests to avoid CI timeout
ffda94c8
test-backend-ops : disable MOE test with thread sanitizer
33e50f1b
deniaud approved these changes on 2023-12-11
cuda : fix mul_mat_id with multi gpu
296c945d
convert : use 1e6 rope_freq_base for mixtral
7dc75e39
convert : fix style
f1cbfabd
convert : support safetensors format
6a419f4d
gguf-py : bump version
a742d9f9
metal : add cpy f16 -> f32 kernel
08eb9917
metal : fix binary ops for ne10 % 4 != 0
a51bc0c1
test-backend-ops : add one more sum_rows test
ea4402bb
ggml : do not use BLAS with ggml_mul_mat_id
90c12e6b
convert-hf : support for mixtral-instruct (#4428)
82e4f645
metal : fix soft_max kernels
ab558ac2
metal : limit kernels to not use more than the allowed threads
109e7aa8
metal : switch to execution barriers + fix one of the barriers
e1241d9b
ggerganov approved these changes on 2023-12-13
ggerganov merged 799a1cb1 into master on 2023-12-13