PR #3115 llama : make quantize example up to 2.7x faster

cebtenzzre merged 3 commits into ggml-org:master from cebtenzzre:faster-quantize

cebtenzzre marked this pull request as draft 2 years ago

cebtenzzre marked this pull request as ready for review 2 years ago

slaren commented on 2023-09-11

ggerganov commented on 2023-09-11

ggerganov added the high priority label

cebtenzzre force pushed 2 years ago

llama : refactor k-quant mixture logic into a function

0c649684

llama : optimize vector use in quantize -> 179% faster

a95aa21d

cebtenzzre force pushed 2 years ago

ggerganov approved these changes on 2023-09-14

cebtenzzre changed the title ~~llama : make quantize example up to 3.7x faster~~ llama : make quantize example up to 2.7x faster 2 years ago

llama : don't zero-init vectors in quantize -> 5.1% faster

f727ad5f

cebtenzzre force pushed to f727ad5f 2 years ago

cebtenzzre merged 98311c42 into master 2 years ago

cebtenzzre deleted the faster-quantize branch 2 years ago

cebtenzzre restored the head branch 2 years ago

Reviewers

ggerganov

slaren

Labels

high priority
