ochafik/llama.cpp

grammars: mutex-guarded lazy caching of token pieces in llama_sample_grammar

Olivier Chafik committed 2 years ago

09f0fae0

ci : re-enable sanitizer runs (#7358)

ggerganov committed 2 years ago

Verified 059031b8

android : use "ci-android" branch for CI (#7341)

ggerganov committed 2 years ago

Verified 511182ea

CUDA: deduplicate FlashAttention code (#7352)

JohannesGaessler committed 2 years ago

Verified 133d99c5

server: correct --threads documentation [no ci] (#7362)

JohannesGaessler committed 2 years ago

Verified cb42c294

cuda : add half2 __shfl_xor() for ROCm 5.5 (#7263)

Engininja2 committed 2 years ago

Verified d233b507

llama : add support for larger Granite Code Models (20B, 34B) (#7324)

sroecker committed 2 years ago

Verified 0f98acfa

perplexity : ndot progress and show stats with < 100 tasks (#7348)

strawberrymelonpanda committed 2 years ago

Verified ca57e0f3

Update and fix Vulkan soft_max and argsort implementations (#7237)

0cc4m committed 2 years ago

Verified c1b295ee

github-actions-labeler: initial commit (#7330)

mofosyne committed 2 years ago

Verified de731963

convert : fix set_vocab_sentencepiece (#6866)

ggerganov committed 2 years ago

Verified b49a13dd

ggml : fix quants nans when all the group weights are very close to zero (#7313)

slaren committed 2 years ago

Verified 05834841

cmake : fix typo in AMDGPU_TARGETS (#7356)

Engininja2 committed 2 years ago

Verified ef277de2

Unicode codepoint flags for custom regexs (#7245)

jaime-m-p committed 2 years ago

Verified b43272af

CUDA: faster large batch FA without tensor cores (#7314)

JohannesGaessler committed 2 years ago

Verified 0fc1e820

ROCm: use native CMake HIP support (#5966)

GZGavinZhao committed 2 years ago

Verified 82ca83db

rpc : set SO_REUSEADDR for the server socket (#7320)

rgerganov committed 2 years ago

Verified f4bd8b3d

Added a single test function script and fix debug-test.sh to be more robust (#7279)

mofosyne committed 2 years ago

Verified 51e9d025

py : convert-hf-to-gguf-update improvements (#7340)

akx committed 2 years ago

Verified d273c140

llama : use n_embd_head_v when reshaping kqv (#7327)

fairydreaming committed 2 years ago

Verified 27b04069

tokenization: add warning for double BOS (#7332)

JohannesGaessler committed 2 years ago

Verified 29c60d8c

ggml-quants, llama : removed excess checks (#7274)

GermanAizek committed 2 years ago

Verified 359cbe3f

convert : fix Qwen/Qwen-7b conversion (#7308)

amd-lalithnc committed 2 years ago

Verified e18bc6aa

server : add support for the RPC backend (#7305)

rgerganov committed 2 years ago

Verified ee94172d

ggml : rewrite silu and softmax for cpu (#7154)

jart committed 2 years ago

Verified 934266c0

[Server] Added --verbose option to README [no ci] (#7335)

reuank committed 2 years ago

Verified 9c4fdcbe

Revert "server bench: fix bench not waiting for model load (#7284)" (#7334)

phymbert committed 2 years ago

Verified 24ecb581

rpc : get available mem for the CPU backend

rgerganov committed 2 years ago

9afdffe7

rpc : add command line arg for specifying backend memory

rgerganov committed 2 years ago

3b3963c5

convert : get general.name from model dir, not its parent (#5615)

cebtenzzre committed 2 years ago

Verified dda64fc1
