llama.cpp
Loading models directly into VRAM, norm calculation on GPUs, broadcasting for ggml_mul
#1483
Merged


JohannesGaessler
JohannesGaessler requested a review from slaren 2 years ago
JohannesGaessler added the enhancement label
JohannesGaessler force-pushed from af005ce3 2 years ago
JohannesGaessler force-pushed to a272e71d 2 years ago
JohannesGaessler changed the title from "Norm calculation on GPUs, broadcasting for ggml_mul" to "Loading models directly into VRAM, norm calculation on GPUs, broadcasting for ggml_mul" 2 years ago
JohannesGaessler marked this pull request as draft 2 years ago
JohannesGaessler force-pushed from 9acc42f8 2 years ago
JohannesGaessler Broadcasting for ggml_mul
de65783b
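This commit adds broadcasting support to ggml_mul so that a norm weight tensor (a single row) can multiply every row of a larger tensor without being duplicated first. As a rough illustration of the idea only, not the actual ggml implementation, here is a scalar sketch in plain C; `mul_broadcast_row` is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of row-wise broadcasting as in ggml_mul:
 * a has shape [rows][cols], stored row-major; b is a single row of
 * length cols that is reused (broadcast) across every row of a.
 * This is illustrative scalar code, not the ggml/CUDA implementation. */
void mul_broadcast_row(float *dst, const float *a, const float *b,
                       size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            dst[r * cols + c] = a[r * cols + c] * b[c]; /* same b row for every r */
        }
    }
}
```

Broadcasting avoids materializing a repeated copy of b, which matters here because the multiply runs against the full hidden state on every layer.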
JohannesGaessler CUDA kernel for ggml_mul, norms in VRAM
2365a2a9
JohannesGaessler GPU weights not in RAM, direct loading with cuFile
fa1a29f3
JohannesGaessler force-pushed to fa1a29f3 2 years ago
slaren commented on 2023-05-18 (3 review comments)
JohannesGaessler fixup! GPU weights not in RAM, direct loading with cuFile
1bfe5a98
JohannesGaessler fixup! GPU weights not in RAM, direct loading with cuFile
24d5ddf6
JohannesGaessler marked this pull request as ready for review 2 years ago
ott2 define default model path once, sync path with readme (#1366)
09d82511
ilyakurdyukov ~7% faster Q5_1 AVX2 code (#1477)
230018d1
TheBloke convert.py: Support models which are stored in a single pytorch_model…
1af2844e
benchmark-matmul: Print the average of the test results (#1490)
d5207bf3
sw Remove unused n_parts parameter (#1509)
d916c5b8
DannyDaemonic Fixes #1511 lambda issue for w64devkit (mingw) (#1513)
a94b3345
Green-Sky make kv_f16 the default for api users (#1517)
e22541a4
ggerganov minor : fix compile warnings
6b5776b0
dakennedyd readme : adds WizardLM to the list of supported models (#1485)
75c017fc
data-angel main : make reverse prompt option act as a stop token in non-interact…
c51c64a8
ejones examples : add persistent chat (#1495)
0226d491
ggerganov tests : add missing header
9fd81872
ggerganov ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)
211aa6af
ggerganov ggml : fix scalar implementation of Q4_1 dot
9a7af6c2
ggerganov llama : fix compile warnings in llama_set_state_data()
f14673ad
maximegmd llama : fix name shadowing and C4146 (#1526)
df512bbb
DannyDaemonic Fix for mingw (#1462)
f401d5ff
ggerganov llama : add llama_init_backend() API (close #1527)
54ec8a96
zenixls2 feature : add blis and other BLAS implementation support (#1502)
667c57f1
ggerganov Revert "feature : add blis and other BLAS implementation support (#15…
977e74d7
JohannesGaessler GPU weights not in RAM, direct loading with cuFile
ffe9652b
ggerganov llama : code style fixes + progress print fix
f67bc3c3
ggerganov ggml : ggml_mul better broadcast support
3ec7941b
ggerganov cmake : workarounds for cufile when CMake version < 3.25
a3586c52
ggerganov Merge branch 'master' into gpu-norms
909acb3e
github-actions commented on 2023-05-20
ggerganov requested changes on 2023-05-20
JohannesGaessler gg rebase fixup
fee87f65
github-actions commented on 2023-05-20
JohannesGaessler Loop in llama.cpp, fixed progress callback
b81f662e
github-actions commented on 2023-05-20
JohannesGaessler Attempt clang-tidy fix
fadcd583
ggerganov llama : fix vram size computation
a4da072d
ggerganov approved these changes on 2023-05-20
JohannesGaessler Add forgotten fclose()
37f2c6c2
ggerganov merged affc76ed into master 2 years ago