llama.cpp
Loading models directly into VRAM, norm calculation on GPUs, broadcasting for ggml_mul
#1483
Merged
ggerganov merged 35 commits into ggml-org:master from JohannesGaessler:gpu-norms
JohannesGaessler requested a review from slaren 2 years ago
JohannesGaessler added the enhancement label
JohannesGaessler force-pushed from af005ce3 2 years ago
JohannesGaessler force-pushed to a272e71d 2 years ago
JohannesGaessler changed the title from "Norm calculation on GPUs, broadcasting for ggml_mul" to "Loading models directly into VRAM, norm calculation on GPUs, broadcasting for ggml_mul" 2 years ago
JohannesGaessler marked this pull request as draft 2 years ago
JohannesGaessler force-pushed from 9acc42f8 2 years ago
Broadcasting for ggml_mul (de65783b)
CUDA kernel for ggml_mul, norms in VRAM (2365a2a9)
GPU weights not in RAM, direct loading with cuFile (fa1a29f3)
JohannesGaessler force-pushed to fa1a29f3 2 years ago
slaren commented on 2023-05-18 (three review comments)
fixup! GPU weights not in RAM, direct loading with cuFile (1bfe5a98)
fixup! GPU weights not in RAM, direct loading with cuFile (24d5ddf6)
JohannesGaessler marked this pull request as ready for review 2 years ago
define default model path once, sync path with readme (#1366) (09d82511)
~7% faster Q5_1 AVX2 code (#1477) (230018d1)
convert.py: Support models which are stored in a single pytorch_model… (1af2844e)
benchmark-matmul: Print the average of the test results (#1490) (d5207bf3)
Remove unused n_parts parameter (#1509) (d916c5b8)
Fixes #1511 lambda issue for w64devkit (mingw) (#1513) (a94b3345)
make kv_f16 the default for api users (#1517) (e22541a4)
minor : fix compile warnings (6b5776b0)
readme : adds WizardLM to the list of supported models (#1485) (75c017fc)
main : make reverse prompt option act as a stop token in non-interact… (c51c64a8)
examples : add persistent chat (#1495) (0226d491)
tests : add missing header (9fd81872)
ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) (211aa6af)
ggml : fix scalar implementation of Q4_1 dot (9a7af6c2)
llama : fix compile warnings in llama_set_state_data() (f14673ad)
llama : fix name shadowing and C4146 (#1526) (df512bbb)
Fix for mingw (#1462) (f401d5ff)
llama : add llama_init_backend() API (close #1527) (54ec8a96)
feature : add blis and other BLAS implementation support (#1502) (667c57f1)
Revert "feature : add blis and other BLAS implementation support (#15… (977e74d7)
GPU weights not in RAM, direct loading with cuFile (ffe9652b)
llama : code style fixes + progress print fix (f67bc3c3)
ggml : ggml_mul better broadcast support (3ec7941b)
cmake : workarounds for cufile when CMake version < 3.25 (a3586c52)
Merge branch 'master' into gpu-norms (909acb3e)
github-actions commented on 2023-05-20
ggerganov requested changes on 2023-05-20
gg rebase fixup (fee87f65)
github-actions commented on 2023-05-20
Loop in llama.cpp, fixed progress callback (b81f662e)
github-actions commented on 2023-05-20
Attempt clang-tidy fix (fadcd583)
llama : fix vram size computation (a4da072d)
ggerganov approved these changes on 2023-05-20
Add forgotten fclose() (37f2c6c2)
ggerganov merged commit affc76ed into master 2 years ago
Reviewers: ggerganov, github-actions, slaren
Assignees: no one assigned
Labels: enhancement
Milestone: no milestone