whisper.cpp
sync : ggml
#2237
Merged
ggerganov merged 102 commits into master from sync
7bb8dabd ggml : add `ggml_upscale_ext` (ggml/814)
215bcb35 Add missing " (llama/7303)
1c52d7f1 ggml : tag ggml_tensor::backend as deprecated (llama/7290)
a32324ae Avoid unnecessarily disabling CUDA graphs (llama/7302)
d3f4ab6d ggml : use dynamic thread scheduling for matrix multiplication (llama…
88b9d3b1 Add support for properly optimized Windows ARM64 builds with LLVM and…
f1c281a2 rpc : add command line arg for specifying backend memory
831cf54f ggml : rewrite silu and softmax for cpu (llama/7154)
b321ba36 ggml-quants, llama : removed excess checks (llama/7274)
d64e1334 rpc : set SO_REUSEADDR for the server socket (llama/7320)
4fea7a99 CUDA: faster large batch FA without tensor cores (llama/7314)
653af39a ggml : fix quants nans when all the group weights are very close to z…
449de6a4 Update and fix Vulkan soft_max and argsort implementations (llama/7237)
280208ab cuda : add half2 __shfl_xor() for ROCm 5.5 (llama/7263)
e00ace49 CUDA: deduplicate FlashAttention code (llama/7352)
e2118973 android : use "ci-android" branch for CI (llama/7341)
dfe6b642 Capture CUDA logging output (llama/7298)
0d54e789 cuda : clear error after buffer allocation failure (llama/7376)
570d7fdf ggml: implement quantized KV cache for FA (llama/7372)
acd5935c ggml : fix another case of quants nans (llama/7387)
9bbf65bc Vulkan Embedding Fix (llama/7360)
7db2a18a Add provisions for windows support for BF16 code including CMake prov…
80e2b35c ggml : add loongarch lsx and lasx support (llama/6454)
85bbb069 ggml-opencl, llama: using reserve() if count already known (llama/7272)
cc50ea05 Update SYCL upscale operation (llama/7321)
2668d573 rpc : track allocated buffers (llama/7411)
ed7eb400 CUDA: deduplicate mmq code (llama/7397)
aa29372b CUDA: fix unused warning in mmq.cu (llama/7442)
d2aa1cea metal : handle F16 inf values, fix FA partial offload (llama/7434)
1ffabc8e llama : add phi3 128K model support (llama/7225)
eca5fb84 cuda : fix rope + add tests (llama/7452)
4228fb7d CUDA: remove incorrect precision check (llama/7454)
61d5a1e8 cuda : fix compile warning (llama/7454)
b08c0b09 CUDA: fix FA out-of-bounds writes (llama/7465)
f3665048 CUDA: fix FA out-of-bounds reads (llama/7479)
a8f67b94 Update vulkan rope implementation to support frequency factors (llama…
c2be6503 ggml : drop support for QK_K=64 (llama/7473)
1470badf ggml : remove ggml_flash_attn and ggml_flash_ff (llama/7463)
22d4b17b ggml : silence UB sanitizer error during iq2_xxs quantization (llama/0)
024b58e4 ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (llama…
e7b39d8f ggml : restore ggml_rope_xpos_inplace (ggml/0)
e934ba58 metal : disable FA kernel for HS=256 (llama/7556)
00559484 metal : add GGML_OP_REPEAT kernels (llama/7557)
9b0dbe88 Add freq factors (llama/7495)
b725bb24 Fix q_xxs using mul_mat_q (llama/7459)
b323cfcc Allow multiple copy function pointers for CUDA graph kernel param upd…
a1332066 update HIP_UMA #7399 (llama/7414)
d6d25082 ggml : generalize GGML_OP_CONCAT (llama/7563)
023020ce fix ggml_sycl_mul_mat_id() to match the change of api (llama/7436)
42a9c958 rpc : resource management rework (llama/7562)
7cc2ff0f vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE …
9ff003fd sycl : fix assert (llama/7563)
eeb929aa Align GEMM dispatch (llama/7566)
f9df59a8 ggml : fix typo in ggml.c (llama/7603)
7e954209 examples : adapt to new ggml_concat (ggml/0)
d53ab4b3 ggml : use atomic_flag for critical section (llama/7598)
78b74d50 llama-bench : add support for the RPC backend (llama/7435)
f5de5d74 cuda : non-cont concat support (llama/7610)
fa6b9edf ggml : fix YARN + add tests + add asserts (llama/7617)
7382fecb metal : add missing asserts (llama/7617)
e3e1a986 metal : remove invalid asserts (llama/7617)
55de6e07 ggml : fix loongarch build (O2 issue) (llama/7636)
79088fe6 faster avx512 exp implementation (llama/7551)
b79eca73 ggml : fix loongson compile warnings (llama/7537)
49c5ccb9 CUDA: quantized KV support for FA vec (llama/7527)
5758ffa8 CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681)
bc6158dc Fix FlashAttention debug test, FP32 assert (llama/7684)
5f6620e9 fix bug introduced in using calloc (llama/7701)
f8b7a7f5 kompute : implement op_getrows_f32 (llama/6403)
9e95aa1a Vulkan Mixture of Experts (MoE) support (llama/7628)
784733d7 ggml : use OpenMP as a thread pool (llama/7606)
0a6fd4e1 llama : offload to RPC in addition to other backends (llama/7640)
1b34416f ggml : prevent builds with -ffinite-math-only (llama/7726)
69982c7c ggml : remove OpenCL (llama/7735)
bf0ff58b Allow number of nodes in CUDA graph to change (llama/7738)
809d0f49 ggml : refactor rope norm/neox (llama/7634)
048f479c CUDA: refactor mmq, dmmv, mmvq (llama/7716)
c5f01ea1 fix softmax r2r result wrong issue (llama/7811)
e604adb2 vulkan : reuse parent extra for views (llama/7806)
bb7a50f1 CUDA: revise q8_1 data layout for mul_mat_q (llama/7824)
fa0b6928 use the correct SYCL context for host USM allocations (llama/7777)
b1991877 CUDA: use tensor cores for MMQ (llama/7676)
28c0ccff CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860)
b30b2f42 Update Vulkan RoPE implementation (llama/7818)
bfb22129 vulkan: select only one device for single gpu with multiple drivers (…
035d6554 ggml : improve ggml_is_contiguous logic (llama/7856)
3544c186 tests : add non-cont unary tests (llama/7857)
e8f4fa01 CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)
ad6b8d58 move BLAS to a separate backend (llama/6210)
08078b96 rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)
f8ac7b1c metal : utilize max shared memory for mul_mat_id (llama/7935)
8abc2513 CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
8efd6d61 remove global variables (llama/7710)
d2744ccc ggml : remove duplicate include of ggml-common.h (ggml/853)
ce33d6f1 ggml : fix and optimize ppc64le (ggml/849)
92dc0b78 sync : ggml
b891050e cmake : fix CUDA build (#0)
16d44bd6 talk-llama : sync llama.cpp
c711647a cuda : enable CUDA graphs (#0)
72523945 sycl : sync (#0)
b51ff56d ggml : remove OpenCL (#0)
f5b667d5 cmake : fix sycl build (#0)
ggerganov merged commit 30841fa7 into master 1 year ago
ggerganov deleted the sync branch 1 year ago