whisper.cpp
sync : ggml
#2237
Merged
ggerganov merged 102 commits into master from sync
7bb8dabd ggml : add `ggml_upscale_ext` (ggml/814)
215bcb35 Add missing " (llama/7303)
1c52d7f1 ggml : tag ggml_tensor::backend as deprecated (llama/7290)
a32324ae Avoid unnecessarily disabling CUDA graphs (llama/7302)
d3f4ab6d ggml : use dynamic thread scheduling for matrix multiplication (llama…
88b9d3b1 Add support for properly optimized Windows ARM64 builds with LLVM and…
f1c281a2 rpc : add command line arg for specifying backend memory
831cf54f ggml : rewrite silu and softmax for cpu (llama/7154)
b321ba36 ggml-quants, llama : removed excess checks (llama/7274)
d64e1334 rpc : set SO_REUSEADDR for the server socket (llama/7320)
4fea7a99 CUDA: faster large batch FA without tensor cores (llama/7314)
653af39a ggml : fix quants nans when all the group weights are very close to z…
449de6a4 Update and fix Vulkan soft_max and argsort implementations (llama/7237)
280208ab cuda : add half2 __shfl_xor() for ROCm 5.5 (llama/7263)
e00ace49 CUDA: deduplicate FlashAttention code (llama/7352)
e2118973 android : use "ci-android" branch for CI (llama/7341)
dfe6b642 Capture CUDA logging output (llama/7298)
0d54e789 cuda : clear error after buffer allocation failure (llama/7376)
570d7fdf ggml: implement quantized KV cache for FA (llama/7372)
acd5935c ggml : fix another case of quants nans (llama/7387)
9bbf65bc Vulkan Embedding Fix (llama/7360)
7db2a18a Add provisions for windows support for BF16 code including CMake prov…
80e2b35c ggml : add loongarch lsx and lasx support (llama/6454)
85bbb069 ggml-opencl, llama: using reserve() if count already known (llama/7272)
cc50ea05 Update SYCL upscale operation (llama/7321)
2668d573 rpc : track allocated buffers (llama/7411)
ed7eb400 CUDA: deduplicate mmq code (llama/7397)
aa29372b CUDA: fix unused warning in mmq.cu (llama/7442)
d2aa1cea metal : handle F16 inf values, fix FA partial offload (llama/7434)
1ffabc8e llama : add phi3 128K model support (llama/7225)
eca5fb84 cuda : fix rope + add tests (llama/7452)
4228fb7d CUDA: remove incorrect precision check (llama/7454)
61d5a1e8 cuda : fix compile warning (llama/7454)
b08c0b09 CUDA: fix FA out-of-bounds writes (llama/7465)
f3665048 CUDA: fix FA out-of-bounds reads (llama/7479)
a8f67b94 Update vulkan rope implementation to support frequency factors (llama…
c2be6503 ggml : drop support for QK_K=64 (llama/7473)
1470badf ggml : remove ggml_flash_attn and ggml_flash_ff (llama/7463)
22d4b17b ggml : silence UB sanitizer error during iq2_xxs quantization (llama/0)
024b58e4 ggml: aarch64: SVE kernels for q8_0_q8_0, q4_0_q8_0 vector dot (llama…
e7b39d8f ggml : restore ggml_rope_xpos_inplace (ggml/0)
e934ba58 metal : disable FA kernel for HS=256 (llama/7556)
00559484 metal : add GGML_OP_REPEAT kernels (llama/7557)
9b0dbe88 Add freq factors (llama/7495)
b725bb24 Fix q_xxs using mul_mat_q (llama/7459)
b323cfcc Allow multiple copy function pointers for CUDA graph kernel param upd…
a1332066 update HIP_UMA #7399 (llama/7414)
d6d25082 ggml : generalize GGML_OP_CONCAT (llama/7563)
023020ce fix ggml_sycl_mul_mat_id() to match the change of api (llama/7436)
42a9c958 rpc : resource management rework (llama/7562)
7cc2ff0f vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE …
9ff003fd sycl : fix assert (llama/7563)
eeb929aa Align GEMM dispatch (llama/7566)
f9df59a8 ggml : fix typo in ggml.c (llama/7603)
7e954209 examples : adapt to new ggml_concat (ggml/0)
d53ab4b3 ggml : use atomic_flag for critical section (llama/7598)
78b74d50 llama-bench : add support for the RPC backend (llama/7435)
f5de5d74 cuda : non-cont concat support (llama/7610)
fa6b9edf ggml : fix YARN + add tests + add asserts (llama/7617)
7382fecb metal : add missing asserts (llama/7617)
e3e1a986 metal : remove invalid asserts (llama/7617)
55de6e07 ggml : fix loongarch build (O2 issue) (llama/7636)
79088fe6 faster avx512 exp implementation (llama/7551)
b79eca73 ggml : fix loongson compile warnings (llama/7537)
49c5ccb9 CUDA: quantized KV support for FA vec (llama/7527)
5758ffa8 CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (llama/7681)
bc6158dc Fix FlashAttention debug test, FP32 assert (llama/7684)
5f6620e9 fix bug introduced in using calloc (llama/7701)
f8b7a7f5 kompute : implement op_getrows_f32 (llama/6403)
9e95aa1a Vulkan Mixture of Experts (MoE) support (llama/7628)
784733d7 ggml : use OpenMP as a thread pool (llama/7606)
0a6fd4e1 llama : offload to RPC in addition to other backends (llama/7640)
1b34416f ggml : prevent builds with -ffinite-math-only (llama/7726)
69982c7c ggml : remove OpenCL (llama/7735)
bf0ff58b Allow number of nodes in CUDA graph to change (llama/7738)
809d0f49 ggml : refactor rope norm/neox (llama/7634)
048f479c CUDA: refactor mmq, dmmv, mmvq (llama/7716)
c5f01ea1 fix softmax r2r result wrong issue (llama/7811)
e604adb2 vulkan : reuse parent extra for views (llama/7806)
bb7a50f1 CUDA: revise q8_1 data layout for mul_mat_q (llama/7824)
fa0b6928 use the correct SYCL context for host USM allocations (llama/7777)
b1991877 CUDA: use tensor cores for MMQ (llama/7676)
28c0ccff CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860)
b30b2f42 Update Vulkan RoPE implementation (llama/7818)
bfb22129 vulkan: select only one device for single gpu with multiple drivers (…
035d6554 ggml : improve ggml_is_contiguous logic (llama/7856)
3544c186 tests : add non-cont unary tests (llama/7857)
e8f4fa01 CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)
ad6b8d58 move BLAS to a separate backend (llama/6210)
08078b96 rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)
f8ac7b1c metal : utilize max shared memory for mul_mat_id (llama/7935)
8abc2513 CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
8efd6d61 remove global variables (llama/7710)
d2744ccc ggml : remove duplicate include of ggml-common.h (ggml/853)
ce33d6f1 ggml : fix and optimize ppc64le (ggml/849)
92dc0b78 sync : ggml
b891050e cmake : fix CUDA build (#0)
16d44bd6 talk-llama : sync llama.cpp
c711647a cuda : enable CUDA graphs (#0)
72523945 sycl : sync (#0)
b51ff56d ggml : remove OpenCL (#0)
f5b667d5 cmake : fix sycl build (#0)
ggerganov merged commit 30841fa7 into master 1 year ago
ggerganov deleted the sync branch 1 year ago