Commits
  • HIP: add fattn-mma-f16 for RDNA4 (llama/18481)
    ggerganov committed 47 days ago
  • ggml-metal: do not copy headers for embedded, use current binary dir for embedded (llama/18705)
    ggerganov committed 47 days ago
  • vulkan: work around Intel fp16 bug in mmq (llama/18814)
    ggerganov committed 47 days ago
  • CUDA : fix typo in clang pragma comment [no ci] (llama/18830)
    ggerganov committed 47 days ago
  • vulkan: Check maxStorageBufferRange in supports_op (llama/18709)
    ggerganov committed 47 days ago
  • CUDA: Factor out and re-use `block_reduce` function (llama/18785)
    ggerganov committed 47 days ago
  • hexagon: support for OP_CPY, host buffers now optional (llama/18822)
    ggerganov committed 47 days ago
  • ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (llama/18837)
    ggerganov committed 47 days ago
  • CUDA: fix alignment on register spill for FA (llama/18815)
    ggerganov committed 47 days ago
  • cuda : print less debug logs when disabling cuda graphs (llama/18868)
    ggerganov committed 47 days ago
  • OpenCL: add SOLVE_TRI op support (llama/18846)
    ggerganov committed 47 days ago
  • CANN: support gated linear attn (llama/18653)
    ggerganov committed 47 days ago
  • CANN: fix an issue where get_env was not fully renamed (llama/18796)
    ggerganov committed 47 days ago
  • CANN: Remove unused `ggml_cann_get_device` function (llama/18625)
    ggerganov committed 47 days ago
  • ggml-blas: hide warnings from included BLAS headers (llama/18818)
    ggerganov committed 47 days ago
  • ggml : extend ggml_pool_1d + metal (llama/16429)
    ggerganov committed 47 days ago
  • ggml webgpu: support for backend sampling (llama/18880)
    ggerganov committed 47 days ago
  • opencl: fix q6_K mv for m=1 (llama/18893)
    ggerganov committed 47 days ago
  • ggml : add ggml_build_forward_select (llama/18550)
    ggerganov committed 47 days ago
  • metal : enable FA for MLA heads (llama/18950)
    ggerganov committed 47 days ago
  • ggml : cleanup path_str() (llama/18928)
    ggerganov committed 47 days ago
  • CUDA: Replace init_offsets kernel with iterators in cub-based argsort (llama/18930)
    ggerganov committed 47 days ago
  • CUDA: Fix builds for older CCCL versions by ifdefing strided_iterator (llama/18964)
    ggerganov committed 47 days ago
  • vulkan: Use mul_mat_vec_id for small values of n (llama/18918)
    ggerganov committed 47 days ago
  • Revert "vulkan: force full subgroups for flash attention to fix intel subgroup crash (#17356)" (llama/18831)
    ggerganov committed 47 days ago
  • vulkan: support flash attention GQA/split_k with small batches (llama/18938)
    ggerganov committed 47 days ago
  • vulkan: Remove transfer_ctx, do everything in compute_ctx. (llama/18945)
    ggerganov committed 47 days ago
  • ggml-zdnn : mark zDNN buffers as non-host (llama/18967)
    ggerganov committed 47 days ago
  • opencl: add TRI op support (llama/18979)
    ggerganov committed 47 days ago
  • CUDA: add gqa_ratio 4 for GLM 4.7 flash (llama/18953)
    ggerganov committed 47 days ago
  • opencl: enable the general fp mm for non-cont input and as a fallback for specialized kqv kernel for adreno (llama/18970)
    ggerganov committed 47 days ago
  • CUDA: fix alignment check for FA (llama/19023)
    ggerganov committed 47 days ago
  • mla : make the V tensor a view of K (llama/18986)
    ggerganov committed 47 days ago
  • ggml-cpu: arm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (llama/18860)
    ggerganov committed 47 days ago
  • use malloc to support both iGPU and dGPU at the same time (llama/18992)
    ggerganov committed 47 days ago
  • ggml-hexagon: flash-attn opt (llama/19025)
    ggerganov committed 47 days ago
  • ggml-cuda: enable cuda-graphs for `n-cpu-moe` (llama/18934)
    ggerganov committed 47 days ago
  • CUDA: re-use MLA K data for V in MMA FA (llama/19057)
    ggerganov committed 47 days ago
  • kv-cache : support V-less cache (llama/19067)
    ggerganov committed 47 days ago
  • ggml-cpu: Use tiled FA for prompt-processing (llama/19012)
    ggerganov committed 47 days ago
  • metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/macOS (llama/19088)
    ggerganov committed 47 days ago
  • CUDA: faster FA for GQA > 1 but not power of 2 (llama/19092)
    ggerganov committed 47 days ago
  • CUDA: fix padding of GQA to power of 2 in FA (llama/19115)
    ggerganov committed 47 days ago
  • opencl: add flattened q6_K mv (llama/19054)
    ggerganov committed 47 days ago
  • ggml-cpu: Enable FP16 MMA kernels on PPC (llama/19060)
    ggerganov committed 47 days ago
  • Reduce CPU-side stalls due to the CUDA command buffer being full (llama/19042)
    ggerganov committed 47 days ago
  • ggml-cpu: arm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (llama/18888)
    ggerganov committed 47 days ago
  • CUDA: tune GLM 4.7 Flash FA kernel selection logic (llama/19097)
    ggerganov committed 47 days ago
  • ggml-zendnn : update ZenDNN git tag to main branch (llama/19133)
    ggerganov committed 47 days ago
  • ggml webgpu: Split shared state (webgpu_context) into global state and per-thread state (llama/18976)
    ggerganov committed 47 days ago
  • CUDA: tune GLM 4.7 Flash FA kernel selection logic (DGX Spark) (llama/19142)
    ggerganov committed 47 days ago
  • cuda : fix "V is K view" check for non-unified KV cache (llama/19145)
    ggerganov committed 47 days ago
  • ggml-cpu: arm64: Q4_K scale unroll and vectorization (llama/19108)
    ggerganov committed 47 days ago
  • ggml: new backend for Virglrenderer API Remoting acceleration (v2) (llama/18718)
    ggerganov committed 47 days ago
  • vulkan: handle device dedup on MacOS + Vega II Duo cards (llama/19058)
    ggerganov committed 47 days ago
  • ggml-sycl: remove unused syclcompat header (llama/19140)
    ggerganov committed 47 days ago
  • Vulkan Flash Attention Coopmat1 Refactor (llama/19075)
    ggerganov committed 47 days ago
  • sycl: fix norm kernels: l2_norm, group_norm, rms_norm by removing assert to support more cases (llama/19154)
    ggerganov committed 47 days ago
  • CUDA: refactor topk-moe to enable more models (GLM 4.7, Nemotron etc.) (llama/19126)
    ggerganov committed 47 days ago
  • ggml-zendnn : resolve ZenDNN backend cross-module symbol dependency (llama/19159)
    ggerganov committed 47 days ago
  • HIP: add mmf for CDNA (llama/18896)
    ggerganov committed 47 days ago
  • cuda : fix nkvo, offload and cuda graph node properties matching (llama/19165)
    ggerganov committed 47 days ago
  • hexagon: enable offloading to Hexagon on Windows on Snapdragon (llama/19150)
    ggerganov committed 47 days ago
  • ggml-webgpu: improve flashAttention performance by software pipelining (llama/19151)
    ggerganov committed 47 days ago
  • sycl: implement GGML_OP_TRI (llama/19089)
    ggerganov committed 47 days ago
  • sycl: implement GGML_UNARY_OP_SOFTPLUS (llama/19114)
    ggerganov committed 47 days ago
  • add tensor type checking as part of cuda graph properties (llama/19186)
    ggerganov committed 47 days ago
  • sync : ggml
    ggerganov committed 47 days ago
  • talk-llama : sync llama.cpp
    ggerganov committed 47 days ago
  • cuda : fix compile warnings (#0)
    ggerganov committed 47 days ago