llama.cpp
sampling : add support for backend sampling
#17004
Merged

ggerganov merged 179 commits into ggml-org:master from danbev:gpu-sampling
danbev
github-actions added testing
am17an
danbev force pushed 73 days ago
danbev force pushed 72 days ago
github-actions added Nvidia GPU
github-actions added ggml
danbev force pushed 72 days ago
danbev force pushed 72 days ago
danbev force pushed 71 days ago
github-actions added examples
github-actions added server
danbev force pushed 70 days ago
danbev force pushed 69 days ago
danbev force pushed 68 days ago
danbev force pushed 67 days ago
danbev force pushed 67 days ago
danbev force pushed 67 days ago
danbev force pushed 67 days ago
slaren
slaren commented on 2025-11-11
danbev force pushed 66 days ago
danbev force pushed 66 days ago
danbev force pushed 66 days ago
danbev force pushed 66 days ago
ORippler
ORippler commented on 2025-11-12
danbev force pushed 65 days ago
danbev
ggerganov
danbev force pushed 64 days ago
danbev force pushed 62 days ago
danbev force pushed 62 days ago
danbev force pushed 62 days ago
danbev force pushed 62 days ago
github-actions added Apple Metal
danbev force pushed 61 days ago
ggerganov
ggerganov commented on 2025-11-17
ggerganov
ggerganov commented on 2025-11-17
danbev force pushed 61 days ago
danbev
danbev force pushed 61 days ago
danbev changed the title from "sampling : add support for GPU sampling (wip)" to "sampling : add support for backend sampling (wip)" 61 days ago
ggerganov
ggerganov
ggerganov commented on 2025-11-17
danbev sampling : add support for backend sampling
7884b0e0
danbev llama-cli : add backend sampler configuration
9fe9a00a
danbev server : add backend sampling options/configuration
f1f3e685
danbev webui : add backend sampling options
a3eb847d
danbev ggml : add initial cumsum implementation for CUDA
67d3b8e8
danbev force pushed to 67d3b8e8 60 days ago
danbev changed the title from "sampling : add support for backend sampling (wip)" to "sampling : add support for backend sampling" 60 days ago
danbev sampling : enable all backend sampler tests
71574f92
danbev marked this pull request as ready for review 60 days ago
danbev requested a review from allozaur 60 days ago
danbev requested a review from ngxson 60 days ago
danbev requested a review from CISC 60 days ago
ggerganov graph : do not include llama-model.h
4b52e599
ggerganov
ggerganov commented on 2025-11-18
danbev sampling : always expose sampled_ids
82957a90
danbev sampling : ensure at most one output token per seq
311c1a34
ORippler CUDA: Optimize argsort for gpu-based token sampling
26be108b
ORippler
danbev sampling : remove version from sampler chain
0da7e7dc
danbev sampling : always populate logits for sampled probs
51fee298
danbev sampling : simplify backend sampling logic decode
7e98ebcc
ggerganov
danbev
danbev squash! sampling : simplify backend sampling logic decode
d74eb61a
ORippler
ggerganov common : fix regression caused by extra memory allocations during sam…
38f408c2
danbev squash! sampling : simplify backend sampling logic decode
18ed4d8f
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
0c660e73
danbev squash! common : fix regression caused by extra memory allocations du…
ed4345bd
ORippler
ORippler commented on 2025-11-20
danbev sampling : introduce sampling_info struct
0d28b16b
danbev sampling : return early if backend sampling is disabled
c1625620
danbev sampling : use pinned memory for backend sampling buffers
61ffe41d
danbev common, tools : refactor model loading to support backend samplers
9b243934
danbev
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
79b8cf2a
danbev sampling : add stride variable for clarity
65500d05
danbev sampling: clarify candidate ids usage in comments
ae23d2d2
danbev sampling : fix copying both sampled tokens and logits/probs from backend
9e273f7a
danbev tests : cleanup test-backend-sampler.cpp
50d21aa4
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
7816f0bb
danbev common : remove build-info.cpp from commit [no ci]
d88ba181
danbev sampling : cleanup and clarify output_reserve
4a90583d
danbev sampling : remove redundant checks for stride and size [no ci]
8eb9b476
danbev sampling : add debug log when backend sampler selects token
25f33806
danbev examples : update batched to use backend sampling
d0bea21a
ggerganov llama-cli : fix dangling reference to sampler config
e2d4f082
ggerganov common : initialize backend samplers
b26c7069
ggerganov samplers : add missing cont
883a8704
danbev sampling : add assertions for contiguous tensors in async copy functions
a02adf42
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
2b4c7927
danbev examples : add info about hybrid sampling in batched [no ci]
0f17ccde
danbev Merge remote-tracking branch 'upstream/master' into gpu-sampling
53dca56d
ggerganov
ggerganov commented on 2025-11-25
danbev sampling : remove backend-dist option (wip)
9e5e09d0
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
ec047e12
ORippler CUDA: Add top-k implementation
f23b306c
danbev sampling : add min-p backend sampler
b45d504e
github-actions added build
ORippler Use `FetchContent` over CPM as it's bundled with CMake
4fea191c
danbev common : add get_active_samplers function to check enabled samplers
0f7805f3
ORippler
ORippler commented on 2025-11-26
danbev cuda : fix editorconfig-checker warning
90a3aff2
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
7c2bfb35
danbev sampling : use argmax for min-p sampling
d9d73610
danbev sampling : fix temperature check to allow zero temperature
51107a0b
danbev cuda : fix top-k compilation when CUB is unavailable
5ea3be26
danbev sampling : add comments about backend sampler [no ci]
172208af
danbev sampling : remove backend sampling chain from common_sampler
e9d07098
ORippler Fix top-k comp & behavior for non-CUB path
f9889cf1
danbev sampling : support intermixed backend/cpu samplers
74be332e
danbev squash! sampling : support intermixed backend/cpu samplers
9ad6522b
danbev squash! sampling : support intermixed backend/cpu samplers
459b7ae7
ggerganov refactor : simplify and improve memory management
117e2079
ggerganov requested a review from JohannesGaessler 49 days ago
ORippler Add initial version for top-p sampling
333da805
ORippler
ORippler commented on 2025-11-28
danbev sampling : use logits directly for min-p filtering
8cac9dee
ggerganov sampling : simplify
2464d1b3
ggerganov llama : simplify
fbc8f49f
ggerganov llama : cleanup + naming
9028ebfe
ggerganov Merge branch 'master' into HEAD
d8d98bb4
ggerganov llama : call backend_init once
ff7b0bf6
ggerganov Merge branch 'master' into HEAD
467746e3
ggerganov llama : reserve graphs with samplers
1760bd69
ggerganov llama : naming
c187003d
ggerganov cont : naming
80742cba
danbev sampling : lower log level for output buffer reallocations [no ci]
cf0e1475
ORippler Fix backend_top_p_sampler
8bee483c
ggerganov Merge branch 'master' into HEAD
16451d6b
ORippler Factor out `ggml_sort` into its own function
ae0bb6a6
ORippler Make backend's top_p sampler inclusive
217469f0
ggerganov common : simplify sampler chain initialization
4032ce23
ggerganov sampling : do not create empty samplers
04f2822a
ggerganov sampling : fix top_p empty condition
88cca45b
ggerganov
danbev examples : remove outdated backend sampling section
988261b1
danbev sampling : fix backend temp sampler for zero temperature
739b5978
danbev Merge remote-tracking branch 'upstream/master' into gpu-sampling
3e9a258c
ggerganov
ggerganov commented on 2025-12-02
ORippler CUDA: Move cccl fetch to after cuda has been enabled in CMakeLists.txt
559d058d
ORippler CUDA: Use standard-compliant preprocessor for MSVC builds
244880ae
ORippler CUDA: Update CCCL's rc candidate
516af33c
danbev squash! sampling : fix backend temp sampler for zero temperature
db8972e2
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
2595818a
danbev sampling : implement temp_ext_backend sampling
aad5a6af
ggerganov sampling : minor cleanup
cce3b2a8
danbev sampling : stop short if backend sampler sampled a token
87b2719e
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
c0b182f4
danbev Revert "sampling : stop short if backend sampler sampled a token"
10bd640a
danbev sampling : fix backend temp sampling to use logits masking
ac9e1647
ggerganov sampling : simplify temp sampling
fce571ee
ggerganov sampling : remove redundant calls to ggml_build_forward_expand
1bde7078
ggerganov sampling : check backend support during init
6958d413
ggerganov cont : keep backend sampling disabled for now
abc19635
ggerganov sampling : fix outputs and device checks
7864074f
allozaur
allozaur approved these changes on 2025-12-05
ggerganov sampling : fix candidates logic
cf74b1a8
ORippler Add perf-tests for CUMSUM
dd11f6eb
ORippler Merge branch 'master' into gpu-sampling
76689995
ORippler Readd `cub::DeviceScan::InclusiveSum`-based CumSum
e6525661
ggerganov sampling : expand support (wip)
30742a6f
ggerganov Merge branch 'master' into HEAD
fdac9686
ggerganov tests : fix memory leaks
52258181
github-actions added python
ggerganov cont : fixes
8ef5f900
ggerganov tests : check temp back to 0.0
42125f0e
ggerganov sampling : fix top-p
72e36810
ggerganov Merge branch 'master' into HEAD
6d38db5d
ggerganov sampling : handle n_probs case
f3beb22b
ggerganov server : handle unsupported cases
560ac16f
ggerganov metal : print node names for debugging
d62b5804
ggerganov ggml : remove redundant src in ggml_cast
62d1b008
ggerganov ggml-alloc : fix reuse-parent logic for misaligned sizes
9f6681c3
jacekpoplawski
ggerganov Revert "ggml : remove redundant src in ggml_cast"
7ab6f51b
ORippler CUDA: Add Cooperative-Groups-based parallelization of ncols in softmax
a84dfd3e
ORippler Add TODOs to and adjust heuristics of row-wise soft_max in CUDA
886c3668
ORippler Fix compiler warnings by casting `const` away
07003f1f
ORippler
ggerganov llama : require backend samplers to be of type llama_sampler_chain
92ff7679
jeffbolznv
ggerganov sampling : use host buffer type for inputs
34b407b4
ORippler Try fixing HIP build errors by adding corresponding #defines
3f0594ad
ORippler Fix launch logic when supports_cooperative_launch=false
a25fda52
ORippler Disable cooperative groups for musa
6dc6614b
ORippler
ggerganov Merge branch 'master' into HEAD
81cb5783
ggerganov server : reconnect the backend_sampling setting in the WebUI
0ecee8be
ggerganov graph : make the compute graph constant with respect to active samplers
c02654eb
ggerganov Merge branch 'master' into HEAD
38882247
JohannesGaessler
JohannesGaessler commented on 2025-12-10
ggerganov batch : fix sequence id ownage
44d5c4b5
ggerganov graph : respect sampler order for graph reuse
804e7e37
JohannesGaessler HIP/MUSA: fix build for backend sampling
42cf5c01
JohannesGaessler
JohannesGaessler
danbev Merge pull request #1 from JohannesGaessler/gpu-sampling-hip
56720f8f
ggerganov
ggerganov sampling : optimize logit_bias sampler
54e90540
ggerganov cont : fix build
d5d16651
ggerganov sampling : generic ggml op support detection
8544aba3
ggerganov sampling : fix greedy
74b112e3
ggerganov tests : run backend sampler tests always on the CPU
ab65b47a
ggerganov Merge branch 'master' into HEAD
4d10b78e
ORippler
ORippler
JohannesGaessler
ggerganov
JohannesGaessler
ggerganov
ORippler
ORippler Apply suggestions from code review
07b809bb
ggerganov Merge branch 'master' into HEAD
22c7f85b
ggerganov Merge branch 'master' into HEAD
0086c246
ggerganov webui : fix lint
2652e745
ORippler Fix data-race in `soft_max_f32_parallelize_cols_single_row`
3732b85b
ORippler Apply automated code-formating to softmax.cu
e5737f66
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
ad1b60ab
danbev llama : clarify backend_accept/backend_set_input comments [no ci]
68a1c4dc
danbev llama : fix typo in comment [no ci]
c5d44b85
danbev tests : use smart pointers for backend samplers
9a9ea2f6
danbev tests : use smart pointers for model and context
98459969
danbev tests : remove vocab member from test_model_context
76a1b7fe
danbev tests : extract batch info update to separate method
cc31e6a2
danbev tests : fix batch token position tracking in test_backend_sampler.cpp
a519aea3
danbev tests : add --device option support to backend sampler tests
981475fe
ggerganov Merge branch 'master' into HEAD
eefdb0da
ggerganov common : disable backend sampling when grammar is involved
3b3f5fed
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
bc5195c5
ORippler Fix different RNG-states between backend-sampling and llama-sampling
17509174
ORippler Make backend dist sampler use same rnd's as dist sampler
0a17687c
ORippler
ORippler Update CCCL version to v3.2.0-rc2
b5ec0fd7
ORippler Build with CCCL 3.2 for CUDA backends
1da013c6
github-actions added devops
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
f1310ab9
ggerganov Merge branch 'master' into HEAD
0ce03597
ggerganov tests : revert server test changes (no longer needed)
c0a351cc
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
82c26005
danbev ggml : include cub/cub.cuh instead of block_scan.cuh
060c0a58
danbev Merge remote-tracking branch 'upstream/master' into backend-sampling
ebfe545c
ggerganov arg : add shorthand for --backend-sampling
23e8bb40
ggerganov ci : add server workflow with backend sampling
5d2156e8
ggerganov sampling : fix reshapes
610e50a1
ggerganov server : remove printfs
588299c2
ggerganov Merge branch 'master' into HEAD
c5de7598
ggerganov sampling : zero-initialize input buffers
791ecb94
ggerganov
ggerganov
ggerganov minor : add comments + some cleanup
4c3d5422
ggerganov llama : assert at most one output token per sequence
435c9670
ggerganov tests : add more top_k tests
0d85c5ca
ggerganov Merge branch 'master' into HEAD
8071a57c
ggerganov
am17an
ggerganov
am17an
ggerganov
ORippler CUDA: Fix non-determinism of CUB-based Top-K
b3cf4eb1
ORippler CUDA: Optimize index of top_k_cub
6975bda9
ORippler Apply code-formatting to top-k.cu
194401af
ORippler Merge remote-tracking branch 'origin/master' into gpu-sampling
9f6c1f33
ORippler CUDA: Remove obsolete temp_keys from CUB
03454de7
ORippler
ggerganov
ggerganov minor : cleanup, TODOs, etc.
2e54b1db
ggerganov merged d3dce4e0 into master 12 days ago
