llama.cpp
sampling : add support for backend sampling
#17004
Merged
Commits: 179
sampling : add support for backend sampling
danbev
committed
146 days ago
llama-cli : add backend sampler configuration
danbev
committed
146 days ago
server : add backend sampling options/configuration
danbev
committed
146 days ago
webui : add backend sampling options
danbev
committed
146 days ago
ggml : add initial cumsum implementation for CUDA
danbev
committed
146 days ago
sampling : enable all backend sampler tests
danbev
committed
146 days ago
graph : do not include llama-model.h
ggerganov
committed
145 days ago
sampling : always expose sampled_ids
danbev
committed
145 days ago
sampling : ensure at most one output token per seq
danbev
committed
145 days ago
CUDA: Optimize argsort for gpu-based token sampling
ORippler
committed
145 days ago
sampling : remove version from sampler chain
danbev
committed
145 days ago
sampling : always populate logits for sampled probs
danbev
committed
145 days ago
sampling : simplify backend sampling logic decode
danbev
committed
145 days ago
squash! sampling : simplify backend sampling logic decode
danbev
committed
144 days ago
common : fix regression caused by extra memory allocations during sampling
ggerganov
committed
144 days ago
squash! sampling : simplify backend sampling logic decode
danbev
committed
144 days ago
Merge remote-tracking branch 'upstream/master' into backend-sampling
danbev
committed
144 days ago
squash! common : fix regression caused by extra memory allocations during sampling
danbev
committed
144 days ago
sampling : introduce sampling_info struct
danbev
committed
143 days ago
sampling : return early if backend sampling is disabled
danbev
committed
143 days ago
sampling : use pinned memory for backend sampling buffers
danbev
committed
142 days ago
common, tools : refactor model loading to support backend samplers
danbev
committed
142 days ago
Merge remote-tracking branch 'upstream/master' into backend-sampling
danbev
committed
142 days ago
sampling : add stride variable for clarity
danbev
committed
140 days ago
sampling: clarify candidate ids usage in comments
danbev
committed
140 days ago
sampling : fix copying both sampled tokens and logits/probs from backend
danbev
committed
140 days ago
tests : cleanup test-backend-sampler.cpp
danbev
committed
140 days ago
Merge remote-tracking branch 'upstream/master' into backend-sampling
danbev
committed
140 days ago
common : remove build-info.cpp from commit [no ci]
danbev
committed
140 days ago
sampling : cleanup and clarify output_reserve
danbev
committed
139 days ago
sampling : remove redundant checks for stride and size [no ci]
danbev
committed
139 days ago
sampling : add debug log when backend sampler selects token
danbev
committed
139 days ago
examples : update batched to use backend sampling
danbev
committed
139 days ago
llama-cli : fix dangling reference to sampler config
ggerganov
committed
139 days ago
common : initialize backend samplers
ggerganov
committed
139 days ago
samplers : add missing cont
ggerganov
committed
139 days ago
sampling : add assertions for contiguous tensors in async copy functions
danbev
committed
139 days ago
Merge remote-tracking branch 'upstream/master' into backend-sampling
danbev
committed
139 days ago
examples : add info about hybrid sampling in batched [no ci]
danbev
committed
139 days ago
Merge remote-tracking branch 'upstream/master' into gpu-sampling
danbev
committed
139 days ago
sampling : remove backend-dist option (wip)
danbev
committed
138 days ago
Merge remote-tracking branch 'upstream/master' into backend-sampling
danbev
committed
138 days ago
CUDA: Add top-k implementation
ORippler
committed
138 days ago
sampling : add min-p backend sampler
danbev
committed
137 days ago
Use `FetchContent` over CPM as it's bundled with CMake
ORippler
committed
137 days ago
common : add get_active_samplers function to check enabled samplers
danbev
committed
137 days ago
cuda : fix editorconfig-checker warning
danbev
committed
137 days ago
Merge remote-tracking branch 'upstream/master' into backend-sampling
danbev
committed
137 days ago
sampling : use argmax for min-p sampling
danbev
committed
137 days ago
sampling : fix temperature check to allow zero temperature
danbev
committed
137 days ago
cuda : fix top-k compilation when CUB is unavailable
danbev
committed
137 days ago
sampling : add comments about backend sampler [no ci]
danbev
committed
136 days ago
sampling : remove backend sampling chain from common_sampler
danbev
committed
136 days ago
Fix top-k comp & behavior for non-CUB path
ORippler
committed
136 days ago
sampling : support intermixed backend/cpu samplers
danbev
committed
136 days ago
squash! sampling : support intermixed backend/cpu samplers
danbev
committed
136 days ago
squash! sampling : support intermixed backend/cpu samplers
danbev
committed
135 days ago
refactor : simplify and improve memory management
ggerganov
committed
135 days ago
Add initial version for top-p sampling
ORippler
committed
135 days ago
sampling : use logits directly for min-p filtering
danbev
committed
135 days ago
sampling : simplify
ggerganov
committed
135 days ago
llama : simplify
ggerganov
committed
134 days ago
llama : cleanup + naming
ggerganov
committed
134 days ago
Merge branch 'master' into HEAD
ggerganov
committed
134 days ago
llama : call backend_init once
ggerganov
committed
134 days ago
Merge branch 'master' into HEAD
ggerganov
committed
134 days ago
llama : reserve graphs with samplers
ggerganov
committed
134 days ago
llama : naming
ggerganov
committed
134 days ago
cont : naming
ggerganov
committed
133 days ago
sampling : lower log level for output buffer reallocations [no ci]
danbev
committed
133 days ago
Fix backend_top_p_sampler
ORippler
committed
132 days ago
Merge branch 'master' into HEAD
ggerganov
committed
132 days ago
Factor out `ggml_sort` into its own function
ORippler
committed
132 days ago
Make backend's top_p sampler inclusive
ORippler
committed
132 days ago
common : simplify sampler chain initialization
ggerganov
committed
132 days ago
sampling : do not create empty samplers
ggerganov
committed
132 days ago
sampling : fix top_p empty condition
ggerganov
committed
132 days ago
examples : remove outdated backend sampling section
danbev
committed
132 days ago
sampling : fix backend temp sampler for zero temperature
danbev
committed
132 days ago
Merge remote-tracking branch 'upstream/master' into gpu-sampling
danbev
committed
132 days ago
CUDA: Move cccl fetch to after cuda has been enabled in CMakeLists.txt
ORippler
committed
131 days ago
CUDA: Use standard-compliant preprocessor for MSVC builds
ORippler
committed
131 days ago
CUDA: Update CCCL's rc candidate
ORippler
committed
131 days ago
squash! sampling : fix backend temp sampler for zero temperature
danbev
committed
131 days ago
Merge remote-tracking branch 'upstream/master' into backend-sampling
danbev
committed
131 days ago
sampling : implement temp_ext_backend sampling
danbev
committed
131 days ago
sampling : minor cleanup
ggerganov
committed
130 days ago
sampling : stop short if backend sampler sampled a token
danbev
committed
130 days ago
Merge remote-tracking branch 'upstream/master' into backend-sampling
danbev
committed
130 days ago
Revert "sampling : stop short if backend sampler sampled a token"
danbev
committed
130 days ago
sampling : fix backend temp sampling to use logits masking
danbev
committed
130 days ago
sampling : simplify temp sampling
ggerganov
committed
129 days ago
sampling : remove redundant calls to ggml_build_forward_expand
ggerganov
committed
129 days ago
sampling : check backend support during init
ggerganov
committed
129 days ago
cont : keep backend sampling disabled for now
ggerganov
committed
129 days ago
sampling : fix outputs and device checks
ggerganov
committed
129 days ago
sampling : fix candidates logic
ggerganov
committed
128 days ago
Add perf-tests for CUMSUM
ORippler
committed
128 days ago
Merge branch 'master' into gpu-sampling
ORippler
committed
128 days ago
Readd `cub::DeviceScan::InclusiveSum`-based CumSum
ORippler
committed
128 days ago
+ more commits ...