PR #17004 sampling : add support for backend sampling

sampling : add support for backend sampling

danbev committed 146 days ago

llama-cli : add backend sampler configuration

danbev committed 146 days ago

server : add backend sampling options/configuration

danbev committed 146 days ago

webui : add backend sampling options

danbev committed 146 days ago

ggml : add initial cumsum implementation for CUDA

danbev committed 146 days ago
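A cumulative sum is the building block a backend top-p sampler needs: after sorting probabilities in descending order, the running total tells you where the nucleus ends. A later commit in this PR re-adds a `cub::DeviceScan::InclusiveSum`-based version, which suggests inclusive-scan semantics. The sketch below is a plain sequential CPU reference of an inclusive scan. It is not the ggml or CUDA kernel, and the name `cumsum_inclusive` is hypothetical.

```cpp
#include <numeric>
#include <vector>

// Inclusive prefix sum: out[i] = in[0] + in[1] + ... + in[i].
// Hypothetical CPU reference; a GPU version would compute the same values in parallel.
std::vector<float> cumsum_inclusive(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}
```

For example, probabilities {0.5, 0.25, 0.25} produce {0.5, 0.75, 1.0}.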

sampling : enable all backend sampler tests

danbev committed 146 days ago

graph : do not include llama-model.h

ggerganov committed 145 days ago

sampling : always expose sampled_ids

danbev committed 145 days ago

sampling : ensure at most one output token per seq

danbev committed 145 days ago

CUDA: Optimize argsort for gpu-based token sampling

ORippler committed 145 days ago
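Several samplers (top-k, top-p) need token ids ordered by descending logit, which is what an argsort provides. The following is a minimal CPU sketch of that ordering, assuming ties keep the lower token id first. It is not the CUDA kernel from this commit, and `argsort_desc` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Return token ids ordered by descending logit, so ids[0] is the most likely token.
// std::stable_sort keeps equal logits in ascending id order.
std::vector<int32_t> argsort_desc(const std::vector<float>& logits) {
    std::vector<int32_t> ids(logits.size());
    std::iota(ids.begin(), ids.end(), 0);
    std::stable_sort(ids.begin(), ids.end(),
                     [&](int32_t a, int32_t b) { return logits[a] > logits[b]; });
    return ids;
}
```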

sampling : remove version from sampler chain

danbev committed 145 days ago

sampling : always populate logits for sampled probs

danbev committed 145 days ago

sampling : simplify backend sampling logic decode

danbev committed 145 days ago

squash! sampling : simplify backend sampling logic decode

danbev committed 144 days ago

common : fix regression caused by extra memory allocations during sampling

ggerganov committed 144 days ago

squash! sampling : simplify backend sampling logic decode

danbev committed 144 days ago

Merge remote-tracking branch 'upstream/master' into backend-sampling

danbev committed 144 days ago

squash! common : fix regression caused by extra memory allocations during sampling

danbev committed 144 days ago

sampling : introduce sampling_info struct

danbev committed 143 days ago

sampling : return early if backend sampling is disabled

danbev committed 143 days ago

sampling : use pinned memory for backend sampling buffers

danbev committed 142 days ago

common, tools : refactor model loading to support backend samplers

danbev committed 142 days ago

Merge remote-tracking branch 'upstream/master' into backend-sampling

danbev committed 142 days ago

sampling : add stride variable for clarity

danbev committed 140 days ago

sampling: clarify candidate ids usage in comments

danbev committed 140 days ago

sampling : fix copying both sampled tokens and logits/probs from backend

danbev committed 140 days ago

tests : cleanup test-backend-sampler.cpp

danbev committed 140 days ago

Merge remote-tracking branch 'upstream/master' into backend-sampling

danbev committed 140 days ago

common : remove build-info.cpp from commit [no ci]

danbev committed 140 days ago

sampling : cleanup and clarify output_reserve

danbev committed 139 days ago

sampling : remove redundant checks for stride and size [no ci]

danbev committed 139 days ago

sampling : add debug log when backend sampler selects token

danbev committed 139 days ago

examples : update batched to use backend sampling

danbev committed 139 days ago

llama-cli : fix dangling reference to sampler config

ggerganov committed 139 days ago

common : initialize backend samplers

ggerganov committed 139 days ago

samplers : add missing cont

ggerganov committed 139 days ago

sampling : add assertions for contiguous tensors in async copy functions

danbev committed 139 days ago

Merge remote-tracking branch 'upstream/master' into backend-sampling

danbev committed 139 days ago

examples : add info about hybrid sampling in batched [no ci]

danbev committed 139 days ago

Merge remote-tracking branch 'upstream/master' into gpu-sampling

danbev committed 139 days ago

sampling : remove backend-dist option (wip)

danbev committed 138 days ago

Merge remote-tracking branch 'upstream/master' into backend-sampling

danbev committed 138 days ago

CUDA: Add top-k implementation

ORippler committed 138 days ago
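Top-k keeps only the k highest-scoring tokens. One way to express this on a fixed-size logits buffer, which suits a backend graph, is to mask every other logit to -INFINITY so a later softmax gives it zero probability. Treating top-k as masking rather than shrinking the candidate list is an assumption of this sketch. The CPU function `top_k_mask` is hypothetical and is not the CUDA implementation added here.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Keep the k largest logits and set all others to -INFINITY.
// k <= 0 or k >= n leaves the logits unchanged.
void top_k_mask(std::vector<float>& logits, int k) {
    const int n = static_cast<int>(logits.size());
    if (k <= 0 || k >= n) return;
    std::vector<int> ids(n);
    std::iota(ids.begin(), ids.end(), 0);
    // Only the first k positions need to be correctly ordered.
    std::partial_sort(ids.begin(), ids.begin() + k, ids.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });
    std::vector<bool> keep(n, false);
    for (int i = 0; i < k; ++i) keep[ids[i]] = true;
    for (int i = 0; i < n; ++i) {
        if (!keep[i]) logits[i] = -INFINITY;
    }
}
```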

sampling : add min-p backend sampler

danbev committed 137 days ago

Use `FetchContent` over CPM as it's bundled with CMake

ORippler committed 137 days ago

common : add get_active_samplers function to check enabled samplers

danbev committed 137 days ago

cuda : fix editorconfig-checker warning

danbev committed 137 days ago

Merge remote-tracking branch 'upstream/master' into backend-sampling

danbev committed 137 days ago

sampling : use argmax for min-p sampling

danbev committed 137 days ago

sampling : fix temperature check to allow zero temperature

danbev committed 137 days ago

cuda : fix top-k compilation when CUB is unavailable

danbev committed 137 days ago

sampling : add comments about backend sampler [no ci]

danbev committed 136 days ago

sampling : remove backend sampling chain from common_sampler

danbev committed 136 days ago

Fix top-k comp & behavior for non-CUB path

ORippler committed 136 days ago

sampling : support intermixed backend/cpu samplers

danbev committed 136 days ago

squash! sampling : support intermixed backend/cpu samplers

danbev committed 136 days ago

squash! sampling : support intermixed backend/cpu samplers

danbev committed 135 days ago

refactor : simplify and improve memory management

ggerganov committed 135 days ago

Add initial version for top-p sampling

ORippler committed 135 days ago

sampling : use logits directly for min-p filtering

danbev committed 135 days ago
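The min-p commits mention using argmax and filtering on logits directly. The identity that makes this possible: softmax shares one normalizer Z across tokens, so p_i >= min_p * p_max holds exactly when l_i >= l_max + log(min_p). Only the maximum logit is needed, not a full softmax. The sketch below assumes the backend sampler relies on this identity. It is a CPU illustration, not code from the PR, and `min_p_mask` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Min-p filter on raw logits, assuming 0 < min_p <= 1:
// p_i >= min_p * p_max  <=>  l_i >= l_max + log(min_p).
void min_p_mask(std::vector<float>& logits, float min_p) {
    if (logits.empty() || min_p <= 0.0f || min_p > 1.0f) return;  // out of range: no-op
    const float l_max = *std::max_element(logits.begin(), logits.end());
    const float threshold = l_max + std::log(min_p);
    for (float& l : logits) {
        if (l < threshold) l = -INFINITY;  // probability below min_p * p_max
    }
}
```

With logits {2, 1, 0, -3} and min_p = 0.2, the threshold is about 0.39, so only the first two tokens survive.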

sampling : simplify

ggerganov committed 135 days ago

llama : simplify

ggerganov committed 134 days ago

llama : cleanup + naming

ggerganov committed 134 days ago

Merge branch 'master' into HEAD

ggerganov committed 134 days ago

llama : call backend_init once

ggerganov committed 134 days ago

Merge branch 'master' into HEAD

ggerganov committed 134 days ago

llama : reserve graphs with samplers

ggerganov committed 134 days ago

llama : naming

ggerganov committed 134 days ago

cont : naming

ggerganov committed 133 days ago

sampling : lower log level for output buffer reallocations [no ci]

danbev committed 133 days ago

Fix backend_top_p_sampler

ORippler committed 132 days ago

Merge branch 'master' into HEAD

ggerganov committed 132 days ago

Factor out `ggml_sort` into its own function

ORippler committed 132 days ago

Make backend's top_p sampler inclusive

ORippler committed 132 days ago
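Top-p (nucleus) sampling visits tokens in descending probability order and keeps them until the running probability mass reaches p. We read "inclusive" as keeping the token whose probability makes the sum reach or cross p, which is the usual convention. That reading is an assumption. The function `top_p_mask` is a hypothetical CPU sketch that combines softmax, argsort and cumulative sum. It is not the backend implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Top-p filter on logits: keep the smallest prefix (by descending probability)
// whose mass reaches top_p, including the token that crosses the threshold.
void top_p_mask(std::vector<float>& logits, float top_p) {
    const size_t n = logits.size();
    if (n == 0 || top_p >= 1.0f) return;
    // Numerically stable softmax.
    const float l_max = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(n);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        probs[i] = std::exp(static_cast<double>(logits[i] - l_max));
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    // Argsort by descending probability.
    std::vector<size_t> ids(n);
    std::iota(ids.begin(), ids.end(), size_t{0});
    std::stable_sort(ids.begin(), ids.end(),
                     [&](size_t a, size_t b) { return probs[a] > probs[b]; });
    // Running sum; the crossing token is counted before the check (inclusive).
    double cum = 0.0;
    size_t keep = 0;
    while (keep < n) {
        cum += probs[ids[keep]];
        ++keep;
        if (cum >= top_p) break;
    }
    for (size_t i = keep; i < n; ++i) logits[ids[i]] = -INFINITY;
}
```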

common : simplify sampler chain initialization

ggerganov committed 132 days ago

sampling : do not create empty samplers

ggerganov committed 132 days ago

sampling : fix top_p empty condition

ggerganov committed 132 days ago

examples : remove outdated backend sampling section

danbev committed 132 days ago

sampling : fix backend temp sampler for zero temperature

danbev committed 132 days ago

Merge remote-tracking branch 'upstream/master' into gpu-sampling

danbev committed 132 days ago

CUDA: Move cccl fetch to after cuda has been enabled in CMakeLists.txt

ORippler committed 131 days ago

CUDA: Use standard-compliant preprocessor for MSVC builds

ORippler committed 131 days ago

CUDA: Update CCCL's rc candidate

ORippler committed 131 days ago

squash! sampling : fix backend temp sampler for zero temperature

danbev committed 131 days ago

Merge remote-tracking branch 'upstream/master' into backend-sampling

danbev committed 131 days ago

sampling : implement temp_ext_backend sampling

danbev committed 131 days ago
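`temp_ext` is the extended ("dynamic") temperature sampler. As we understand llama.cpp's CPU version, it interpolates the temperature between max(0, t - delta) and t + delta according to the entropy of the distribution, normalized by log(n), raised to `exponent`. It then divides the logits by that temperature. The sketch below follows that formula under this assumption. It is a CPU illustration, not the backend graph from this commit, and `temp_ext_apply` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Dynamic temperature: dyn_t = min_t + (max_t - min_t) * (H / log n)^exponent.
void temp_ext_apply(std::vector<float>& logits, float t, float delta, float exponent) {
    const size_t n = logits.size();
    if (n == 0) return;
    if (delta <= 0.0f || n == 1) {
        // Plain temperature; t <= 0 (greedy) is not handled in this sketch.
        if (t > 0.0f) for (float& l : logits) l /= t;
        return;
    }
    const double min_t = std::max(0.0f, t - delta);
    const double max_t = t + delta;
    // Softmax probabilities, then Shannon entropy H = -sum p log p.
    const float l_max = *std::max_element(logits.begin(), logits.end());
    std::vector<double> p(n);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        p[i] = std::exp(static_cast<double>(logits[i] - l_max));
        sum += p[i];
    }
    double entropy = 0.0;
    for (double& pi : p) {
        pi /= sum;
        if (pi > 0.0) entropy -= pi * std::log(pi);
    }
    const double max_entropy = std::log(static_cast<double>(n));  // uniform distribution
    const double norm = entropy / max_entropy;
    const double dyn_t = min_t + (max_t - min_t) * std::pow(norm, static_cast<double>(exponent));
    if (dyn_t <= 0.0) return;  // distribution already one-hot; avoid dividing by zero
    for (float& l : logits) l = static_cast<float>(l / dyn_t);
}
```

A flat distribution (maximum entropy) gets the highest temperature, t + delta. A peaked one gets a temperature close to max(0, t - delta).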

sampling : minor cleanup

ggerganov committed 130 days ago

sampling : stop short if backend sampler sampled a token

danbev committed 130 days ago

Merge remote-tracking branch 'upstream/master' into backend-sampling

danbev committed 130 days ago

Revert "sampling : stop short if backend sampler sampled a token"

danbev committed 130 days ago

sampling : fix backend temp sampling to use logits masking

danbev committed 130 days ago
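The temperature commits allow a zero temperature and switch to logits masking. A plausible reading, assumed here rather than taken from the PR's code: for temp > 0 the logits are divided by temp. For temp <= 0 (greedy) every logit except the maximum is masked to -INFINITY, so the final distribution step can only pick the argmax. Masking keeps the buffer size fixed, which fits a backend graph. `temp_apply` is a hypothetical CPU sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Temperature step on logits: scale for temp > 0, argmax masking for temp <= 0.
void temp_apply(std::vector<float>& logits, float temp) {
    if (logits.empty()) return;
    if (temp > 0.0f) {
        for (float& l : logits) l /= temp;
        return;
    }
    // Greedy: keep only the first maximum, mask everything else.
    const size_t best = static_cast<size_t>(
        std::max_element(logits.begin(), logits.end()) - logits.begin());
    for (size_t i = 0; i < logits.size(); ++i) {
        if (i != best) logits[i] = -INFINITY;
    }
}
```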

sampling : simplify temp sampling

ggerganov committed 129 days ago

sampling : remove redundant calls to ggml_build_forward_expand

ggerganov committed 129 days ago

sampling : check backend support during init

ggerganov committed 129 days ago

cont : keep backend sampling disabled for now

ggerganov committed 129 days ago

sampling : fix outputs and device checks

ggerganov committed 129 days ago

sampling : fix candidates logic

ggerganov committed 128 days ago

Add perf-tests for CUMSUM

ORippler committed 128 days ago

Merge branch 'master' into gpu-sampling

ORippler committed 128 days ago

Readd `cub::DeviceScan::InclusiveSum`-based CumSum

ORippler committed 128 days ago

llama.cpp sampling : add support for backend sampling #17004 Merged