llama.cpp
sampling : add support for backend sampling
#17004
Merged

Commits
  • sampling : add support for backend sampling
    danbev committed 146 days ago
  • llama-cli : add backend sampler configuration
    danbev committed 146 days ago
  • server : add backend sampling options/configuration
    danbev committed 146 days ago
  • webui : add backend sampling options
    danbev committed 146 days ago
  • ggml : add initial cumsum implementation for CUDA
    danbev committed 146 days ago
  • sampling : enable all backend sampler tests
    danbev committed 146 days ago
  • graph : do not include llama-model.h
    ggerganov committed 145 days ago
  • sampling : always expose sampled_ids
    danbev committed 145 days ago
  • sampling : ensure at most one output token per seq
    danbev committed 145 days ago
  • CUDA: Optimize argsort for gpu-based token sampling
    ORippler committed 145 days ago
  • sampling : remove version from sampler chain
    danbev committed 145 days ago
  • sampling : always populate logits for sampled probs
    danbev committed 145 days ago
  • sampling : simplify backend sampling logic decode
    danbev committed 145 days ago
  • squash! sampling : simplify backend sampling logic decode
    danbev committed 144 days ago
  • common : fix regression caused by extra memory allocations during sampling
    ggerganov committed 144 days ago
  • squash! sampling : simplify backend sampling logic decode
    danbev committed 144 days ago
  • Merge remote-tracking branch 'upstream/master' into backend-sampling
    danbev committed 144 days ago
  • squash! common : fix regression caused by extra memory allocations during sampling
    danbev committed 144 days ago
  • sampling : introduce sampling_info struct
    danbev committed 143 days ago
  • sampling : return early if backend sampling is disabled
    danbev committed 143 days ago
  • sampling : use pinned memory for backend sampling buffers
    danbev committed 142 days ago
  • common, tools : refactor model loading to support backend samplers
    danbev committed 142 days ago
  • Merge remote-tracking branch 'upstream/master' into backend-sampling
    danbev committed 142 days ago
  • sampling : add stride variable for clarity
    danbev committed 140 days ago
  • sampling: clarify candidate ids usage in comments
    danbev committed 140 days ago
  • sampling : fix copying both sampled tokens and logits/probs from backend
    danbev committed 140 days ago
  • tests : cleanup test-backend-sampler.cpp
    danbev committed 140 days ago
  • Merge remote-tracking branch 'upstream/master' into backend-sampling
    danbev committed 140 days ago
  • common : remove build-info.cpp from commit [no ci]
    danbev committed 140 days ago
  • sampling : cleanup and clarify output_reserve
    danbev committed 139 days ago
  • sampling : remove redundant checks for stride and size [no ci]
    danbev committed 139 days ago
  • sampling : add debug log when backend sampler selects token
    danbev committed 139 days ago
  • examples : update batched to use backend sampling
    danbev committed 139 days ago
  • llama-cli : fix dangling reference to sampler config
    ggerganov committed 139 days ago
  • common : initialize backend samplers
    ggerganov committed 139 days ago
  • samplers : add missing cont
    ggerganov committed 139 days ago
  • sampling : add assertions for contiguous tensors in async copy functions
    danbev committed 139 days ago
  • Merge remote-tracking branch 'upstream/master' into backend-sampling
    danbev committed 139 days ago
  • examples : add info about hybrid sampling in batched [no ci]
    danbev committed 139 days ago
  • Merge remote-tracking branch 'upstream/master' into gpu-sampling
    danbev committed 139 days ago
  • sampling : remove backend-dist option (wip)
    danbev committed 138 days ago
  • Merge remote-tracking branch 'upstream/master' into backend-sampling
    danbev committed 138 days ago
  • CUDA: Add top-k implementation
    ORippler committed 138 days ago
  • sampling : add min-p backend sampler
    danbev committed 137 days ago
  • Use `FetchContent` over CPM as it's bundled with CMake
    ORippler committed 137 days ago
  • common : add get_active_samplers function to check enabled samplers
    danbev committed 137 days ago
  • cuda : fix editorconfig-checker warning
    danbev committed 137 days ago
  • Merge remote-tracking branch 'upstream/master' into backend-sampling
    danbev committed 137 days ago
  • sampling : use argmax for min-p sampling
    danbev committed 137 days ago
  • sampling : fix temperature check to allow zero temperature
    danbev committed 137 days ago
  • cuda : fix top-k compilation when CUB is unavailable
    danbev committed 137 days ago
  • sampling : add comments about backend sampler [no ci]
    danbev committed 136 days ago
  • sampling : remove backend sampling chain from common_sampler
    danbev committed 136 days ago
  • Fix top-k comp & behavior for non-CUB path
    ORippler committed 136 days ago
  • sampling : support intermixed backend/cpu samplers
    danbev committed 136 days ago
  • squash! sampling : support intermixed backend/cpu samplers
    danbev committed 136 days ago
  • squash! sampling : support intermixed backend/cpu samplers
    danbev committed 135 days ago
  • refactor : simplify and improve memory management
    ggerganov committed 135 days ago
  • Add initial version for top-p sampling
    ORippler committed 135 days ago
  • sampling : use logits directly for min-p filtering
    danbev committed 135 days ago
  • sampling : simplify
    ggerganov committed 135 days ago
  • llama : simplify
    ggerganov committed 134 days ago
  • llama : cleanup + naming
    ggerganov committed 134 days ago
  • Merge branch 'master' into HEAD
    ggerganov committed 134 days ago
  • llama : call backend_init once
    ggerganov committed 134 days ago
  • Merge branch 'master' into HEAD
    ggerganov committed 134 days ago
  • llama : reserve graphs with samplers
    ggerganov committed 134 days ago
  • llama : naming
    ggerganov committed 134 days ago
  • cont : naming
    ggerganov committed 133 days ago
  • sampling : lower log level for output buffer reallocations [no ci]
    danbev committed 133 days ago
  • Fix backend_top_p_sampler
    ORippler committed 132 days ago
  • Merge branch 'master' into HEAD
    ggerganov committed 132 days ago
  • Factor out `ggml_sort` into its own function
    ORippler committed 132 days ago
  • Make backend's top_p sampler inclusive
    ORippler committed 132 days ago
  • common : simplify sampler chain initialization
    ggerganov committed 132 days ago
  • sampling : do not create empty samplers
    ggerganov committed 132 days ago
  • sampling : fix top_p empty condition
    ggerganov committed 132 days ago
  • examples : remove outdated backend sampling section
    danbev committed 132 days ago
  • sampling : fix backend temp sampler for zero temperature
    danbev committed 132 days ago
  • Merge remote-tracking branch 'upstream/master' into gpu-sampling
    danbev committed 132 days ago
  • CUDA: Move cccl fetch to after cuda has been enabled in CMakeLists.txt
    ORippler committed 131 days ago
  • CUDA: Use standard-compliant preprocessor for MSVC builds
    ORippler committed 131 days ago
  • CUDA: Update CCCL's rc candidate
    ORippler committed 131 days ago
  • squash! sampling : fix backend temp sampler for zero temperature
    danbev committed 131 days ago
  • Merge remote-tracking branch 'upstream/master' into backend-sampling
    danbev committed 131 days ago
  • sampling : implement temp_ext_backend sampling
    danbev committed 131 days ago
  • sampling : minor cleanup
    ggerganov committed 130 days ago
  • sampling : stop short if backend sampler sampled a token
    danbev committed 130 days ago
  • Merge remote-tracking branch 'upstream/master' into backend-sampling
    danbev committed 130 days ago
  • Revert "sampling : stop short if backend sampler sampled a token"
    danbev committed 130 days ago
  • sampling : fix backend temp sampling to use logits masking
    danbev committed 130 days ago
  • sampling : simplify temp sampling
    ggerganov committed 129 days ago
  • sampling : remove redundant calls to ggml_build_forward_expand
    ggerganov committed 129 days ago
  • sampling : check backend support during init
    ggerganov committed 129 days ago
  • cont : keep backend sampling disabled for now
    ggerganov committed 129 days ago
  • sampling : fix outputs and device checks
    ggerganov committed 129 days ago
  • sampling : fix candidates logic
    ggerganov committed 128 days ago
  • Add perf-tests for CUMSUM
    ORippler committed 128 days ago
  • Merge branch 'master' into gpu-sampling
    ORippler committed 128 days ago
  • Readd `cub::DeviceScan::InclusiveSum`-based CumSum
    ORippler committed 128 days ago
  • + more commits ...