Min P sampler implementation [alternative to Top P/Top K] #3841
cuda : prints wip
59d1232e
cuda : new cublas gemm branch for multi-batch quantized src0
52af7826
cuda : add F32 sgemm branch
16b60dd7
cuda : fine-tune >= VOLTA params + use MMQ only for small batches
a3c28439
cuda : remove duplicated cuBLAS GEMM code
4c6744b5
cuda : add CUDA_USE_TENSOR_CORES and GGML_CUDA_FORCE_MMQ macros
a4e15a36
build : add compile option to force use of MMQ kernels
49af767f
Super hacky starting implementation of Min P
a9e2b74f
cebtenzzre
marked this pull request as draft 1 year ago
Transform Min P into a proper CLI option
a235a0d2
Min P disabled if set to 1.0 or 0, otherwise Top P
838d58dc
Debugging print statements removed
69ef4ca8
erring on the side of caution; disable by default
833637b7
Remove accidentally kept prints + min_keep support
62fc7715
Standardize 0.0 disabling min_p upon feedback
49b68e82
Simplified counter by checking candidates size
6f7cdec3
minor whitespace fix
cb233584
Even formatting + exclusively 0.0f to disable now
fcbbfc16
kalomaze
marked this pull request as ready for review 1 year ago
cleanup
69e638e5
permit simultaneous use of top_p and min_p
3ddfd67d
Merge remote-tracking branch 'original/cuda-quantum-batch' into min-p…
18c0aa7c
Merge branch 'min-p-sampling' of https://github.com/kalomaze/koboldcp…
87adfad2
Update README & set 0.05 default
9248325f
added a bit more context to the README
512cac63
ggerganov
approved these changes
on 2023-10-31
Update README for consistency
974640ac
forgot one small thing!
3b58af26
Green-Sky
approved these changes
on 2023-10-31
Green-Sky
merged
238657db
into master 1 year ago
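The commit messages above describe the sampler's behavior only in fragments: 0.0 disables it, 0.05 is the default, min_keep is honored, and it can be combined with top_p. The following is a minimal sketch of how such a filter might work, assuming Min P keeps every token whose probability is at least min_p times the probability of the most likely token. The function name min_p_filter and its signature are hypothetical, chosen for illustration, and are not taken from the merged code.

```python
def min_p_filter(probs, min_p=0.05, min_keep=1):
    """Return (token_id, probability) pairs kept by a min-p style filter.

    Hypothetical helper for illustration only; not the merged implementation.
    probs is a list of token probabilities indexed by token id.
    """
    # Per the commit log, 0.0 disables the sampler: return everything unchanged.
    if min_p <= 0.0 or not probs:
        return list(enumerate(probs))

    # Rank tokens from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)

    # Assumed rule: the cutoff scales with the top token's probability.
    threshold = min_p * ranked[0][1]

    # Keep tokens at or above the cutoff, and always keep the first min_keep.
    kept = [kv for rank, kv in enumerate(ranked)
            if kv[1] >= threshold or rank < min_keep]

    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return [(tok, p / total) for tok, p in kept]


if __name__ == "__main__":
    example = [0.5, 0.3, 0.15, 0.04, 0.01]
    for tok, p in min_p_filter(example, min_p=0.1):
        print(tok, round(p, 4))
```

Because the cutoff is relative to the top token, the filter prunes aggressively when the model is confident and keeps more candidates when the distribution is flat. Since the log says top_p and min_p may be used together, a caller could apply a top-p filter before or after this one.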