[SYCL] fix mul_mat fault in CI/unit-test #5862
fix mul_mat fault in cpy_f32_f16
99b9edab
rm unused function
ddc12494
add wait() for memcpy
0dce40a7
restore ci/run.sh, rename struct definition, fix bug in ggml_sycl_op_…
89524c2f
fix format issue
32bf3df0
llama : fix segfault from unknown model arch name (#5820)
8899bdb6
llama : refactor internal quantization functions (#5830)
9758243a
scripts : add pod-llama.sh
dcf09d3c
ggml : IQ3_S improvements (#5829)
d0c9a891
convert-hf : make model class definitions self-contained (#5825)
9285e714
convert : automatically fall back to HfVocab if tokenizer.model doesn…
0867b91a
ggml : fix IQ3_S AVX implementation (#5834)
1a5ed7a2
llama : add abort_callback to interrupt computation (#5409)
506177de
server: tests: passkey challenge / self-extend with context shift de…
8479e7d4
flake.lock: Update (#5842)
756a4ac7
server : init http requests thread pool with --parallel if set (#5836)
f72df318
ci : schedule slow server tests only on Release or on demand (#5839)
23a6275f
llama : fix llama_copy_state_data with fragmented KV cache (#5840)
524864d3
gguf-dump : support i-quants (#5841)
8bb872de
llama : allow for user specified embedding pooling type (#5849)
e55ee8a2
readme : add API changes section
fd4a186d
cuda : fix data race in soft max (#5853)
22dd02a6
main : support special tokens as reverse/anti prompt (#5847)
f3a6dd6c
common : use LLAMA_DEFAULT_SEED (#5855)
edabfadc
add some new ops, fix some operators and add batch operations to cert…
9e4d115d
sync : ggml
b15e7533
add alias for chat template (#5858)
3ae5525a
speculative : implement stochastic speculative sampling (#5625)
e245d6c8
cmake : handle cases where git index is not found in .git (#5844)
465e411f
ggml : introduce ggml_status (ggml/750)
d87093e9
sync : ggml
3a44f13b
ggml : fix unknown status (#0)
86e4a3bd
flake : fix
dabfd53d
llama : fix embeddings (#5796)
2e4e9c00
nix: static build (#5814)
6aac3d42
fix speculative decoding build on windows (#5874)
49a84772
rebase and rm trailing space
96b9179a
Merge branch 'master' into fix_mul_mat
fa30cc86
ggerganov approved these changes on 2024-03-05