server : parallel decoding and multimodal (cont) #3677
implementing parallel decoding in server example
63f99b1e
crash fixed
47123020
save dev progress
78504218
Merge branch 'master' of https://github.com/ggerganov/llama.cpp
b716eeb7
refactored sampling function
29c8cdd6
completion endpoint working
81484805
multiple client support
5b8e29de
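Each connected client is handled by a "slot" that owns its own decoding state. A hypothetical sketch of such per-client state (field names are illustrative, not the server's actual struct):

```cpp
#include <string>
#include "llama.h"

// Hypothetical per-client slot state; field names are illustrative and do
// not match the server's actual struct.
struct server_slot {
    int          id        = -1;
    llama_seq_id seq_id    = -1;    // KV-cache sequence this client decodes into
    int          n_past    = 0;     // tokens already evaluated for this slot
    bool         available = true;  // free to take a new completion request
    std::string  generated_text;
};
```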
grammar + non-streaming completion
83c2b355
cached prompt support
500ac712
chat.mjs: support cached prompt + some fixes
4ba5a501
server UI now supports multiple clients
6358ae5f
unused change reverted
a410a9e3
fixed timings per slot
b6d9e212
add context swap
a2c2d98c
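The context swap follows the pattern from the parallel example: once a slot's context fills up, keep the first n_keep tokens, discard half of what follows, and slide the remainder left in the KV cache. A minimal sketch, assuming the KV-cache sequence API of this period (llama_kv_cache_seq_shift has since been renamed):

```cpp
#include "llama.h"

// A minimal sketch of a context swap for one sequence: keep the first
// n_keep tokens, discard half of the rest, shift the remainder left.
// n_past/n_keep bookkeeping is per slot.
static void context_shift(llama_context * ctx, llama_seq_id seq_id, int & n_past, int n_keep) {
    const int n_left    = n_past - n_keep - 1;
    const int n_discard = n_left / 2;

    // drop the oldest non-kept half, then slide the rest into the gap
    llama_kv_cache_seq_rm   (ctx, seq_id, n_keep + 1, n_keep + 1 + n_discard);
    llama_kv_cache_seq_shift(ctx, seq_id, n_keep + 1 + n_discard, n_past, -n_discard);

    n_past -= n_discard;
}
```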
add changes to README.md
eb082012
llava multimodal integration
9d98cdda
fixed token probs
de35b479
add multimodal input - alpha
9f72b446
refactor code + remove unused comments + improve README.md
7e64bfe0
fix compilation errors with llvm
299f6b54
notify the user from the server UI that multimodality is unavailable
4e5c5c45
Merge branch 'ggerganov:master' into master
f47fd17b
Merge pull request #6 from damian0815/fssrepo_mac_fixes
9035978a
some ci fixes
ce961a30
fix undefined reference errors in CI make build
b727e022
fix handling of prompts longer than ctx, as proposed in #3639
fd64f04f
fixed premature end due to stop word
2d9f11db
context shift fixed
d7eca255
fix llava implementation
4d180433
sync README.md changes
aa2268f4
Merge remote-tracking branch 'upstream/master'
fa0f22f1
readme change
58f8ae9b
update API to be OpenAI-like
6c277eaa
multimodal support enabled by default
ed0c11cb
fix make build errors
d2b1fac6
fix multiple clients
c02c52ef
fix zig build
35fd3743
Merge branch 'ggerganov:master' into master
84b8f2b0
new sampling API
7196c4e0
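"New sampling API" refers to the llama_sampling_* helpers in common/sampling.h, which bundle the sampler state (including grammar) into one context object. A hedged sketch of the flow; exact signatures vary across revisions:

```cpp
#include "common/sampling.h"

// Hedged sketch of the llama_sampling_* helper flow from this period;
// exact signatures differ between revisions.
llama_sampling_params sparams;   // temperature, top-k/top-p, grammar, ...

llama_sampling_context * ctx_sampling = llama_sampling_init(sparams);

// inside the generation loop (ctx is the slot's llama_context):
const llama_token id = llama_sampling_sample(ctx_sampling, ctx, nullptr);
llama_sampling_accept(ctx_sampling, ctx, id);

llama_sampling_free(ctx_sampling);
```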
Merge branch 'master' of https://github.com/ggerganov/llama.cpp
8540568c
latest changes to the sampling API
ab2fc002
server : coding-style normalization
e44ed601
server : coding-style normalization (part 2)
654e0a1f
server : remove beam-search functionality
a8c981b7
server : bug fix in ingest_images
3d5929e8
server : use refs + use llama_batch_clear()
e3a2c3fe
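llama_batch_clear() and its companion llama_batch_add() are the common.h helpers; the refactor replaces direct writes to llama_batch fields with them. A rough sketch of the per-iteration batch rebuild (the slot type is illustrative):

```cpp
#include <vector>
#include "common.h"

// Rough sketch: rebuild the shared batch each iteration with the common.h
// helpers instead of writing llama_batch fields directly.
struct slot_state { int id; llama_token sampled; int n_past; };  // illustrative

static void rebuild_batch(llama_batch & batch, std::vector<slot_state> & slots) {
    llama_batch_clear(batch);

    for (auto & slot : slots) {
        // one sampled token per slot, each in its own sequence, logits enabled
        llama_batch_add(batch, slot.sampled, slot.n_past, { slot.id }, true);
        slot.n_past += 1;
    }
}
```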
server : snake case
9740824b
server : minor sync
325d1793
added thread-safe pipeline
6b2437e3
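A "thread-safe pipeline" here means HTTP handler threads hand work to the single decoding loop through a guarded queue. A hypothetical sketch of that kind of structure (not the server's actual types):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical mutex-guarded task queue of the kind a thread-safe
// pipeline implies; not the server's actual implementation.
struct task_queue {
    std::mutex              m;
    std::condition_variable cv;
    std::deque<int>         tasks;   // task ids, for illustration

    void push(int id) {
        { std::lock_guard<std::mutex> lock(m); tasks.push_back(id); }
        cv.notify_one();
    }

    int pop() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return !tasks.empty(); });
        const int id = tasks.front(); tasks.pop_front();
        return id;
    }
};
```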
server : batch has to be allocated for n_parallel sequences
113dd600
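Because every slot decodes into its own sequence, the shared batch must be sized for all of them up front. A sketch assuming llama_batch_init; its signature has changed over time, and the three-argument form with n_seq_max shown here is the modern one:

```cpp
#include "llama.h"

// Sketch: size the shared batch for all parallel sequences up front.
const int n_ctx      = 2048;  // illustrative context size
const int n_parallel = 4;     // illustrative number of slots

llama_batch batch = llama_batch_init(n_ctx, 0, n_parallel);
// ... decode loop ...
llama_batch_free(batch);
```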
server : no need for atomic int - already using mutex
5d540e80
server : logs + minor code style
778c070d
server : fix multibyte handling in partial response (#3706)
17b23eb9
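The underlying issue: a token can end partway through a multi-byte UTF-8 character, so streaming the chunk verbatim emits invalid bytes. A hypothetical illustration of the boundary check (not the server's actual code): count how many bytes of an incomplete trailing sequence to withhold until the next token completes it.

```cpp
#include <string>

// Hypothetical boundary check: how many trailing bytes of s form an
// incomplete UTF-8 sequence and should be withheld from the stream.
static size_t incomplete_utf8_tail(const std::string & s) {
    // walk back over up to 3 continuation bytes (10xxxxxx)
    size_t i = s.size();
    size_t cont = 0;
    while (i > 0 && cont < 3 && (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
        --i; ++cont;
    }
    if (i == 0) return 0; // only continuation bytes; nothing to decide here
    const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    size_t expect = 0;
    if      ((lead & 0xE0) == 0xC0) expect = 2; // 110xxxxx -> 2-byte sequence
    else if ((lead & 0xF0) == 0xE0) expect = 3; // 1110xxxx -> 3-byte sequence
    else if ((lead & 0xF8) == 0xF0) expect = 4; // 11110xxx -> 4-byte sequence
    else return 0;                              // ASCII or invalid: send as-is
    const size_t have = cont + 1;               // lead byte + continuations so far
    return have < expect ? have : 0;            // bytes to withhold, if any
}
```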
fix image load + view image in chat
2eb4c11e
monatis approved these changes on 2023-10-22
Merge branch 'master' into server-rev
176993c8
make : silence stb warnings
4b4ab722
clip : link to ggml, not to llama
715f384a
server : fix switch fallthrough
197a0a9e
server : fix crash in Debug on macOS (I have no idea why this fixes i…
ef18f4d5
server : refactor ctx_sampling init + n_ctx + names
569ebf11
server : bug fix for prompt caching
f67d9713
Do not save/load image_data to localStorage
5359fb92
editorconfig : new line in index.html
f305d643
server : completion requests remember slot_id
a8063171
Update readme to document multimodal in server
2679c432
Merge branch 'server-rev' of https://github.com//ggerganov/llama.cpp …
a4d69d8b
server : minor style
dd1af2ed
Update readme to document multimodal in server
3d6a687f
server : hide ctx_sampling->prev behind API (#3696)
00ae55b3
server : apply fix from #3722
8fe7ca48
server : fix slot reuse
83e14901
server : add comment about changing slot_state to bool
c0f4d548
ggerganov merged 438c2ca8 into master 2 years ago