ggml
Add parallel decoding in GPT2 example
#572
Merged

YavorGIvanov
Initial attempt to make gpt2 do parallel decoding
ce6139c4
Fix crash on trying to use empty embd
761db297
slaren
Make it work for n_parallel=1
38a17443
YavorGIvanov
Add short way of passing n_parallel argument
845f39c7
Move gpt-2 batched to a separate target and cpp file
42db4049
YavorGIvanov marked this pull request as ready for review 2 years ago
YavorGIvanov requested a review from ggerganov 2 years ago
YavorGIvanov removed review request from ggerganov 2 years ago
YavorGIvanov requested a review from slaren 2 years ago
YavorGIvanov requested a review from ggerganov 2 years ago
Add batched sample output to README and remove hardcoded model path a…
5ffcbf44
slaren
ggerganov gpt-2-batched : fix n_kv heuristic
d91540a9
Free batch at end of example
af6a1d94
ggerganov gpt-2-batched : simplify kv cache stuff (#574)
898718c0
Fix not generating n_predict tokens and fix warn
993d226f
ggerganov minor : readme
63ab3d61
Add check for end token and mark the stream as finished
c2058752
ggerganov
ggerganov approved these changes on 2023-10-12
ggerganov merged 8e828325 into master 2 years ago