llama.cpp
llama_model_loader: support multiple split/shard GGUFs
#6187
Merged

phymbert merged 28 commits into master from hp/split/load-model
phymbert split: support in llama_model_loader
7c64fef9
phymbert requested a review from ggerganov 2 years ago
phymbert requested a review from slaren 2 years ago
phymbert requested a review from ngxson 2 years ago
phymbert commented on 2024-03-20
phymbert commented on 2024-03-20
ngxson commented on 2024-03-20
slaren commented on 2024-03-21
phymbert Avoid copying the entire vector
b8feff41
phymbert split: move llama_tensor_offset to llama_model_loader
18ff6ca8
ggerganov commented on 2024-03-21
phymbert Merge branch 'master' into hp/split/load-model
60a87ae0
phymbert llama_model_loader: PR feedbacks:
1892ae7e
phymbert requested a review from slaren 2 years ago
phymbert requested a review from ggerganov 2 years ago
phymbert requested a review from ngxson 2 years ago
phymbert avoid copying the entire vector
00381b07
ggerganov commented on 2024-03-21
phymbert Simplify this by making these optional, switch some layer creation te…
c34a5dee
phymbert Handle optional tensors
1c931f3d
phymbert llama_model_loader: fail if backend cannot allocate buffer
d8b567d2
ngxson commented on 2024-03-21
ngxson commented on 2024-03-21
slaren fix mmap buffer management
02020b04
ngxson commented on 2024-03-21
phymbert llama_model_loader: map file to backend buffer if the allocation succ…
078a1aca
phymbert llama_model_loader: only map tensors included in the context
69bdee93
phymbert llama_model_loader: minor, use same variable name for consistency, fi…
6df9757a
slaren commented on 2024-03-21
phymbert llama_model_loader: fail if any of backend buffer cannot be allocated
f9a29735
slaren commented on 2024-03-21
phymbert spacing
0fd652eb
phymbert fix loop over pointer
1a179bfc
slaren commented on 2024-03-21
phymbert llama_model_loader: if n_tensors declared not equals to loaded tensor…
7cbe1eac
phymbert llama_model_loader: ensure mappings vector has the expected size
9940df4f
phymbert llama_model_loader: use at instead of operator[] if this should neve…
ec372c66
phymbert llama_model_loader: immediately add the backend buffer to the model b…
a9e88c6e
phymbert llama_model_loader: be sure the model mappings has enough capacity be…
b19af364
phymbert llama_model_loader: fix map -> unordered map
4c044009
phymbert changed the title from "llama_model_loader: support multiple split GGUFs" to "llama_model_loader: support multiple split/shard GGUFs" 2 years ago
phymbert llama_split_prefix: use a clearer version, not pass split path len bu…
e474e456
ggerganov llama : minor
8326607c
ggerganov llama : introduce some typedef helpers
dbc35acf
ggerganov approved these changes on 2024-03-22
phymbert docs: add model shard in hot topic
f616b38b
ngxson approved these changes on 2024-03-22
slaren commented on 2024-03-22
phymbert llama_model_loader: put mapping in a unique_ptr from the moment it is…
1f387599
ngxson fix llama_split_prefix
764c7afe
slaren approved these changes on 2024-03-22
phymbert merged dba1af61 into master 2 years ago
phymbert deleted the hp/split/load-model branch 2 years ago
phymbert commented on 2024-03-23
