llama.cpp
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B
#13386
Merged


slaren merged 4 commits into ggml-org:master from hjc4869:no_op_offload
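
The flag targets setups where `-ot`/`--override-tensor` keeps MoE expert weights in CPU buffers while the rest of the model runs on the GPU. By default the scheduler may still offload large matrix multiplications on those CPU-resident weights to the GPU during prompt processing; `--no-op-offload` turns that off, which is what improves pp throughput in these configurations. A hedged example invocation (the model file name and tensor-override regex below are placeholders, not taken from this PR):

```sh
# Illustrative only: the model path and the -ot regex are placeholders.
# -ot "exps=CPU" keeps tensors whose names contain "exps" (MoE expert weights)
# in CPU buffers; --no-op-offload stops their ops from being bounced to the
# GPU during prompt processing.
./llama-cli -m ./llama4-400b.gguf -ngl 99 \
    -ot "exps=CPU" \
    --no-op-offload \
    -p "Hello"
```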
hjc4869: Add --disable-op-offload (2e747874)
github-actions added the testing, examples, and ggml labels
slaren commented on 2025-05-08
hjc4869: Avoid negative bools in library. (31e19202)
hjc4869: Fix default value of ggml_backend_sched_new (0d53a04b)
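
The two commits above are the library side of the change: per "Avoid negative bools in library", the scheduler takes a positively named `op_offload` control rather than a negated one, and `ggml_backend_sched_new` is where its default is set. A minimal sketch of a caller opting out, assuming the parameter sits at the end of the constructor's argument list (that ordering is inferred from the commit messages, so check `ggml-backend.h` in the merged revision for the exact signature):

```c
// Sketch only: the trailing op_offload parameter and its position are an
// assumption based on this PR's commit messages, not a verified signature.
#include "ggml-backend.h"
#include "ggml-cpu.h"

int main(void) {
    ggml_backend_t cpu = ggml_backend_cpu_init();
    ggml_backend_t backends[] = { cpu };

    // op_offload = false corresponds to passing --no-op-offload on the CLI:
    // ops whose weights live in CPU buffers are not offloaded to another backend.
    ggml_backend_sched_t sched = ggml_backend_sched_new(
        backends, /*bufts=*/NULL, /*n_backends=*/1,
        /*graph_size=*/2048, /*parallel=*/false, /*op_offload=*/false);

    ggml_backend_sched_free(sched);
    ggml_backend_free(cpu);
    return 0;
}
```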
hjc4869 requested a review from slaren 178 days ago
slaren commented on 2025-05-10
hjc4869: Rename to --no-op-offload for consistency (eae3a319)
hjc4869 changed the title from "Add `--disable-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B" to "Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B" 178 days ago
slaren approved these changes on 2025-05-11
slaren merged 7f323a58 into master 177 days ago
hjc4869 deleted the no_op_offload branch 177 days ago


Reviewers: slaren
Assignees: No one assigned
Labels: testing, examples, ggml
Milestone: None