text-generation-inference
Improve the handling of quantized weights
#2250
Merged

danieldk merged 2 commits into main from refactor/quantization-handling
danieldk Improve the handling of quantized weights (a93b2b50)
danieldk force pushed from 5bbbce9c to e22f411c 1 year ago
OlivierDehaene dismissed these changes on 2024-07-18
OlivierDehaene requested a review from OlivierDehaene 1 year ago
danieldk dismissed their stale review via 8ebec90a 1 year ago
danieldk force pushed from e22f411c to 8ebec90a 1 year ago
OlivierDehaene commented on 2024-07-18
OlivierDehaene commented on 2024-07-18
danieldk force pushed from 8ebec90a to 59fc128c 1 year ago
danieldk force pushed from 59fc128c to d819a3c2 1 year ago
danieldk Exclude non-MLP layers when using FP8 quantization with Llama (cf16172a)
danieldk force pushed from d819a3c2 to cf16172a 1 year ago
OlivierDehaene approved these changes on 2024-07-18
danieldk merged ba291dad into main 1 year ago
danieldk deleted the refactor/quantization-handling branch 1 year ago
