OptimizedLinear updates #5791
Add fp8-fused gemm kernel
4c3b8fd5
add get_scale function
c0e97f1c
fix a few things to run the test
cb0e0a65
Merge branch 'master' into add-fp8-gemm
a11f9c5e
fixes for optim linear
4169b137
progress
58967436
lora fixes + initial ckpt signal
e600a385
base_weight -> weight
ef52cd1e
use flattened tensors for BWS
a170fdd9
fix illegal memory corner cases with an extra condition for reading s…
390a984f
reduce memory pressure
b43c242b
more changes
40add9ea
small fix for fp16 quantization
057ce52f
ds lora injection api support (#8)
966ebd4f
Merge branch 'master' into ds-llama
6ec4eada
various clean-up
fe6b082f
updates for tests
c163c211
Merge branch 'master' into ds-llama
527cc236
Merge branch 'master' into ds-llama
2bf3290f
Merge branch 'master' into ds-llama
cbfd54de
HeyangQin
approved these changes
on 2024-08-13
loadams
enabled auto-merge 1 year ago
loadams
merged
6e5d58d2
into master 1 year ago