DeepSpeed
FP6 quantization end-to-end.
#5234
Merged

loadams merged 33 commits into master from features/rebase-quant-fp6
a4562ab7 JamesTheZ: FP6 quantization end-to-end.
91bb4d79 JamesTheZ: Update CUDA kernels and clean code.
1c2131d2 JamesTheZ: Make the quantizer run on GPU.
1ba45fdd JamesTheZ: [WIP] Fix the bug of FP16-to-FP6 data packing.
ff6c3c3a arashb: Add FP6 end-to-end unit tests
368a7630 JamesTheZ: Refine the FP16-to-FP6 cast logic.
6c45a84b arashb: Add unit tests for FP6 quantizer
90b710d2 JamesTheZ: Fix FP16-FP6 cast problems.
f8e3acfb JamesTheZ: Update FP6 kernels.
b025c5ad JamesTheZ: Fix the bug of subnormal FP6 casting and the 2bit/4bit tensor allocat…
6ed67f77 JamesTheZ: Clean code.
20b543ca JamesTheZ: pre-commit
c43947a2 JamesTheZ: Deal with the subnormal FP6 and FP16 values and refine the UT.
a6d2f2f0 JamesTheZ: Update according to review comments.
62a2d495 JamesTheZ: Fix the CI workflow problem for FP6 end-to-end.
118af370 JamesTheZ: Fix at::nullopt and at::optional conflicts.
56eb8b90 JamesTheZ: Refine split-k setting.
0ddbfd11 JamesTheZ: Remove debug files.
35c82f25 JamesTheZ: Only compile the kernel body for SM >= 8.0.
63489d17 JamesTheZ: Fix the GPU architecture requirement of FP6 kernel.
ed00ac92 mrwyattii: Update deepspeed/inference/v2/config_v2.py
b15a1a10 mrwyattii: Update deepspeed/inference/v2/config_v2.py
c2e6ebb9 mrwyattii: refactor fp6 tests, fix import error
fb8887c9 mrwyattii: Update deepspeed/inference/v2/modules/implementations/linear/quantize…
77f3883d mrwyattii: Update requirements.txt
f6bcdee0 mrwyattii: revert testing to fix A6000 test
e1a4ce04 loadams: Update pydantic version
e86611fc mrwyattii: fix pydantic import
7e28144d JamesTheZ: Fix some review comments.
f8454a08 loadams: Pin pydantic to latest version
bed775e1 JamesTheZ: Add the missing torch import.
loadams requested a review from mrwyattii 1 year ago
loadams requested a review from awan-10 1 year ago
loadams requested a review from arashb 1 year ago
loadams requested a review from tjruwase 1 year ago
xiaoxiawu-microsoft enabled auto-merge 1 year ago
arashb requested a review from xiaoxiawu-microsoft 1 year ago
xiaoxiawu-microsoft approved these changes on 2024-03-06
mrwyattii approved these changes on 2024-03-06
arashb approved these changes on 2024-03-06
Auto-merge disabled 1 year ago (manually disabled by user)
f34312ad loadams: Merge branch 'master' into features/rebase-quant-fp6
4a917880 loadams: Merge branch 'master' into features/rebase-quant-fp6
xiaoxiawu-microsoft enabled auto-merge 1 year ago
Auto-merge disabled 1 year ago (manually disabled by user)
loadams merged ccfdb84e into master 1 year ago