DeepSpeed
e2ef102f
- Merge branch 'add-llama2-support' into quantization-refresh
Commit
1 year ago
Merge branch 'add-llama2-support' into quantization-refresh
References
quantization-refresh
#4351 - DS-Inference Quantization refresh: Fix several issues and add more features
Author
RezaYazdaniAminabadi
Parents
66e97f81
c72aa76a
Files
52
.github/workflows
  nv-pre-compile-ops.yml
README.md
csrc
  deepspeed4science/evoformer_attn
    attention.cpp
    attention.cu
    attention_back.cu
    epilogue
      epilogue_grad_bias.h
      epilogue_pipelined.h
      epilogue_rescale_output.h
      epilogue_thread_apply_logsumexp.h
    gemm
      custom_mma.h
      custom_mma_base.h
      custom_mma_multistage.h
      custom_mma_pipelined.h
      find_default_mma.h
      mma_accum_lambda_iterator.h
      mma_from_smem.h
    gemm_kernel_utils.h
    iterators
      epilogue_predicated_tile_iterator.h
      make_residual_last.h
      predicated_tile_access_iterator_residual_last.h
      predicated_tile_iterator_atomic.h
      predicated_tile_iterator_residual_last.h
      transpose_warp_iterator.h
      warp_iterator_from_smem.h
    kernel_backward.h
    kernel_forward.h
    transform
      bias_broadcast.h
      tile_smem_loader.h
  transformer/inference/csrc
    pt_binding.cpp
    transform.cu
deepspeed
  module_inject
    containers
      __init__.py
      internlm.py
    replace_module.py
    replace_policy.py
    utils.py
  ops
    deepspeed4science
      __init__.py
      evoformer_attn.py
    transformer/inference
      ds_attention.py
      op_binding
        mlp_gemm.py
        qkv_gemm.py
docs
  _config.yml
  _data
    navigation.yml
  _pages
    deepspeed4science.md
  _tutorials
    ds4sci_evoformerattention.md
  assets/images
    3pillars.png
    DeepSpeed-pillars.png
    evoformer.png
    new-megatron-ds.png
  index.md
op_builder
  evoformer_attn.py
tests
  benchmarks
    DS4Sci_EvoformerAttention_bench.py
  unit/ops/deepspeed4science
    test_DS4Sci_EvoformerAttention.py