ZeRO-Inference refresh (#4197)
* INT4 weight only quantization (#479)
* INT4 weight only quantization
* run pre-commit
* fix UT
* add ZeRO-3 test
* quantize small weights first to prevent OOM
* fold quantization config into ds_config
* Fix license & refactor ds_config & rebase master
* fix UT
* Moving quantization into post_init_method and add int4 dequantization kernel (#522)
* Add experimental int4 dequantize kernel
* move quantization into post_init_method
* fix
* Refactor: move int4 code to deepspeed/inference (#528)
* Move INT4 code to deepspeed/inference
* fix
* zero++ tutorial PR (#3783)
* [Fix] _conv_flops_compute when padding is a str and stride=1 (#3169)
* fix _conv_flops_compute when padding is a str and stride=1
* fix error
* change type of paddings to tuple
* fix padding calculation
* apply formatting check
---------
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* fix interpolate flops compute (#3782)
* use `Flops Profiler` to test `model.generate()` (#2515)
* Update profiler.py
* pre-commit run --all-files
* Delete .DS_Store
---------
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
* revert PR #3611 (#3786)
* bump to 0.9.6
* ZeRO++ Chinese blog (#3793)
* ZeRO++ Chinese blog
* try better quality images
* make title larger
* even larger...
* various fixes
* center captions
* more fixes
* fix format
* remove staging trigger (#3792)
* DeepSpeed-Triton for Inference (#3748)
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* ZeRO++ (#3784)
Co-authored-by: HeyangQin <heyangqin@microsoft.com>
Co-authored-by: GuanhuaWang <alexwgh333@gmail.com>
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
* adding zero++ to navigation panel of deepspeed.ai (#3796)
* Add ZeRO++ Japanese blog (#3797)
* ZeRO++ Chinese blog
* try better quality images
* make title larger
* even larger...
* various fixes
* center captions
* more fixes
* fix format
* add ZeRO++ Japanese blog
* add links
---------
Co-authored-by: HeyangQin <heyangqin@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
* Bug Fixes for autotuner and flops profiler (#1880)
* fix autotuner when backward is not called
* fix format
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Missing strided copy for gated MLP (#3788)
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* Requires grad checking. (#3789)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* bump to 0.10.0
* Fix Bug in transform.cu (#3534)
* Bug fix
* Fixed formatting error
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
* bug fix: triton importing error (#3799)
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Fix dequant bug
* Address PR feedback
* Use super().__exit__
* Fix unit tests
---------
Co-authored-by: Donglin Zhuang <donglinzhuang@outlook.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Bill Luo <50068224+zhiruiluo@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Guorun <84232793+CaffreyR@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: stephen youn <13525892+stephen-youn@users.noreply.github.com>
Co-authored-by: Stephen Youn <styoun@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Ethan Doe <yidoe@microsoft.com>
Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com>
Co-authored-by: GuanhuaWang <alexwgh333@gmail.com>
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
Co-authored-by: Ramya Ramineni <62723901+rraminen@users.noreply.github.com>