BLOOM Inference via DeepSpeed-Inference, Accelerate and DeepSpeed-ZeRO (#308)
* hardcode the dtype depending on the model (dtype selection sketched after this list)
* set the mp (model-parallel) size based on the world_size
* remove hardcoded world_size
* add bigscience/bigscience-small-testing
* fixes
* add zero-inference script
* fixes
* fix
* working script
* renames
* fixes
* fix for offline use
* add benchmark
* add benchmark
* update
* cleanup
* update
* report timings in msecs
* cleanup
* improve
* fix benchmark, add warmup (benchmark arithmetic sketched after this list)
* update
* fix; thanks Michael Wyatt
* clarify
* add bloom batch-inference script
* removed the names :-)
* fold in the bs (batch size) functionality from the other script
* fix
* restore do_sample
* dump generate args
* fix
* fix
* support any batch size
* div by bs
* mul by bs
* add cpu_offload; sync scripts
* wip
* improvements
* fixes
* fixes
* add accelerate script
* fix
* wip
* wip
* stats
* add OnDevice and remove zero-inference (#316) (OnDevice usage sketched after this list)
* wip
* rework generate + benchmark
* figure out the memory map dynamically (memory-map heuristic sketched after this list)
* bug fix
* fix ds-zero-inference wrt device
* bug fix
* update
* update
* fix
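The sketches below are minimal reconstructions, under stated assumptions, of techniques the commits above name; none is the PR's exact code.

Dtype selection based on the model: a sketch assuming the official BLOOM checkpoints ship bfloat16 weights while other models fall back to float16. The `bf16_models` set is illustrative, not the PR's exact mapping.

```python
import torch

def infer_dtype(model_name: str) -> torch.dtype:
    # illustrative mapping: which checkpoints get bfloat16 is an assumption
    bf16_models = {"bigscience/bloom", "bigscience/bigscience-small-testing"}
    return torch.bfloat16 if model_name in bf16_models else torch.float16
```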
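Benchmark arithmetic ("msecs", "div by bs", "mul by bs"): a sketch of warmup plus timing where per-token latency and throughput are both derived from the total number of generated tokens, so one divides by the batch size and the other multiplies by it. `generate`, `bs`, `new_tokens`, and `cycles` are illustrative names; `generate` is assumed to block until all tokens are produced.

```python
import time

def benchmark(generate, bs: int, new_tokens: int, cycles: int = 5):
    generate()  # warmup so one-time CUDA init doesn't skew the timed runs
    t0 = time.time()
    for _ in range(cycles):
        generate()
    elapsed = time.time() - t0
    total_new_tokens = cycles * bs * new_tokens
    msecs_per_token = elapsed / total_new_tokens * 1000  # latency: div by bs
    tokens_per_sec = total_new_tokens / elapsed          # throughput: mul by bs
    return msecs_per_token, tokens_per_sec
```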
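OnDevice: a sketch assuming DeepSpeed's `deepspeed.OnDevice` context manager, which instantiates the model on the `meta` device so no real weights are allocated before ds-inference loads the checkpoint shards. The `init_inference` kwargs reflect the API of that era, and the checkpoint json path is hypothetical.

```python
import os

import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("bigscience/bloom")
# meta device: shapes and dtypes only, no memory allocated for weights
with deepspeed.OnDevice(dtype=torch.bfloat16, device="meta"):
    model = AutoModelForCausalLM.from_config(config)

# ds-inference injects fused kernels and loads the real weights, sharded
# across mp_size ranks; "checkpoints.json" (hypothetical) lists the shards
model = deepspeed.init_inference(
    model,
    mp_size=int(os.getenv("WORLD_SIZE", "1")),
    dtype=torch.float16,
    checkpoint="checkpoints.json",
    replace_with_kernel_inject=True,
)
```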
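Dynamic memory map: a sketch that splits the estimated parameter bytes evenly across the visible GPUs and hands the result to Accelerate via `max_memory`. The GPT-style parameter-count formula and the 5% headroom are rough assumptions, not the PR's exact heuristic.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

def get_max_memory_per_gpu_dict(model_name: str, dtype: torch.dtype) -> dict:
    config = AutoConfig.from_pretrained(model_name)
    h, l, v = config.hidden_size, config.num_hidden_layers, config.vocab_size
    # rough decoder-only estimate: ~12*l*h^2 for the blocks plus embeddings
    n_params = l * 12 * h * h + v * h
    bytes_per_param = torch.finfo(dtype).bits // 8
    n_gpus = torch.cuda.device_count()
    # allow ~5% above the even share (an assumed fudge factor)
    per_gpu_bytes = int(n_params * bytes_per_param / n_gpus * 1.05)
    return {i: per_gpu_bytes for i in range(n_gpus)}

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    max_memory=get_max_memory_per_gpu_dict("bigscience/bloom", torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```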
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>