Add generation server scripts using HF Accelerate and DeepSpeed-Inference (#328)
* first step towards making libs
* HF accelerate model
* refactor accelerate
* refactor DS inference
* refactor DS ZeRO
* make inference library
* cli
* server
* request
* remove MaxTokensError
* fix batch size error with DS inference server
* type fix
* add latency
* add min_length to default kwargs
* str kwargs
* fix comma
* add old scripts back
* move scripts
* drop data
* minor changes + add README
* update README
* drop nccl
* fix
* default values
* resolve issues
* handle keyboard interrupt
* remove caching
* use snapshot_download
* make server class
* fix snapshot download
Co-authored-by: Mayank Mishra <mayank31398@gmail.com>