DeepSpeed
ed10cc73 - Add support of Qwen models (7b, 14b, 72b) to DeepSpeed-FastGen (#4913)

This PR adds support for Qwen models 7b, 14b and 72b.

### Test Code

For the MII pipeline:

```python
from mii import pipeline

pipe = pipeline("Qwen/Qwen-7B-Chat")
# Qwen's <|endoftext|> token id
pipe.tokenizer.tokenizer.eos_token_id = 151643
output = pipe(["DeepSpeed is"], max_new_tokens=128, do_sample=False)
print(output)
```

For Hugging Face:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True
).eval()
inputs = tokenizer("DeepSpeed is", return_tensors="pt")
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=128, do_sample=False, repetition_penalty=1.0)
test = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(test)
```

### Qwen 7B

Huggingface output with prompt "DeepSpeed is":

```
a high-performance, low-latency database management system designed for real-time analytics and machine learning applications. It is built on top of Apache Arrow, a columnar in-memory data format, and is optimized for processing large volumes of data in parallel.

DeepSpeed offers several key features that make it well-suited for real-time analytics and machine learning applications:

1. High Performance: DeepSpeed is designed to deliver high performance by leveraging parallel processing and optimized data structures. It can process large volumes of data in real-time, making it ideal for applications that require real-time analytics.

2. Low Latency: DeepSpeed is designed to minimize latency by
```

DeepSpeed-FastGen output with prompt "DeepSpeed is" (identical to the Huggingface output):

```
a high-performance, low-latency database management system designed for real-time analytics and machine learning applications. It is built on top of Apache Arrow, a columnar in-memory data format, and is optimized for processing large volumes of data in parallel.

DeepSpeed offers several key features that make it well-suited for real-time analytics and machine learning applications:

1. High Performance: DeepSpeed is designed to deliver high performance by leveraging parallel processing and optimized data structures. It can process large volumes of data in real-time, making it ideal for applications that require real-time analytics.

2. Low Latency: DeepSpeed is designed to minimize latency by
```

### Qwen 72B

Huggingface output with prompt "DeepSpeed is":

```
是一个开源的深度学习优化库,它提供了多种优化技术,包括模型并行、数据并行、混合并行、ZeRO内存优化等。它可以帮助用户在大规模GPU集群上训练深度学习模型,提高训练速度,减少内存使用。
在Deepspeed中,模型并行是一种将模型的不同部分分配到不同的GPU上的技术。这样可以处理模型太大,无法放在一个GPU上的问题。数据并行是将数据集分成多个部分,每个部分在不同的GPU上进行训练。混合并行则是结合了模型并行和数据并行,以更有效地利用GPU资源
```

(In English, the 72B output reads: "... is an open-source deep learning optimization library that provides a variety of optimization techniques, including model parallelism, data parallelism, hybrid parallelism, and ZeRO memory optimization. It helps users train deep learning models on large GPU clusters, increasing training speed and reducing memory usage. In DeepSpeed, model parallelism is a technique that assigns different parts of a model to different GPUs, which handles the case of a model too large to fit on one GPU. Data parallelism splits the dataset into multiple parts, each trained on a different GPU. Hybrid parallelism combines model parallelism and data parallelism to use GPU resources more effectively.")

DeepSpeed-FastGen output with prompt "DeepSpeed is" with 8-way sharding (identical to the Huggingface output):

```
是一个开源的深度学习优化库,它提供了多种优化技术,包括模型并行、数据并行、混合并行、ZeRO内存优化等。它可以帮助用户在大规模GPU集群上训练深度学习模型,提高训练速度,减少内存使用。
在Deepspeed中,模型并行是一种将模型的不同部分分配到不同的GPU上的技术。这样可以处理模型太大,无法放在一个GPU上的问题。数据并行是将数据集分成多个部分,每个部分在不同的GPU上进行训练。混合并行则是结合了模型并行和数据并行,以更有效地利用GPU资源
```

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
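The MII test script above pins `eos_token_id` to 151643 (Qwen's `<|endoftext|>`) so that generation stops at the right token rather than running to the `max_new_tokens` budget. A toy sketch of that stopping rule, for illustration only (the helper function and token values are made up here, not part of DeepSpeed or MII):

```python
QWEN_EOS_ID = 151643  # Qwen's <|endoftext|> token id, as set in the test script above

def generate_until_eos(token_stream, eos_token_id=QWEN_EOS_ID, max_new_tokens=128):
    """Collect tokens until eos_token_id appears or the token budget is exhausted."""
    out = []
    for tok in token_stream:
        if tok == eos_token_id or len(out) >= max_new_tokens:
            break
        out.append(tok)
    return out

# Toy stream: three content tokens followed by Qwen's eos id.
print(generate_until_eos([11, 22, 33, 151643, 44]))  # -> [11, 22, 33]
```

For the 8-way sharded 72B run, a script like the MII test above is typically launched across GPUs with the DeepSpeed launcher (e.g. `deepspeed --num_gpus 8 script.py`); the exact invocation is not shown in this PR.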