onnxruntime
dabd395f - llama 70b model fusion and shardding (#18175)

Commit

2 years ago

llama 70b model fusion and shardding (#18175)

### Description

Support llama-70b model fusion and sharding.

### Motivation and Context

This change enables sharding the llama-70b model and exporting it to ONNX, since the model is too large for a single GPU. It also adds fusion support for llama-70b, whose repeat_kv pattern differs from that of llama-7b and llama-13b.
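To illustrate why the repeat_kv pattern matters for llama-70b, here is a minimal pure-Python sketch. It is not code from this commit. It assumes the published Llama 2 70B head configuration (64 attention heads, 8 key/value heads), whereas llama-7b and llama-13b use as many key/value heads as attention heads, so the repeat is a no-op there. The helpers `repeat_kv` and `shard_head_ranges` are hypothetical names chosen for illustration, and the even split of heads across ranks is an assumed sharding scheme, not necessarily the one the commit implements.

```python
def repeat_kv(kv_heads, n_rep):
    """Repeat each key/value head n_rep times, consecutively.

    Operates on a plain list with one entry per head. The ordering matches
    what torch.repeat_interleave(x, repeats=n_rep, dim=1) would produce on
    the head axis of a (batch, heads, seq, head_dim) tensor.
    """
    if n_rep == 1:
        # Multi-head attention case (e.g. 7b/13b): nothing to repeat.
        return list(kv_heads)
    return [head for head in kv_heads for _ in range(n_rep)]


def shard_head_ranges(num_heads, num_kv_heads, world_size, rank):
    """Return (query_head_range, kv_head_range) owned by `rank`.

    Assumed scheme: heads are split evenly and contiguously across ranks.
    """
    if num_heads % world_size or num_kv_heads % world_size:
        raise ValueError("head counts must be divisible by world_size")
    q_per_rank = num_heads // world_size
    kv_per_rank = num_kv_heads // world_size
    q_range = range(rank * q_per_rank, (rank + 1) * q_per_rank)
    kv_range = range(rank * kv_per_rank, (rank + 1) * kv_per_rank)
    return q_range, kv_range


# Assumed llama-70b head configuration.
NUM_HEADS, NUM_KV_HEADS = 64, 8
n_rep = NUM_HEADS // NUM_KV_HEADS  # each KV head serves 8 query heads

expanded = repeat_kv([f"kv{i}" for i in range(NUM_KV_HEADS)], n_rep)
q_range, kv_range = shard_head_ranges(NUM_HEADS, NUM_KV_HEADS, world_size=4, rank=1)
```

With a contiguous split, every query head on a rank maps (via `q // n_rep`) to a key/value head held by that same rank, so attention can run on each shard without exchanging key/value heads.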

References

#18175 - llama 70b model fusion and shardding

Author

frank-dong-ms

Parents

178f7caa
