Support llamav27b training (#1924)
Summary:
Training should work fine on A100, so adding a skip for A10G.
EDIT: Nah, training OOMs. I might be able to reduce the input length for training, though, to reduce activation memory.
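For context on why shorter inputs help: transformer activation memory grows roughly linearly with sequence length for the MLP/hidden activations and quadratically for the attention score matrices. A back-of-envelope sketch (all sizes here are illustrative placeholders, not the actual benchmark's numbers):

```python
def activation_bytes(seq_len, batch=1, hidden=4096, layers=32,
                     heads=32, bytes_per=2):
    # Hypothetical estimate: a handful of hidden-sized activation
    # tensors saved per token per layer (linear in seq_len).
    per_layer = 10 * batch * seq_len * hidden * bytes_per
    # Attention score matrices are (heads, seq_len, seq_len):
    # quadratic in seq_len.
    attn = batch * heads * seq_len * seq_len * bytes_per
    return layers * (per_layer + attn)

# Halving or quartering the input length shrinks the activation
# footprint by well more than the same factor once attention dominates.
ratio = activation_bytes(4096) / activation_bytes(1024)
print(f"4096-token vs 1024-token activation memory: {ratio:.1f}x")
```

Under these (made-up) dimensions the ratio lands between the 4x linear term and the 16x quadratic term, which is why trimming input length is a plausible OOM workaround.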
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1924
Reviewed By: xuzhao9
Differential Revision: D49584103
Pulled By: msaroufim
fbshipit-source-id: 7488e368692404397fd9b227f78092ce35c0edcc