Add AMD target
Summary:
Currently, most of the internal models don't run on AMD GPUs because their dependencies are not hipified yet.
Therefore, we manually copied the code of three models from upstream:
- ctr_mbl_feed_30x OverArch
- inline_cvr_7x OverArch
- dhen_5x OverArch
These copies won't stay in sync with upstream, but they also don't depend on any upstream code, and they run on AMD GPUs.
In addition to these 3 internal models, we have verified that 10 OSS models also run on AMD GPUs:
- densenet121
- hf_Bert
- hf_T5_large
- hf_clip
- llama_v2_7b_16h
- mnasnet1_0
- nanogpt
- resnet50
- timm_nfnet
- timm_vision_transformer
The performance comparison between H100 and MI300X is available at https://docs.google.com/spreadsheets/d/10osGq-AxJ9fMy5nE_GVh8rhxqNcDcLBevs1FQYl2MwQ/edit#gid=0
Reviewed By: nmacchioni
Differential Revision: D54344351
fbshipit-source-id: 5e2bf3700791a9401e604f8ede4b62d10f13d217