transformers
add VisionTextDualEncoder and CLIP fine-tuning script
#15701
Merged
