Refactor CLIP-like models (#44431)
* squash!
* fix copies for losses
* reverse mapping test
* xclip
* fix repo
* altclip needs eager attn to pass the test?! and ofc non-causal mask
* final fix repo
* layernorm typo / docstring
* clips can't agree on causality
* ugh, skip xclip
* fix repo
* comments
* and more
* fixing stuff
* i didn't push it yesterday?
* style
* break out of infinite dependency loop
* fix repo and hopefully merge today
* again