[Research Projects] ORPO diffusion for alignment (#7423)
* barebones orpo
* remove reference model.
* full implementation
* change default of beta_orpo
* add a training command.
* fix: dataloading issues.
* interpreting the formulation.
* revert styling
* add: full-blown wds version
* fix: per_gpu_batch_size
* start debugging
* debugging
* remove print
* fix
* remove filter keys.
* turn on non-blocking calls.
* device_placement
* let's see.
* add bigger training run command
* reinitialize generator for fair repro
* add: detailed readme and requirements
---------
Co-authored-by: Sayak Paul <sayakpaul@Sayaks-MacBook-Pro-2.local>