Add model lerobot PI0 to transformers (#44160)
* let it be, as one said
* smaller-ish inference
* some stuff
* update
* fixes
* style
* fixes
* notation
* improve testing a bit
* fix tests
* fixes and not obvious changes
* revert the last commit on capture outputs and save while it works
* make it a proper DiT + VLM model
* fix position ids
* the processing is super weird, why is the PAD between image and text tokens?
* delete prints
* image can actually be padded, gotta adjust then
* move time embeds to its own module
* and the mask, I guess like this for Pi0
* fix a few tests
* fix repo
* update
* docstring and change all similar utils
* update
* docs
* fix the rest
* scaling
* oops, should be zeros
* pi0 pads images!
* skip
* let's adapt the whole processing pipe, including actions and state
* update
* add dummy training test
* smaller lr
* comments
* copy expected output from runners
* docstring
* didn't add PI0-FAST
* repo-check is starting to be annoying
* docs - i hope the last fix
---------
Co-authored-by: raushan <raushan@huggingface.co>