Paligemma causal attention mask (#30967)
* PaliGemma working causal attention
* Formatting
* Style
* Docstrings + remove commented code
* Update docstring for PaliGemma Config
* PaliGemma - add separator ind to model/labels
* Refactor + docstring paligemma processor method
* Style
* return token type ids when tokenizing labels
* use token type ids when building causal mask
* add token type ids to tester
* remove separator from config
* fix style
* don't ignore separator
* add processor documentation
* simplify tokenization
* fix causal mask
* style
* fix label propagation, revert suffix naming
* fix style
* fix labels tokenization
* [run-slow]paligemma
* add eos if suffixes are present
* [run-slow]paligemma
* [run-slow]paligemma
* add misssing tokens to fast version
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix style
* [run-slow]paligemma
---------
Co-authored-by: Peter Robicheaux <peter@roboflow.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>