Process inputs directly in apply_chat_template in image-text-to-text pipeline (#35616)
* tokenize inputs directly in apply_chat_template
* refactor processing
* revert changes processing llava
* Update docs
* fix issue with str being iterable
* add test chat text only
* change function name