Fix VideoPrismForVideoClassification returning last_hidden_state as h… (#46830)
* Fix VideoPrismForVideoClassification returning last_hidden_state as hidden_states
ImageClassifierOutput.hidden_states must be tuple | None, but the model
was passing vision_model_outputs.last_hidden_state (a single raw tensor).
Forward the vision backbone's hidden_states instead, add the missing
attentions field, and update the tests to assert the corrected behavior.
* Test recorded hidden_states/attentions for VideoPrismForVideoClassification
Replace misleading skips with real tests. Now that hidden_states/attentions
are correctly forwarded from the vision backbone, add custom tests mirroring
those for VideoPrismVisionModel. They assert the spatial-then-temporal shapes
returned when output_hidden_states/output_attentions are requested.
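
For reviewers, the output contract behind the first fix can be sketched as
follows. This is a minimal illustration, not the actual model code: the two
dataclasses and both builder functions are hypothetical stand-ins (plain
lists take the place of tensors), and only the field names hidden_states,
attentions, and last_hidden_state come from the change itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical stand-in for the vision backbone's output (not the real class).
@dataclass
class BackboneOutputSketch:
    last_hidden_state: list
    hidden_states: Optional[Tuple[list, ...]] = None
    attentions: Optional[Tuple[list, ...]] = None


# Hypothetical stand-in for ImageClassifierOutput (not the real class).
@dataclass
class ClassifierOutputSketch:
    logits: list
    hidden_states: Optional[Tuple[list, ...]] = None
    attentions: Optional[Tuple[list, ...]] = None


def build_output_buggy(logits, backbone):
    # Pre-fix pattern: a single tensor lands in a field that callers
    # expect to be a tuple (one entry per layer) or None.
    return ClassifierOutputSketch(
        logits=logits,
        hidden_states=backbone.last_hidden_state,
    )


def build_output_fixed(logits, backbone):
    # Post-fix pattern: forward the backbone's per-layer tuples unchanged.
    return ClassifierOutputSketch(
        logits=logits,
        hidden_states=backbone.hidden_states,
        attentions=backbone.attentions,
    )
```

With the fixed builder, both fields are either a tuple or None, so callers
can index them per layer without first checking for a bare tensor.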