4D `attention_mask` support (#27539)
* edits to _prepare_4d_causal_attention_mask()
* initial tests for 4d mask
* attention_mask_for_sdpa support
* added test for inner model hidden
* added autotest decorators
* test mask dtype to torch.int64
* torch.testing.assert_close
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* torch_device and @torch_gpu in tests
* upd tests
* +torch decorators
* torch decorators fixed
* more decorators!
* even more decorators
* fewer decorators
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>