Allow user to assert no mask contiguous check is necessary (#82533)
Summary:
Allow the user to assert that no mask contiguity check is necessary. Skipping the check:
(1) prevents a sync event that would disrupt CUDA Graph capture, and
(2) offers slightly better performance by avoiding a sync.
This needs to be a separate opt-in option because it changes the behavior for malformed masks. Based on what I understood about CUDA Graph capture from ngimel, it's the only way to get BT into CUDA Graphs.
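As a rough illustration only, here is a minimal Python sketch. All names in it (is_left_aligned, sequence_lengths, mask_check) are hypothetical and are not the PyTorch API. It assumes the "contiguous" check verifies that each row's padding forms one contiguous block at the end of the sequence, with True marking a padded position. On a GPU, evaluating such a check means copying a result back to the host, and that copy is the sync this change lets callers skip. The sketch also shows the behavior change for malformed masks: with the check disabled, a bad mask gives silently wrong lengths instead of an error.

```python
from typing import List, Sequence


def is_left_aligned(padding_mask: Sequence[Sequence[bool]]) -> bool:
    """Hypothetical check: True if each row's padding is one trailing run.

    True marks a padded position and False marks a real token.
    """
    for row in padding_mask:
        seen_pad = False
        for is_pad in row:
            if is_pad:
                seen_pad = True
            elif seen_pad:
                # A real token after padding: the mask is not contiguous.
                return False
    return True


def sequence_lengths(
    padding_mask: Sequence[Sequence[bool]], mask_check: bool = True
) -> List[int]:
    """Hypothetical consumer: number of real tokens per row.

    With mask_check=True the mask is validated first (on a GPU this is the
    step that would need a device-to-host sync). With mask_check=False the
    caller asserts the mask is well-formed, so no validation runs.
    """
    if mask_check and not is_left_aligned(padding_mask):
        raise ValueError("padding mask is not left-aligned")
    # Counting real tokens gives the prefix length only for a valid mask.
    return [sum(1 for is_pad in row if not is_pad) for row in padding_mask]
```

For example, sequence_lengths([[False, False, True], [False, True, True]]) returns [2, 1]. For the malformed mask [[False, True, False]], the default call raises ValueError, while mask_check=False returns [2] even though the real tokens are not a contiguous prefix.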
Test Plan: sandcastle unit tests
Differential Revision: D38040418
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82533
Approved by: https://github.com/jbschlosser, https://github.com/zrphercule