[minifier] cuda.synchronize to better detect IMA (#97962)
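An illegal memory access (IMA) can surface much later than the kernel launch that caused it, so such failures can escape the minifier. Calling cuda.synchronize forces any pending CUDA error to surface immediately, so the minifier can detect it.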
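
A minimal sketch of the idea, not the minifier's actual code: `candidate_fails` and `FakeDevice` are hypothetical names used only for illustration, and the fake device merely simulates an error that is deferred until synchronization.

```python
from typing import Callable, Optional


def candidate_fails(run_candidate: Callable[[], object],
                    synchronize: Callable[[], None]) -> bool:
    """Return True if the candidate raises, either at launch or at sync."""
    try:
        run_candidate()  # work is only enqueued here; the call may return cleanly
        synchronize()    # wait for the device, so a deferred error is raised now
    except RuntimeError:
        return True
    return False


class FakeDevice:
    """Hypothetical stand-in for an asynchronous device (illustration only)."""

    def __init__(self) -> None:
        self.pending_error: Optional[str] = None

    def launch_bad_kernel(self) -> None:
        # The launch itself succeeds; the fault is only recorded for later.
        self.pending_error = "illegal memory access (simulated)"

    def synchronize(self) -> None:
        # Report any recorded fault, mimicking an error raised at sync time.
        if self.pending_error is not None:
            message, self.pending_error = self.pending_error, None
            raise RuntimeError(message)


device = FakeDevice()
# Without a real synchronize, the deferred fault goes unnoticed.
missed = candidate_fails(device.launch_bad_kernel, lambda: None)
# With synchronize, the same fault is caught right after the launch.
caught = candidate_fails(device.launch_bad_kernel, device.synchronize)
```

With PyTorch, the `synchronize` argument would be `torch.cuda.synchronize`, and `run_candidate` would be whatever callable executes the graph under test.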
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97962
Approved by: https://github.com/mlazos