Add stream synchronization barrier for cross-device events

cuEventSynchronize alone is insufficient: it only blocks the calling
host thread. An ordering barrier is needed in the target stream so that
work enqueued after this call executes only after the cross-device
event completes.
Use cuStreamWaitEvent to establish this barrier in the target stream.