Support GPU Event Operators (#3653)
* Add GPU event operators to support in-place updates in the
gradient accumulator and optimizer, which modify the tensors
passing through those event operators.
* Address comments and polish code
* Merge shared code between CPU and GPU kernels
* Move event test to a new file
* Address comments
* Update onnxruntime/core/providers/cuda/gpu_data_transfer.cc