Remerge custom GPU op (#5818)
* add case for CPU custom op on GPU
* format doc
* restrict GPU custom op on Linux GPU CI only
* separate cu file into an independent project
* fix typo
* include cuda_add lib
* move lib def
* add file header
Co-authored-by: RandySheriffH <rashuai@microsoft.com>