Fix perf issue in Conv CUDA kernel (#7348)
* Fix perf issue in Conv CUDA kernel
* Read available memory from device
* Assume 10% memory fragmentation
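
A rough sketch of the idea behind the last two items, not the code from this change. It assumes the free byte count has already been read from the device (for example through the CUDA runtime call `cudaMemGetInfo`) and shows only the fragmentation discount. The names `usable_bytes` and `kFragmentationPct` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical constant: share of free device memory assumed lost to fragmentation.
constexpr std::size_t kFragmentationPct = 10;

// Returns a conservative estimate of the bytes that can actually be allocated.
// free_bytes is the value reported by the device, e.g. the first output of
// cudaMemGetInfo(&free_bytes, &total_bytes).
// Dividing before multiplying avoids overflow and rounds the result down.
std::size_t usable_bytes(std::size_t free_bytes,
                         std::size_t fragmentation_pct = kFragmentationPct) {
  if (fragmentation_pct >= 100) {
    return 0;  // nothing is considered usable
  }
  return free_bytes / 100 * (100 - fragmentation_pct);
}
```

With 1000 free bytes and the default 10% discount, `usable_bytes(1000)` yields 900.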
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>