benchmark
Update run.py to warm up and report CUDA timings
#404
Merged
