Make WaitDeviceOps block until device execution finishes (#4626)
* Make WaitDeviceOps block until device execution finishes
* Add comment
* Add test
* Use shared_mutex instead
* Update comment
* Fix typo
* Remove assertNotIn in the test since it is too unstable
* Handle op-by-op
* Update test_operations.py