[vulkan] Improve mutex management when syncing with GPU (#80959)
Improves mutex management when syncing with the GPU.
Dispatches are recorded to a `vulkan::api::Context` instance primarily through the `submit_compute_job` and `submit_texture_copy` functions, each of which locks the `cmd_mutex_` mutex; this serializes dispatches when the instance is accessed from multiple threads.
Complexities arise when syncing with the GPU. The typical flow is:
```
// Record a shader dispatch to copy data from image texture to a buffer
// and call vkQueueSubmit with a fence
context->submit_compute_job(image_to_nchw, fence...)
// Wait on the fence
fence.wait()
// Flush the context
context->flush()
```
Between calling `vkQueueSubmit` with a fence and flushing `context`, `context` cannot allow more dispatches to be recorded, otherwise the resources used for those dispatches will be erased by the call to `flush()`.
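To illustrate the hazard, a rough sketch of what could go wrong if recording were still allowed during the sync (the interleaving and the second job's name are hypothetical):
```
//   Thread A                                      Thread B
//   context->submit_compute_job(image_to_nchw, fence...)
//   fence.wait()
//                                                 context->submit_compute_job(job_b)
//   context->flush()   // erases the resources recorded for job_b as well,
//                      // even though job_b has not yet executed on the GPU
```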
Previously, this was managed by having `submit_compute_job` lock the mutex but not release it if a fence was passed, and having `flush()` release the mutex at the end of the function call. However, this approach is confusing and does not properly account for exceptions thrown between the calls to `submit_compute_job()` and `flush()`.
This diff changes it so that the calling thread manually manages `context->cmd_mutex_` when syncing with the GPU.
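With this change, the caller's critical section looks roughly like the following (a sketch in the same pseudocode style as above; direct access to `cmd_mutex_` is assumed here and may go through an accessor in practice):
```
// The calling thread holds cmd_mutex_ for the whole submit/wait/flush sequence
std::unique_lock<std::mutex> cmd_lock(context->cmd_mutex_);

// Record the shader dispatch and call vkQueueSubmit with a fence
context->submit_compute_job(image_to_nchw, fence...)

// Wait on the fence; no other thread can record dispatches in the meantime
fence.wait()

// Flush the context; only this thread's dispatches could have been recorded
context->flush()

// cmd_lock releases cmd_mutex_ when it goes out of scope, even if
// fence.wait() or flush() throws
```
Because the lock is held as an RAII guard on the calling thread, it is released even when an exception propagates out of the wait or the flush, addressing the exception-safety gap described above.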
Differential Revision: [D37616998](https://our.internmc.facebook.com/intern/diff/D37616998/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80959
Approved by: https://github.com/kimishpatel