[autograd] fix engine flakiness (#35599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35599
Before https://github.com/pytorch/pytorch/pull/33157 we did not check whether
the ready queue was empty, because the CPU worker's queue might not be empty.
After #33157, we check that the owner thread's ready_queue is empty after
inline execution.
This does not always hold. Consider the following case, with two threads (the CPU thread that calls backward() and the GPU device thread) and a graph like:
GraphRoot(CPU) -> ComputeNode(GPU)
In thread_main, both threads decrement `--local_graph_task->outstanding_tasks_` at nearly the same time, so the counter reaches zero before either thread checks it, and both threads then enter `if (graph_task_completed(local_graph_task))`. The CPU thread breaks out of its loop, finishes, and checks whether local_ready_queue is empty. Meanwhile, the GPU thread sends a dummy task to the CPU thread's ready queue, because it thinks the graph_task finished on its own thread (it actually finished on both threads together). As a result, a dummy task can remain in the queue.
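The interleaving above can be replayed deterministically with a toy model. This is a sketch under stated assumptions, not the engine's code: `GraphTask`, `run_schedule`, and the "dummy" marker are hypothetical stand-ins, only the `outstanding_tasks_` counter is modeled, and it assumes (as described above) that the GPU thread pushes a dummy task onto the owner's ready queue whenever it observes completion.

```python
from collections import deque


class GraphTask:
    """Hypothetical stand-in for the engine's GraphTask; only the counter is modeled."""

    def __init__(self, outstanding_tasks):
        self.outstanding_tasks = outstanding_tasks

    def completed(self):
        # Mirrors the idea of graph_task_completed(): no work is left to run.
        return self.outstanding_tasks == 0


def run_schedule(schedule):
    """Replay a fixed interleaving of (thread, op) steps.

    Returns which threads observed completion and how many tasks were left
    in the CPU (owner) thread's ready queue. Assumption: when the GPU thread
    observes completion, it pushes a dummy task to wake the owner thread.
    """
    cpu_ready_queue = deque()
    task = GraphTask(outstanding_tasks=2)  # GraphRoot (CPU) + ComputeNode (GPU)
    saw_completed = {}
    for thread, op in schedule:
        if op == "decrement":
            task.outstanding_tasks -= 1
        elif op == "check":
            saw_completed[thread] = task.completed()
            if thread == "gpu" and saw_completed[thread]:
                cpu_ready_queue.append("dummy")
    # The CPU thread has already left its loop, so nothing here gets popped.
    return saw_completed, len(cpu_ready_queue)


# GPU checks before the CPU decrements: only the CPU thread sees completion.
BENIGN = [("gpu", "decrement"), ("gpu", "check"), ("cpu", "decrement"), ("cpu", "check")]
# Both threads decrement before either checks: both see completion.
RACY = [("cpu", "decrement"), ("gpu", "decrement"), ("cpu", "check"), ("gpu", "check")]

for name, schedule in [("benign", BENIGN), ("racy", RACY)]:
    saw_completed, leftover = run_schedule(schedule)
    print(name, saw_completed, "leftover tasks:", leftover)
```

An assertion that the owner's queue is empty after the loop would pass for `BENIGN` but fail for `RACY`, even though both schedules complete the graph correctly.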
This happens rarely and non-deterministically, but it can be triggered when many jobs run in CI. Remove the check to fix the flakiness.
Test Plan: Imported from OSS
Differential Revision: D20739778
Pulled By: wanchaol
fbshipit-source-id: 75a671762650a188f44720625d53f0873617c684