xla
5d1c4210 - Move device lock before the execution instead of tensor gathering (#3457)

Commit

3 years ago

Move device lock before the execution instead of tensor gathering (#3457)

* Move device lock before the execution instead of tensor gathering
* Handle OpbyOP Lock
* moving the barrier into RunPostOrder and making changes to coll.indices.empty() condition
* added a conditional barrier to runpostorder to reduce the frequency of early barrier calls. WIP
* moved TensorCollectionBarrier into TryRunCachedSync instead of calling it under if (async != nullptr) { in SyncTensorsGraphInternal
* moved the barrier call to ScheduleSyncTensorsGraph and optimized the barrier call in RunPostOrder
* nit change
* Empty-Commit
* fixing ltc lazy api change
* Empty-Commit
* Added profiling support for RunPostOder. Added race condition caveat comment.
* added a missing device filter to skip calling barrier
* linter fix
* removed barrier_applied
* run test cleanup
* cleaner condition
* linter fix
* addressed feedbacks
* reverted tests
* updated toString API to new format

Co-authored-by: Milad Mohammadi <milad.mo@gmail.com>

References

#3457 - Move device lock before the execution instead of tensor gathering

Author

JackCaoG

Parents

935b6024
