xla
Move device lock before the execution instead of tensor gathering
#3457
Merged

Commits
  • Move device lock before the execution instead of tensor gathering
    miladm committed 4 years ago
  • Handle OpByOp lock
    miladm committed 4 years ago
  • moving the barrier into RunPostOrder and making changes to coll.indices.empty() condition
    miladm committed 4 years ago
  • added a conditional barrier to RunPostOrder to reduce the frequency of early barrier calls. WIP
    miladm committed 4 years ago
  • moved TensorCollectionBarrier into TryRunCachedSync instead of calling it under the if (async != nullptr) branch in SyncTensorsGraphInternal
    miladm committed 4 years ago
  • moved the barrier call to ScheduleSyncTensorsGraph and optimized the barrier call in RunPostOrder
    miladm committed 4 years ago
  • nit change
    miladm committed 4 years ago
  • Empty-Commit
    miladm committed 4 years ago
  • fixing LTC lazy API change
    miladm committed 4 years ago
  • Empty-Commit
    miladm committed 4 years ago
  • Added profiling support for RunPostOrder. Added race condition caveat comment.
    miladm committed 4 years ago
  • added a missing device filter to skip calling barrier
    miladm committed 4 years ago
  • linter fix
    miladm committed 4 years ago
  • removed barrier_applied
    miladm committed 4 years ago
  • run test cleanup
    miladm committed 4 years ago
  • cleaner condition
    miladm committed 4 years ago
  • linter fix
    miladm committed 4 years ago
  • addressed feedback
    miladm committed 4 years ago
  • reverted tests
    miladm committed 4 years ago
  • updated toString API to new format
    miladm committed 4 years ago