xla
Use f32 scratch for output so we only need to transfer output with desired dtype back to HBM.
#8924
Merged
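
The title describes accumulating in an f32 scratch buffer and writing only the final, narrower-dtype result back to HBM. The sketch below illustrates that idea in plain Python; it is not the kernel from this PR. It assumes the desired output dtype is bfloat16, and the helper names (`to_f32`, `to_bf16`, `accumulate_in_f32_scratch`) are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a float32 value to bfloat16 (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias on the dropped 16 bits
    bits &= 0xFFFF0000                   # keep sign, exponent, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate_in_f32_scratch(blocks, out_bytes_per_elem=2):
    """Sum partial results in an f32 scratch list, then cast once to bf16.

    Returns the bf16 output and the number of bytes that a single
    write-back of that output would occupy (2 bytes per element here,
    versus 4 if the f32 values themselves were written back).
    """
    n = len(blocks[0])
    scratch = [0.0] * n  # stands in for an on-chip f32 scratch buffer
    for block in blocks:
        for i, v in enumerate(block):
            scratch[i] = to_f32(scratch[i] + to_f32(v))
    out = [to_bf16(s) for s in scratch]  # single cast to the output dtype
    return out, n * out_bytes_per_elem


def accumulate_in_bf16(blocks):
    """Contrast case: round the running sum to bf16 after every step."""
    acc = [0.0] * len(blocks[0])
    for block in blocks:
        for i, v in enumerate(block):
            acc[i] = to_bf16(to_f32(acc[i] + to_f32(v)))
    return acc


if __name__ == "__main__":
    blocks = [[1.0]] + [[2.0 ** -10]] * 256
    print(accumulate_in_f32_scratch(blocks))
    print(accumulate_in_bf16(blocks))
```

In this toy input, the small increments are lost if the running sum is kept in bf16, while the f32 scratch keeps them until the single final cast; the written-back result still costs only 2 bytes per element.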

vanbasten23 merged 4 commits into master from xiowei/migrate_kernel_change
vanbasten23 Use f32 scratch for output so we only need to transfer output with de…
4e100747
vanbasten23 not to use dynamic grid
e2cebd5b
vanbasten23 linter
6a53e0a9
vanbasten23 marked this pull request as ready for review 1 year ago
bythew3i approved these changes on 2025-04-02
vanbasten23 change torch_xla wrapper
9383cbc5
yaochengji approved these changes on 2025-04-02
vanbasten23 merged f0881b5a into master 1 year ago
