SemanticDiff

pytorch
79e30ff3 - optimize index_select performance on CPU with TensorIterator (#30598)

Commit

4 years ago

optimize index_select performance on CPU with TensorIterator (#30598)

Summary:
This PR improves `index_select` performance on CPU by using `TensorIterator`. The optimization is equally effective for contiguous and non-contiguous tensors. The code parallelizes the inner loop when the slice being copied is large enough; otherwise it parallelizes the outer loop. This covers both user scenarios: DLRM (from `Embedding`) and the Fairseq transformer.

1. for contiguous input, single socket: **1.25x** performance speedup
2. for non-contiguous input, single socket: **799x** performance speedup
3. for contiguous input, single core: same performance
4. for non-contiguous input, single core: **31x** performance speedup

Pull Request resolved: https://github.com/pytorch/pytorch/pull/30598

Differential Revision: D19266892

Pulled By: VitalyFedyunin

fbshipit-source-id: 7aaf8e2c861b4a96250c968c4dd95c8d2c5b92d7
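The inner-versus-outer parallelization choice described in the summary can be illustrated with a small pure-Python sketch. This is not the PR's code: the actual change lives in PyTorch's C++ CPU kernels, and the names `index_select_rows`, `chunks`, and `GRAIN_SIZE`, as well as the threshold value, are hypothetical and chosen only for illustration. The sketch selects rows of a 2-D list and splits the copy work either within each row (large slices) or across rows (small slices).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical threshold; the value used by the PR is not stated in the summary.
GRAIN_SIZE = 1024


def chunks(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) pairs."""
    step = max(1, -(-n // parts))  # ceiling division, at least 1
    return [(s, min(s + step, n)) for s in range(0, n, step)]


def index_select_rows(src, index, workers=4, grain_size=GRAIN_SIZE):
    """Return a copy of the rows of `src` picked by `index` (selection along dim 0)."""
    if not index:
        return []
    slice_len = len(src[0]) if src else 0
    out = [[None] * slice_len for _ in index]

    def copy(task):
        # Each task copies columns [col_lo, col_hi) of output rows [row_lo, row_hi).
        row_lo, row_hi, col_lo, col_hi = task
        for r in range(row_lo, row_hi):
            out[r][col_lo:col_hi] = src[index[r]][col_lo:col_hi]

    if slice_len >= grain_size:
        # Large slices: split every slice across workers (inner loop).
        tasks = [(r, r + 1, lo, hi)
                 for r in range(len(index))
                 for lo, hi in chunks(slice_len, workers)]
    else:
        # Small slices: give each worker a block of whole slices (outer loop).
        tasks = [(lo, hi, 0, slice_len) for lo, hi in chunks(len(index), workers)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy, tasks))  # consume the iterator so errors surface
    return out
```

Both branches produce the same result; only the way the work is divided differs. A many-small-rows workload (such as an embedding lookup) favors the outer split, while a few very long rows favor the inner split.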

Author

mingfeima

Committer

facebook-github-bot
