pytorch
dc6916b3 - optimize gather performance for gnn usage on CPU (#87586)

1 year ago

optimize gather performance for gnn usage on CPU (#87586)

In the classic PyG use case for message passing, `gather` receives an `index` tensor in a broadcasted shape, e.g. shape `[5000, 128]` with stride `[1, 0]`. This means that each output row of `gather` is taken from a single row of the `self` tensor. The existing implementation parallelizes over the inner dimension, which performs poorly on CPU and cannot be vectorized.

This PR addresses this use case and optimizes it in a manner similar to `index_select`: parallelize over the outer dimension of `index` and do a vectorized copy along the inner dimension.

Performance benchmarking on a single-socket Xeon Ice Lake with `GCN`: `gather` time dropped from `150.787ms` to `10.926ms`. After this optimization, `gather` is no longer the major bottleneck for training GNN models when `EdgeIndex` is in COO format. For more details, please refer to https://github.com/pyg-team/pytorch_geometric/issues/4891#issuecomment-1288423705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87586
Approved by: https://github.com/rusty1s, https://github.com/malfet
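The idea can be illustrated with a minimal pure-Python sketch. It is not the ATen kernel from this PR (that is C++), and the names `gather_dim0_naive`, `is_row_broadcast`, and `gather_dim0_rowwise` are hypothetical. In PyTorch terms, a row-broadcast index like this typically comes from expanding a 1-D index, e.g. `idx.view(-1, 1).expand(-1, C)`, which yields stride `[1, 0]`. In that case `gather` along dim 0 selects whole rows, just like `index_select`. The sketch compares element-wise gather semantics with a fast path that splits the outer (row) dimension across workers and copies each row as one contiguous slice. Python threads only illustrate the work split here and give no real speedup because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]


def gather_dim0_naive(src: Matrix, index: List[List[int]]) -> Matrix:
    """Reference semantics of gather along dim 0: out[i][j] = src[index[i][j]][j]."""
    return [[src[index[i][j]][j] for j in range(len(index[i]))]
            for i in range(len(index))]


def is_row_broadcast(index: List[List[int]]) -> bool:
    """True when every row of index holds one repeated value, i.e. the
    pattern produced by expanding a 1-D index to [N, C] (stride [1, 0])."""
    return all(len(set(row)) <= 1 for row in index)


def gather_dim0_rowwise(src: Matrix, row_idx: List[int], num_cols: int,
                        num_workers: int = 4) -> Matrix:
    """Hypothetical fast path: parallelize over the outer (row) dimension and
    copy each selected source row as one contiguous slice, like index_select."""
    out: Matrix = [[] for _ in row_idx]

    def copy_rows(start: int, stop: int) -> None:
        for i in range(start, stop):
            # One contiguous slice copy per output row; in a C++ kernel this
            # inner copy is the part that can be vectorized.
            out[i] = src[row_idx[i]][:num_cols]

    n = len(row_idx)
    chunk = max(1, (n + num_workers - 1) // num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(copy_rows, s, min(s + chunk, n))
                   for s in range(0, n, chunk)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out


if __name__ == "__main__":
    src = [[float(r * 10 + c) for c in range(4)] for r in range(5)]
    row_idx = [4, 0, 2, 2, 1, 3]
    index = [[r] * 4 for r in row_idx]  # expanded 1-D index, one value per row
    assert is_row_broadcast(index)
    assert gather_dim0_rowwise(src, row_idx, 4) == gather_dim0_naive(src, index)
    print(gather_dim0_rowwise(src, row_idx, 4)[0])  # [40.0, 41.0, 42.0, 43.0]
```

The fast path applies only when `is_row_broadcast` holds. A general `index` still needs the element-wise semantics of `gather_dim0_naive`.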

Author

mingfeima

Committer

pytorchmergebot

Parents

f8026413