SemanticDiff

pytorch
8e391c73 - use 4 warps for small block config in mm (#95339)

Commit

1 year ago

use 4 warps for small block config in mm (#95339)

Temporary fix for #95312.

In Triton, one warp computes a 16x16 tile of the output, so a 32x32 block needs only 4 warps. Using 8 warps triggers an illegal memory access (IMA), which is a bug, but it is not a good config anyway. Triton main is supposed to behave better for these pathological configs, but we are not on main yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95339
Approved by: https://github.com/ezyang, https://github.com/Chillee
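
The warp count in the message follows from simple tile arithmetic. Below is a minimal sketch of that calculation, assuming the 16x16 tile-per-warp figure stated in the commit message. The function name `warps_to_cover_block` and its parameters are hypothetical and are not part of Triton or PyTorch. Real configs may assign several tiles to one warp, so this illustrates the reasoning rather than a tuning rule.

```python
import math


def warps_to_cover_block(block_m: int, block_n: int, tile: int = 16) -> int:
    """Number of warps needed if each warp computes exactly one tile x tile
    piece of a block_m x block_n output block.

    Assumption: the 16x16 default comes from the commit message; this helper
    is illustrative only and is not a Triton or PyTorch API.
    """
    if block_m <= 0 or block_n <= 0 or tile <= 0:
        raise ValueError("block and tile sizes must be positive")
    # Ceiling division so partially covered edges still count as a tile.
    tiles_m = math.ceil(block_m / tile)
    tiles_n = math.ceil(block_n / tile)
    return tiles_m * tiles_n


# A 32x32 block splits into 2 x 2 tiles of 16x16, one per warp.
print(warps_to_cover_block(32, 32))  # 4
```

Under this one-tile-per-warp assumption, a 32x32 block has work for only 4 warps, so a config requesting 8 warps leaves half of them without a tile of their own.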

Author

ngimel

Committer

pytorchmergebot

Parents
