pytorch
5ce2ab84 - [cuda] Preserve operations order between vectorized and non-vectorized in ln grad input (#111488)

Commit
1 year ago
The vectorized implementation in https://github.com/pytorch/pytorch/pull/111021 changed the order of arithmetic instructions in `layer_norm_grad_input`, so its results were no longer bitwise identical to those of the non-vectorized implementation. At merge time, all accuracy checks passed, including the internal inductor ones.

There are periodic CI inductor dynamo tests (e.g. `pit_b_224`) that run eager mode models several times and compare the results. If the input buffers are aligned to the vector length, the vectorized implementation is used; otherwise, the default one is used. If two eager runs end up with different buffer alignments, each run calls a different implementation, and the results are very close but not bitwise identical. Because the tests check for bitwise identical results, they may fail in some cases.

This fix makes sure that the operation order is the same in the non-vectorized and vectorized implementations, so the two **should** produce bitwise identical results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111488
Approved by: https://github.com/malfet
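The effect described above can be reproduced without a GPU. The following is a minimal sketch, not the actual kernel code: the helper names (`uses_vectorized_path`, `sum_order_a`, `sum_order_b`) and the 16-byte vector width are hypothetical and chosen only for illustration. It shows how an alignment check can select between two code paths, and how two mathematically equivalent orderings of floating-point additions give results that are close but not bitwise identical.

```python
import math


def uses_vectorized_path(address: int, vector_bytes: int = 16) -> bool:
    """Hypothetical dispatch rule: vectorize only if the buffer is aligned."""
    return address % vector_bytes == 0


def sum_order_a(x: float, y: float, z: float) -> float:
    # One association of the additions: (x + y) + z.
    return (x + y) + z


def sum_order_b(x: float, y: float, z: float) -> float:
    # The same math with a different association: x + (y + z).
    return x + (y + z)


# Two runs whose buffers happen to land at different (made-up) addresses.
aligned_run = uses_vectorized_path(0x1000)
unaligned_run = uses_vectorized_path(0x1004)

a = sum_order_a(0.1, 0.2, 0.3)
b = sum_order_b(0.1, 0.2, 0.3)

# Close enough for an accuracy check, but not for a bitwise comparison.
close = math.isclose(a, b, rel_tol=1e-12)
bitwise_equal = a.hex() == b.hex()

print(aligned_run, unaligned_run, close, bitwise_equal)
```

Floating-point addition is not associative, so any reordering of the arithmetic can change the last bits of the result. Keeping the same operation order in both paths removes that source of divergence, regardless of which path the alignment check selects.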