Fix multi-tensor LAMB reduction to be deterministic (#6028)
* define the ordering of the reduction across thread blocks so results are reproducible (see the reduction sketch after this list)
* save state
* remove debug code
* address review comments
* significant correction: reduce only over blocks belonging to the same tensor (also covered by the reduction sketch below)
* address more review comments
* update rocm/lamb.cc so it builds as well
* remove the 2048 * size factor from the multi-tensor test until the threshold error on ROCm is resolved
* convert the tuple to a struct, as recommended in review (see the struct sketch after this list)
* update comment
* apply perfect forwarding in launch_multitensor to permit passing a reference rather than a pointer (see the forwarding sketch after this list)
* remove excess template arguments from the ROCm lamb.cc launch_multitensor as well
* fixes for the AMD build
* address remaining PR comments
* run formatter from VS Code
* run formatter on CUDA files
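
For context on the determinism fix, the sketch below shows the general two-pass pattern implied by the first bullet: each block writes its partial sum to a fixed slot instead of using atomicAdd (whose accumulation order varies between runs), and a second pass combines the partials in ascending block order, making the floating-point result bitwise reproducible. All names here are illustrative, not the actual kernels touched by this PR; in the multi-tensor case each tensor would own its own slice of partial_sums, matching the correction that blocks are reduced only with other blocks of the same tensor.

```cpp
#include <cuda_runtime.h>

// Pass 1: every block reduces its grid-stride slice into shared memory,
// then writes the block's partial sum to a slot indexed by blockIdx.x.
// (Assumes blockDim.x is a power of two for the tree reduction.)
__global__ void PartialSumKernel(const float* data, int n, float* partial_sums) {
  extern __shared__ float smem[];
  float acc = 0.f;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    acc += data[i];
  }
  smem[threadIdx.x] = acc;
  __syncthreads();
  for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
    if (threadIdx.x < stride) smem[threadIdx.x] += smem[threadIdx.x + stride];
    __syncthreads();
  }
  if (threadIdx.x == 0) partial_sums[blockIdx.x] = smem[0];  // fixed slot, no atomics
}

// Pass 2: one thread folds the partials in ascending block order, so the
// summation order (and thus the rounded result) is identical every run.
__global__ void FinalSumKernel(const float* partial_sums, int num_blocks, float* out) {
  if (blockIdx.x == 0 && threadIdx.x == 0) {
    float acc = 0.f;
    for (int b = 0; b < num_blocks; ++b) acc += partial_sums[b];
    *out = acc;
  }
}
```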
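
The tuple => struct conversion is a pure readability change; below is a generic before/after with hypothetical field names (the real fields in lamb.cc differ):

```cpp
#include <tuple>

// Before: positional std::get<> calls obscure meaning at the call site.
using ChunkTuple = std::tuple<int, int, int>;  // tensor_idx, offset, size
int ChunkEndBefore(const ChunkTuple& t) {
  return std::get<1>(t) + std::get<2>(t);
}

// After: a named struct documents itself everywhere it is used.
struct ChunkInfo {
  int tensor_idx;
  int chunk_offset;
  int chunk_size;
};
int ChunkEndAfter(const ChunkInfo& c) {
  return c.chunk_offset + c.chunk_size;
}
```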
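
Likewise, the perfect-forwarding change is a standard C++ pattern; this sketch uses an invented signature, not launch_multitensor's real one, to show why taking TFunctor&& with std::forward lets callers pass a reference (or a temporary) where a pointer was previously required:

```cpp
#include <utility>

// A deduced TFunctor&& binds to both lvalues and rvalues, so no explicit
// pointer parameter is needed; std::forward preserves the value category.
template <typename TFunctor, typename... TArgs>
void launch_multitensor(TFunctor&& functor, TArgs&&... args) {
  std::forward<TFunctor>(functor)(std::forward<TArgs>(args)...);
}

struct LambStage1 {
  void operator()(int chunk_count) const { /* launch kernel per chunk */ }
};

void Example() {
  LambStage1 stage;
  launch_multitensor(stage, /*chunk_count=*/4);  // lvalue reference, no pointer
  launch_multitensor(LambStage1{}, 4);           // a temporary also works
}
```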