llvm-project
fdc4274e - AMDGPU: Perform libcall recognition to replace fast OpenCL pow (#182135)

61 days ago
If a float-typed call site is marked with afn, replace the 4 flavors of pow with a faster variant. This transforms pow, powr, pown, and rootn to __pow_fast, __powr_fast, __pown_fast, and __rootn_fast if available. Also attempts to handle all of the same basic folds on the new fast variants that were already performed with the base forms.

This maintains optimizations with OpenCL when the device libs unsafe math control library is deleted. This maintains the status quo of how libcalls work, and only handles 4 new entry points. This only helps with the elimination of the control library, and not general libcall emission problems.

This makes no practical difference for HIP, which is the status quo for libcall optimizations. AMDGPULibCalls recognizes the OpenCL mangled names. For example, OpenCL float "pow" is really _Z3powff, but the HIP-provided function "powf" is really named _ZL4powfff, and std::pow with float is _ZL3powff. The pass still runs for HIP, so if you happened to use the OpenCL mangled function names, this would trigger by accident.

Since the functions cannot yet be relied on from the library, introduce a temporary module flag check. I'm not planning on emitting it anywhere and it's a poor substitute for versioning the target.
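The name-matching rule described above can be illustrated with a small standalone sketch. It is not the AMDGPULibCalls implementation and uses no LLVM APIs: `CallSite`, `mangleOpenCL`, and `fastPowVariant` are hypothetical names, only scalar float signatures are covered, and whether the fast variants are themselves mangled in the library is not stated in the commit message, so the sketch returns the plain names as written there.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a call site; not an LLVM type.
struct CallSite {
  std::string mangledCallee; // e.g. "_Z3powff"
  bool isFloat;              // the call operates on float
  bool hasAfn;               // the call carries the afn fast-math flag
};

// Itanium-style mangling of a free function with external linkage:
// "_Z" + <name length> + <name> + <parameter codes> ('f' = float, 'i' = int).
std::string mangleOpenCL(std::string_view name, std::string_view params) {
  assert(!name.empty());
  return "_Z" + std::to_string(name.size()) + std::string(name) +
         std::string(params);
}

// Returns the fast replacement name, or nullopt when the call does not
// qualify. fastVariantsAvailable models the "if available" condition
// (assumption: in the real pass this involves the temporary module flag).
std::optional<std::string> fastPowVariant(const CallSite &cs,
                                          bool fastVariantsAvailable) {
  if (!cs.hasAfn || !cs.isFloat || !fastVariantsAvailable)
    return std::nullopt;

  struct Entry {
    const char *name;
    const char *params;
    const char *fast;
  };
  // Scalar float signatures only; vector overloads are omitted here.
  static constexpr Entry table[] = {
      {"pow", "ff", "__pow_fast"},
      {"powr", "ff", "__powr_fast"},
      {"pown", "fi", "__pown_fast"},
      {"rootn", "fi", "__rootn_fast"},
  };
  for (const Entry &e : table)
    if (cs.mangledCallee == mangleOpenCL(e.name, e.params))
      return std::string(e.fast);
  return std::nullopt;
}
```

With this sketch, a call to `_Z3powff` with afn set maps to `__pow_fast`, while HIP's `_ZL4powfff` and std::pow's `_ZL3powff` do not match any table entry, because the internal-linkage `L` marker and the `powf` spelling produce different mangled strings.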