Fix invalid lowerings for ROCm in Pallas (#223)

popcount and clz were effectively broken on ROCm because
math_dialect had incorrect lowerings for them.
Use the device intrinsics for these functions, and also
for exp and absf, which fixes some accuracy issues in
the Pallas tests.
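
For reference, a minimal plain-Python sketch of the results a correct 32-bit popcount and clz lowering is expected to produce. This is not the code in this change. The helper names (`popcount32`, `clz32`) are hypothetical, and returning 32 for clz of zero is an assumption of this sketch.

```python
# Reference semantics only; the actual fix lives in the ROCm lowering rules.
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits


def popcount32(x: int) -> int:
    """Number of set bits in the low 32 bits of x."""
    return bin(x & MASK32).count("1")


def clz32(x: int) -> int:
    """Number of leading zero bits in the low 32 bits of x.

    Assumption: clz of zero is the full bit width (32).
    """
    return 32 - (x & MASK32).bit_length()
```

Comparing kernel output against helpers like these on edge cases (0, 1, all bits set) is one way to catch a broken lowering.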

Docs for OCML/OCKL:
- https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/doc/OCML.md
- https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/doc/OCKL.md