[LIBCLC][BINDLESS][CUDA] always inline redirection functions (#18699)
These functions do at most some casting and have effectively zero
register overhead at the default optimization level, so there should be
no use case in which always inlining them is a disadvantage.
This brings the NVPTX libclc image backend in line with the AMD one,
which requires no such changes: the AMD libclc backend already does the
same thing by using the _CLC_DECL macro consistently for all functions.
Although it is not immediately obvious to the libclc programmer, the
_CLC_DECL macro applies `__attribute__((always_inline))`.
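As a rough illustration of the pattern described above, here is a
minimal host-side C sketch. All names in it (`MY_ALWAYS_INLINE`,
`read_impl`, `read_redirect`, `self_check`) are hypothetical stand-ins
and not the actual libclc macros or image functions; it only assumes a
GCC- or Clang-compatible compiler.

```c
#include <assert.h>

/* Hypothetical stand-in for a libclc-style macro that attaches the
   always_inline attribute; the real libclc definition may differ. */
#define MY_ALWAYS_INLINE __attribute__((always_inline)) inline

/* Hypothetical implementation function that takes an unsigned handle. */
static float read_impl(unsigned long handle, int x) {
  return (float)(handle + (unsigned long)x);
}

/* Redirection function: it only casts its argument and forwards the
   call, so forcing it inline removes the extra call without adding any
   work of its own. */
static MY_ALWAYS_INLINE float read_redirect(long handle, int x) {
  return read_impl((unsigned long)handle, x);
}

/* Small check that the wrapper returns exactly what the implementation
   returns for the cast argument. */
static void self_check(void) {
  assert(read_redirect(40, 2) == read_impl(40UL, 2));
}
```

Calling `self_check()` from a `main` function exercises the wrapper.
Because the wrapper body is a single cast plus a forwarded call, the
inlined code is the same call the caller would otherwise have written
by hand.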
There are also a few cases with low register usage to which I have added
the `inline` hint, probably out of excess caution.
---------
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>