[flang][cuda] Remove the need of special compile definition for CUFInit (#124965)
This patch addresses post-commit review comments from #124859.
The extra compile definition is not necessary and goes against the
effort to separate the runtimes from the flang compiler itself. The
function declaration for `CUFInit` is accessible anyway since the
headers are always present. The insertion of the call is based only on
the language feature options from the folding context.
A program compiled with CUDA enabled but without cufruntime will simply
fail at link time, as expected.
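
For illustration only, a minimal self-contained sketch of the idea: the decision to emit the `CUFInit` call depends on the enabled language features rather than on a preprocessor definition. Every type and function name below (`LanguageFeature`, `LanguageFeatureControl`, `FoldingContext`, `EmitProgramPrologue`) is a hypothetical stand-in and does not reproduce flang's actual classes or the code in this patch.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT flang's real types.
enum class LanguageFeature : std::size_t { OpenMP, CUDA, Count };

class LanguageFeatureControl {
public:
  void Enable(LanguageFeature f) { enabled_.set(static_cast<std::size_t>(f)); }
  bool IsEnabled(LanguageFeature f) const {
    return enabled_.test(static_cast<std::size_t>(f));
  }

private:
  std::bitset<static_cast<std::size_t>(LanguageFeature::Count)> enabled_;
};

struct FoldingContext {
  LanguageFeatureControl features;
};

// Returns the names of the runtime calls emitted at program start.
// The CUDA check is an ordinary runtime condition on the options;
// there is no #ifdef guarding it.
std::vector<std::string> EmitProgramPrologue(const FoldingContext &ctx) {
  std::vector<std::string> calls;
  if (ctx.features.IsEnabled(LanguageFeature::CUDA)) {
    calls.push_back("CUFInit");
  }
  return calls;
}
```

In this sketch the compiler always knows about the call; whether the symbol resolves is left to the linker, which matches the link-time failure described above.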