[NVPTX] Add Intrinsics for discard.* (#128404)
This PR adds intrinsics for all variations of discard.*.
* These intrinsics support the generic or global address space for all variations.
* The lowering from NVVM intrinsics to NVPTX instructions is handled directly in TableGen.
* Lit tests are added as part of discard.ll
* The generated PTX is verified with the ptxas executable from CUDA 12.3.
* Added docs for these intrinsics in NVPTXUsage.rst.
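
As a rough illustration of the two address-space variants listed above, the following LLVM IR sketch shows how the intrinsics might be called. The intrinsic names, the `i64 immarg` size operand, and the value 128 are assumptions based on the PTX `discard{.global}.L2 [a], 128` form; NVPTXUsage.rst and discard.ll remain the authoritative references.

```llvm
; Hypothetical usage sketch. The intrinsic names and signatures below are
; assumed, not copied from the patch; check NVPTXUsage.rst before relying on them.
declare void @llvm.nvvm.discard.global.L2(ptr addrspace(1), i64 immarg)
declare void @llvm.nvvm.discard.L2(ptr, i64 immarg)

define void @discard_example(ptr addrspace(1) %gptr, ptr %ptr) {
  ; Global address space variant; expected to lower to roughly:
  ;   discard.global.L2 [%rd1], 128;
  call void @llvm.nvvm.discard.global.L2(ptr addrspace(1) %gptr, i64 128)

  ; Generic address space variant; expected to lower to roughly:
  ;   discard.L2 [%rd2], 128;
  call void @llvm.nvvm.discard.L2(ptr %ptr, i64 128)
  ret void
}
```

The size operand is written as an immediate because the PTX instruction encodes the size as a constant rather than reading it from a register.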
For more information, refer to the PTX ISA:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-discard
---------
Co-authored-by: abmajumder <abmajumder@nvidia.com>
Co-authored-by: gonzalobg <65027571+gonzalobg@users.noreply.github.com>