cuda: Q1_0 initial backend (#21629)
* [cuda] initial Q1_0 backend
* remove unused code, fix AMD MMA guard
* attempt to support dp4a
* Apply suggestions from code review
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>