Improve compiled RT-DETR inference speed (#33412)
* modify rt detr to improve inference times when compiled
* Remove redundant "to"
* Fix conditional lru_cache and missing shapes_list
* nit unnecessary list creation
* Fix compile error when ninja not available and custon kernel activated