[NVFuser] always use fallback if fusion fails
1) Remember when a fusion fails, and always take the fallback on subsequent runs.
2) During the first fallback, cache the Code object so that later runs can reuse it.
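
The two steps above can be sketched roughly as follows. This is a minimal Python illustration, not the actual NVFuser code: `FallbackGuard`, `fused`, and `build_fallback` are hypothetical names, and the assumption that a failed fusion surfaces as a `RuntimeError` is made only for the example.

```python
from typing import Any, Callable, Optional


class FallbackGuard:
    """Hypothetical wrapper: attempt fusion once, then stick to the fallback."""

    def __init__(
        self,
        fused: Callable[..., Any],
        build_fallback: Callable[[], Callable[..., Any]],
    ) -> None:
        self._fused = fused
        self._build_fallback = build_fallback
        self._fusion_failed = False  # step 1: remember that fusion failed
        self._fallback: Optional[Callable[..., Any]] = None  # step 2: cached fallback

    def __call__(self, *args: Any) -> Any:
        if not self._fusion_failed:
            try:
                return self._fused(*args)
            except RuntimeError:  # assumed failure signal for this sketch
                self._fusion_failed = True
        if self._fallback is None:
            # Build the fallback only once; later calls reuse the cached object.
            self._fallback = self._build_fallback()
        return self._fallback(*args)
```

With this shape, the expensive fusion attempt and the fallback construction each happen at most once per guard, no matter how many times it is called.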
On autogen-69 from the nvfuser microbenchmarks (https://github.com/pytorch/benchmark/pull/801), this improved performance as follows:
* Original (always attempt fusion): 25ms
* Always take fallback after first failure: 0.79ms
* Always take fallback + cache Code object: 0.62ms
* Eager: 0.58ms
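
For reference, the relative speedups implied by the timings above can be derived with a short calculation. The dictionary keys are illustrative labels, not names from the benchmark.

```python
# Timings in milliseconds, copied from the list above.
timings_ms = {
    "original": 25.0,
    "fallback": 0.79,
    "fallback_cached": 0.62,
    "eager": 0.58,
}

# Speedup of each configuration relative to always attempting fusion.
speedup = {name: timings_ms["original"] / t for name, t in timings_ms.items()}

# Remaining overhead of the cached fallback relative to eager mode.
overhead_vs_eager = timings_ms["fallback_cached"] / timings_ms["eager"] - 1.0

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
print(f"cached fallback vs eager: {overhead_vs_eager:.1%} slower")
```

The cached fallback comes out roughly 40x faster than the original path and within about 7% of eager mode.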
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75983
Approved by: https://github.com/jjsjann123