[CUDNN][CUDNN V8 API] LRU Cache for cuDNN frontend `ExecutionPlan` (#104369)
Adds LRU eviction to the cuDNN frontend `ExecutionPlan` cache, with the capacity configurable via the `TORCH_CUDNN_V8_LRU_CACHE_LIMIT` environment variable, to address the high memory usage observed in #98688 and #104122. By default this limit is set to 10000, which empirically corresponds to about 2GiB of host memory usage. Note that we are still following up with cuDNN to see if the size of an `ExecutionPlan` can be reduced, as it currently appears to be around 200KiB (!!) for a single plan.
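The eviction scheme is the standard LRU pattern: a doubly-linked list ordered by recency plus a hash map for O(1) lookup, where a cache hit moves the entry to the front and an insert past the limit evicts from the back. A minimal sketch of that pattern is below; the names (`LruPlanCache`, `Plan`) are illustrative and are not the actual types used in the PR.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for a cuDNN frontend ExecutionPlan (hypothetical type).
struct Plan { std::string graph_key; };

// Hypothetical LRU cache: recency list + map from key to list iterator.
class LruPlanCache {
 public:
  explicit LruPlanCache(size_t limit) : limit_(limit) {}

  // On a hit, move the entry to the front (most recently used) and
  // return a pointer to it; on a miss, return nullptr.
  Plan* find(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return nullptr;
    order_.splice(order_.begin(), order_, it->second);
    return &it->second->second;
  }

  void insert(const std::string& key, Plan plan) {
    auto it = map_.find(key);
    if (it != map_.end()) {
      // Overwrite in place and refresh recency.
      it->second->second = std::move(plan);
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    if (limit_ > 0 && map_.size() >= limit_) {
      // Evict the least-recently-used entry (back of the list).
      map_.erase(order_.back().first);
      order_.pop_back();
    }
    order_.emplace_front(key, std::move(plan));
    map_[key] = order_.begin();
  }

  size_t size() const { return map_.size(); }

 private:
  size_t limit_;
  std::list<std::pair<std::string, Plan>> order_;
  std::unordered_map<std::string,
                     std::list<std::pair<std::string, Plan>>::iterator> map_;
};
```

With a limit of 2, touching an entry via `find` protects it from the next eviction, which is the behavior that keeps hot `ExecutionPlan`s resident while bounding total host memory.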
This implementation is heavy on internal asserts for now, as it is difficult to directly test the state of the cache without explicitly instrumenting it in tests. Once we are confident that the implementation is stable, the asserts can be removed.
CC @malfet, who @ptrblck mentioned may also have been looking into this
CC @colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104369
Approved by: https://github.com/malfet