[MemDep] Optimize SortNonLocalDepInfoCache sorting strategy for large caches with few unsorted entries (#143107)
During compilation of large files with many branches, I observed that
the function `SortNonLocalDepInfoCache` in `MemoryDependenceAnalysis`
becomes a significant performance bottleneck. This is because
`Cache.size()` can be very large (around 20,000), but only a small
number of entries (approximately 5 to 8) actually need sorting. The
original implementation performs a full sort in all cases, which is
inefficient.
This patch introduces a lightweight heuristic to quickly estimate the
number of unsorted entries and choose a more efficient sorting method
accordingly.
As a result, the GVN pass runtime on a large file is reduced from
approximately 26.3 minutes to 16.5 minutes.