More efficient cache-key globbing + support parent paths in globs (#13469)
## Summary
(Related PR: #13438 - would be nice to have it merged as well since it
touches on the same globwalker code)
There's a few issues with `cache-key` globs, which this PR attempts to
address:
- As of the current state, parent or absolute paths are not allowed,
which is not obvious and is not documented. E.g., cache-key paths of the
form `{file = "../dep/**"}` will be essentially ignored.
- Absolute glob patterns also don't work (funnily enough, there's logic
in `globwalk` itself that attempts to address it in
[`globwalk::glob_builder()`](https://github.com/Gilnaa/globwalk/blob/8973fa2bc560be54c91448131238fa50d56ee121/src/lib.rs#L415),
which serves as inspiration to some parts of this PR).
- The reason for parent paths being ignored is the way globwalker is
currently being triggered in `uv-cache-info`: the base directory is
being walked over completely and each entry is then being matched to one
of the provided match patterns.
- This may also end up being very inefficient if you have a huge root
folder with thousands of files: if your match patterns are `a/b/*.rs`
and `a/c/*.py` then instead of walking over the root directory, you can
just walk over `a/b` and `a/c` and match the relevant patterns there.
- Why supporting parent paths may be important to the point of being a
blocker: in large codebases with python projects depending on other
local non-python projects (e.g. rust crates), cache-keys can be very
useful to track dependency on the source code of the latter (e.g.
`cache-keys = [{ file = "../../crates/some-dep/**" }]`.
- TLDR: parent/absolute cache-key globs don't work, glob walk can be
slow.
## Solution
- In this PR, user-provided glob patterns are first clustered
(LCP-style) into pattern groups with longest common path prefix; each of
these groups can then be walked over separately.
- Pattern groups do not overlap, so we would never walk over the same
directory twice (unless there's symlinks pointing to same folders).
- Paths are not canonicalized nor virtually normalized (which is
impossible on Unix without FS access), so the method is symlink-safe
(i.e. we don't treat `a/b/..` as `a`) and should work fine with #13438.
- Because of LCP logic, the minimal amount of directory space will be
traversed to cover all patterns.
- Absolute glob patterns will now work.
- Parent-relative glob patterns will now work.
- Glob walking will be more efficient in some cases.
## Possible improvements
- Efficiency can be further greatly improved if we limit max depth for
globwalk. Currently, a simple ".toml" will deep-traverse the whole
folder. Essentially, max depth can be always set to either N or
infinity. If a pattern at a pivot node contains `**`, we collect all
children nodes from the subtree into the same group and don't limit max
depth; otherwise, we set max depth to the length of the glob pattern.
This wouldn't change correctness though and can we done separately if
needed.
- If this is considered important enough, docs can be updated to
indicate that parent and absolute globs are supported (and symlinks are
resolved, if the relevant PR is algo merged in).
## Test Plan
- Glob splitting and clustering tests are included in the PR.
- Relative and absolute glob cache-keys were tested in an actual
codebase.