ruff
c71e1696 - [ty] Implement Duboc's TDD optimization for unions of constraint sets (#23881)

Commit
4 days ago
[ty] Implement Duboc's TDD optimization for unions of constraint sets (#23881) For awhile we've known that our constraint sets can balloon in size when they involve large unions, and especially intersections of large unions. And we've had ecosystem runs (typically involving projects that depend on numpy) that trigger this pathological behavior. #23848 is the latest example, with a 25× performance regression for the `mesonbuild` project. Guillaume Duboc just defended his [PhD thesis](https://gldubc.github.io/#thesis) in January, and in §11.2, he calls out an optimization first introduced by Frisch for handling these kinds of unions more efficiently. The approach is also described in [a post](https://elixir-lang.org/blog/2025/12/02/lazier-bdds-for-set-theoretic-types/) on the Elixir blog. (Frisch and Duboc are both using these BDDs to represent _types_, whereas we're using them to represent _constraints on types_, but the same ideas apply.) Frisch describes the basic idea, which is to add an "uncertain" branch to each BDD node, turning them into ternary decision diagrams (TDDs). Frisch also provides TDD construction rules for union (OR), intersection (AND), and difference. Duboc takes this further and derives more efficient rules for intersection and difference. This PR implements TDDs and Frisch's and Duboc's construction rules. I've confirmed that this completely eliminates the performance regression for `mesonbuild` on #23848. --- More details on how this works, and why we get these savings: The key benefit is that they let us represent unions more efficiently. As a simple example, with our quasi-reduced BDDs from before, `a ∨ b` becomes: ``` <n1> a ┡━₁ <n2> b │ ┡━₁ true │ └─₀ true └─₀ <n3> b ┡━₁ true └─₀ false ``` With TDDs, the rhs of `a ∨ b` is moved into the new "uncertain" branch: ``` <t1> a ┡━₁ true ├─? <t2> b │ ┡━₁ true │ ├─? false │ └─₀ false └─₀ false ``` We already have some savings, since the TDD representation "allows unions to be kept lazy, postponing expansion until needed for intersection or difference operations". We "park" the rhs as-is into the uncertain branch, so we only need one (existing) copy of it. In the BDD case, we had to fold the rhs into the `a = true` case, creating an entire (modified) copy of its subgraph. That means we only need 2 nodes for the TDD instead of 3 for the BDD. (With only two variables, this might not seem like a lot, but we've actually gone from O(n²) nodes to O(n).) We get even more savings when with more complex formulas, like `(a ∨ b) ∧ (c ∨ d)`. With BDDs, we get: ``` <n1> a <n7> a ┡━₁ <n2> b ┡━₁ <n8> b │ ┡━₁ true │ ┡━₁ <n4> c │ └─₀ true │ │ ┡━₁ <n5> d └─₀ <n3> b │ │ │ ┡━₁ true ┡━₁ true │ │ │ └─₀ true └─₀ false │ │ └─₀ <n6> d │ │ ┡━₁ true <n4> c │ │ └─₀ false ┡━₁ <n5> d │ └─₀ <n4> SHARED │ ┡━₁ true └─₀ <n9> b │ └─₀ true ┡━₁ <n4> SHARED └─₀ <n6> d └─₀ false ┡━₁ true └─₀ false ``` With TDDs, we get: ``` <t1> a <t5> a ┡━₁ true ┡━₁ <t3> c ├─? <t2> b │ ┡━₁ true │ ┡━₁ true │ ├─? <t4> d │ ├─? false │ │ ┡━₁ true │ └─₀ false │ │ ├─? false └─₀ false │ │ └─₀ false │ └─₀ false <t3> c ├─? <t6> b ┡━₁ true │ ┡━₁ <t3> SHARED ├─? <t4> d │ ├─? false │ ┡━₁ true │ └─₀ false │ ├─? false └─₀ false │ └─₀ false └─₀ false ``` That's 7 nodes for the BDDs, and 4 for TDDs — still linear in the total number of variables, even though our BDDs are only quasi-reduced. And also note that we never had to modify the TDD that represented the rhs of the AND (`t3 = c ∨ d`).
Author
Parents
Loading