Fix sigmoid transformation in TreeEnsembleClassifier for all-positive weights with LOGISTIC post_transform (#27536)
### Description
`TreeEnsembleClassifier` with `post_transform=LOGISTIC` was not applying
the sigmoid transformation when all tree leaf weights are non-negative.
This manifests for binary classifiers where every tree is a single leaf
node (no splits), a valid degenerate case produced by XGBoost when
training data is too small to learn splits.
The following fixes were made:
- **`tree_ensemble_aggregator.h` — `_set_score_binary()`**: The
`weights_are_all_positive_` field and its associated fast path (cases
0/1, threshold at 0.5) have been removed entirely from the classifier.
The classifier now always uses the logit-threshold path (cases 2/3,
threshold at 0), which correctly applies sigmoid for `LOGISTIC`
post-transform regardless of the sign of the leaf weights.
- **`ml_common.h` — `write_scores()`**: For cases 0/1, apply
`ComputeLogistic` (sigmoid) when `post_transform == LOGISTIC` instead of
the raw `[1 - score, score]` output. This is a defense-in-depth fix for
other callers such as SVMClassifier.
- **`ml_common.h` — `batched_update_scores_inplace()`**: Same fix for
cases 0/1 in the batched code path used by SVMClassifier.
- **Regression test**: Added
`TreeEnsembleClassifierBinaryLogisticAllPositiveWeights` in
`tree_ensembler_classifier_test.cc`, covering a single-leaf tree
(all-positive weights) with `post_transform=LOGISTIC` for both positive
and negative aggregate score cases.
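Under the fixed logit-threshold path, the expected second output for a binary classifier with `post_transform=LOGISTIC` is `[sigmoid(-agg), sigmoid(agg)]`, where `agg` is the aggregate tree score. A minimal Python sketch of that expectation (the helper names below are illustrative, not onnxruntime API; `ComputeLogistic` is mirrored from the PR text):

```python
import math

def compute_logistic(x: float) -> float:
    # Sigmoid, mirroring onnxruntime's ComputeLogistic helper.
    return 1.0 / (1.0 + math.exp(-x))

def binary_logistic_scores(agg: float) -> list:
    # Expected LOGISTIC post-transform output for a binary classifier:
    # class-0 and class-1 probabilities derived from the aggregate score.
    return [compute_logistic(-agg), compute_logistic(agg)]

# A leaf-only model whose only contribution is base_values=[-0.405]
# (logit(0.4), as in the reproducer) should yield roughly [0.6, 0.4],
# regardless of whether the leaf weights were all non-negative.
print([round(v, 3) for v in binary_logistic_scores(-0.405)])  # [0.6, 0.4]
```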
### Motivation and Context
When converting XGBoost binary:logistic models to ONNX, trees with no
splits (leaf-only) produce only non-negative leaf weights, setting
`weights_are_all_positive_ = true`. In this state, `_set_score_binary`
assigned `write_additional_scores` to 0 or 1, causing `write_scores` to
output `[1 - score, score]` without sigmoid — incorrect for a LOGISTIC
post-transform. Trees with real splits (mixed positive/negative weights)
set `weights_are_all_positive_ = false`, correctly routing through the
sigmoid path. This caused major score mismatches when upgrading from
XGBoost 1.7.2 to XGBoost 3 with small training datasets.
The root cause has been fully addressed by removing
`weights_are_all_positive_` from the tree ensemble classifier code path
entirely. The `ml_common.h` changes remain as defense-in-depth fixes for
other callers (e.g. SVMClassifier) that may still set `add_second_class`
to 0 or 1.
<details>
<summary>Original prompt</summary>
----
*This section details the original issue to resolve*
<issue_title>TreeEnsembleClassifier with post_transform=LOGISTIC skips
sigmoid for leaf-only trees when all weights are
non-negative</issue_title>
<issue_description>## Describe the issue
We have code which converts xgboost models to ONNX. We were seeing major
score mismatches when attempting to upgrade from xgboost 1.7.2 to
xgboost 3. With the help of AI, I cloned down the source code for the
various open-source repos involved. AI believes the issue lives in
onnxruntime itself and was the result of our training dataset being too
small to produce splits. To work around this, we just increased the size
of our test dataset. But I figured it's worth opening a bug report about
this edge case. The AI generated bug report is as follows:
`TreeEnsembleClassifier` with `post_transform=LOGISTIC` does not apply
the sigmoid transformation when **all tree weights are non-negative**
(`weights_are_all_positive_ = true`). This happens for binary
classifiers where every tree is a single leaf node (no splits), which is
a valid degenerate case produced by XGBoost when training data is too
small to learn splits.
**Expected behavior**: The second output (class scores) should contain
post-transformed probabilities: `[1 - sigmoid(agg), sigmoid(agg)]`
**Actual behavior**: The second output contains raw scores without
sigmoid: `[1 - agg, agg]`
The bug is in the interaction between `_set_score_binary` in
`tree_ensemble_aggregator.h` and `write_scores` in `ml_common.h`. When
`weights_are_all_positive_ == true`, `_set_score_binary` sets
`add_second_class` to 0 or 1. The `write_scores` function only applies
LOGISTIC (sigmoid) for `add_second_class` values 2 and 3 — values 0 and
1 output `[1-score, score]` without sigmoid.
Trees with real splits (which produce both positive and negative
weights) correctly set `weights_are_all_positive_ = false`, causing
`add_second_class = 2 or 3`, and the sigmoid IS applied. So the bug only
manifests for leaf-only trees.
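The discrepancy between the two paths can be illustrated numerically. A sketch in Python (the aggregate score `agg` below is a hypothetical value, not taken from a real model):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

agg = 0.8  # hypothetical aggregate leaf score from an all-positive, leaf-only tree

# Buggy path (add_second_class = 0/1): raw score treated as a probability.
buggy = [1.0 - agg, agg]

# Correct path (add_second_class = 2/3 with LOGISTIC): sigmoid applied.
fixed = [sigmoid(-agg), sigmoid(agg)]

print("buggy:", [round(v, 3) for v in buggy])  # buggy: [0.2, 0.8]
print("fixed:", [round(v, 3) for v in fixed])  # fixed: [0.31, 0.69]
```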
### Source code references
1. **`tree_ensemble_aggregator.h` — `_set_score_binary()`**: When
`weights_are_all_positive_ == true`, overwrites
`write_additional_scores` to 0 or 1:
```cpp
if (weights_are_all_positive_) {
  if (pos_weight > 0.5) {
    write_additional_scores = 0;  // <-- bypasses sigmoid
    return class_labels_[1];
  } else {
    write_additional_scores = 1;  // <-- bypasses sigmoid
    return class_labels_[0];
  }
}
```
2. **`ml_common.h` — `write_scores()`**: Only applies LOGISTIC for
`add_second_class` 2 and 3:
```cpp
switch (add_second_class) {
  case 0:
  case 1:
    // Raw score output: NO sigmoid applied
    scores.push_back(scores[0]);
    scores[0] = 1 - scores[0];
    break;
  case 2:
  case 3:
    if (post_transform == POST_EVAL_TRANSFORM::LOGISTIC) {
      // Sigmoid IS applied here
      scores[1] = ComputeLogistic(scores[0]);
      scores[0] = ComputeLogistic(-scores[0]);
    }
    break;
}
```
3. **`tree_ensemble_common.h`**: `weights_are_all_positive_` is set to
`true` when all `class_weights` values are non-negative. Leaf-only
XGBoost binary:logistic trees always have non-negative weights.
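The check described above can be sketched in Python (the C++ logic lives in `tree_ensemble_common.h`; this is only an illustration of the condition, not the actual implementation):

```python
def weights_are_all_positive(class_weights) -> bool:
    # True when every leaf weight is non-negative, which is what the
    # weights_are_all_positive_ field records despite its name.
    return all(w >= 0.0 for w in class_weights)

# Leaf-only XGBoost binary:logistic trees: one weight per tree, all >= 0,
# so the flag is set and the buggy fast path was taken.
print(weights_are_all_positive([0.12, 0.05, 0.0]))  # True
print(weights_are_all_positive([0.12, -0.3]))       # False
```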
## To reproduce
Minimal self-contained reproducer:
```python
import numpy as np
import onnxruntime
from onnx import TensorProto, helper
print(f"onnxruntime version: {onnxruntime.__version__}")
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [None, 3])
label_out = helper.make_tensor_value_info("label", TensorProto.INT64, [None])
prob_out = helper.make_tensor_value_info("probs", TensorProto.FLOAT, [None, 2])
def make_model(nodes_modes, nodes_values, nodes_truenodeids, nodes_falsenodeids,
               class_treeids, class_nodeids, class_weights, **node_kwargs):
    """Build a minimal TreeEnsembleClassifier ONNX model."""
    n_nodes = len(nodes_modes)
    node = helper.make_node(
        "TreeEnsembleClassifier",
        inputs=["X"],
        outputs=["label", "probs"],
        domain="ai.onnx.ml",
        nodes_treeids=[0] * n_nodes,
        nodes_nodeids=list(range(n_nodes)),
        nodes_featureids=[0] * n_nodes,
        nodes_values=nodes_values,
        nodes_modes=nodes_modes,
        nodes_truenodeids=nodes_truenodeids,
        nodes_falsenodeids=nodes_falsenodeids,
        nodes_missing_value_tracks_true=[0] * n_nodes,
        nodes_hitrates=[1.0] * n_nodes,
        class_treeids=class_treeids,
        class_nodeids=class_nodeids,
        class_ids=[0] * len(class_weights),
        class_weights=class_weights,
        classlabels_int64s=[0, 1],
        base_values=[-0.405],  # logit(0.4)
        post_transform="LOGISTIC",
        **node_kwargs,
    )
    graph = helper.make_graph([node], "test", [X], [label_out, prob_out])
    return helper.make_model(graph, opset_imports=[
        helper.make_opsetid("", ...
```
</details>
- Fixes microsoft/onnxruntime#27533
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Co-authored-by: Xavier Dupré <xadupre@microsoft.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>