Adding New Task SLR-Bench as a Community Task: Scalable Logical Reasoning Benchmark #983
add slr_bench evals function
512449bb
implement feedback on PR
e1add28d
remove logging and raise exception when judge not loaded
85ed4897
NathanHB approved these changes on 2025-09-25
NathanHB merged c7a063ae into main 199 days ago
Assignees
No one assigned