onnxruntime
ecf164a9 - Reduce GQA cpu test combinations (#26897)

Commit
53 days ago
The current testing strategy for GQA on CPU runs a Cartesian product of all configuration parameters (batch size, sequence length, rotary embeddings, packed KV, softcap, etc.), which yields over 2000 test combinations. This causes significant runtime overhead and potential timeouts.

This PR optimizes `test_gqa_cpu.py` by:
- Replacing the nested loop over all parameters with a round-robin selection strategy (`combo_index`).
- Significantly reducing the number of test cases (from ~2304 to ~32 in pipeline mode) while maintaining coverage of individual features (rotary, packed, softcap, etc.).

The test suite remains robust but runs much faster: test time drops from minutes to seconds, which saves a lot of compute resources in the CI pipeline.
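The difference between the two strategies can be illustrated with a minimal sketch. This is not the code from `test_gqa_cpu.py`: the parameter grid `PARAMS`, its values, and the helpers `full_product` and `round_robin` are hypothetical names chosen for illustration. Only the idea of indexing each option list with a shared `combo_index` is taken from the commit message, and the exact indexing scheme used in the PR may differ.

```python
import itertools

# Hypothetical parameter grid; the names and values are illustrative only
# and are not the ones used in test_gqa_cpu.py.
PARAMS = {
    "batch_size": [1, 3],
    "seq_len": [1, 128, 2048],
    "rotary": [False, True],
    "packed": [False, True],
    "softcap": [0.0, 50.0],
}


def full_product(params):
    """Every combination of every option (the nested-loop strategy)."""
    names = list(params)
    return [dict(zip(names, values)) for values in itertools.product(*params.values())]


def round_robin(params, num_cases=None):
    """Pick one option per parameter by cycling a shared combo_index.

    With num_cases >= the longest option list, every individual option
    appears in at least one case, but most cross-combinations are skipped.
    """
    if num_cases is None:
        num_cases = max(len(values) for values in params.values())
    cases = []
    for combo_index in range(num_cases):
        cases.append(
            {name: values[combo_index % len(values)] for name, values in params.items()}
        )
    return cases


if __name__ == "__main__":
    print(len(full_product(PARAMS)))  # 48 for this toy grid
    print(len(round_robin(PARAMS)))   # 3 for this toy grid
```

One trade-off of this simple scheme is that option lists of equal length move in lockstep (here `rotary` and `packed` are always both off or both on), so it covers each feature individually rather than each pairwise interaction.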