onnxruntime
ecf164a9 - Reduce GQA cpu test combinations (#26897)

Commit

185 days ago

Reduce GQA cpu test combinations (#26897)

The current testing strategy for GQA on CPU runs a Cartesian product of all configuration parameters (batch size, sequence length, rotary embeddings, packed KV, softcap, etc.), which produces over 2000 test combinations. This causes significant runtime overhead and potential timeouts.

This PR optimizes `test_gqa_cpu.py` by:

- Replacing the nested loop over all parameters with a round-robin selection strategy (`combo_index`).
- Reducing the number of test cases from ~2304 to ~32 in pipeline mode while maintaining coverage of individual features (rotary, packed, softcap, etc.).

The test suite keeps its per-feature coverage but runs much faster: test time drops from minutes to seconds, which saves a lot of compute resources in the CI pipeline.
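The round-robin idea can be illustrated with a minimal sketch. This is not the code from `test_gqa_cpu.py`: the parameter grids, the `FEATURES` dictionary, and both generator functions are hypothetical, and only the name `combo_index` comes from the commit message. It assumes each base shape is paired with a single feature combination chosen by a running counter, instead of with every combination.

```python
from itertools import product

# Hypothetical grids, chosen only for illustration.
BATCH_SIZES = [1, 3]
SEQ_LENS = [16, 64, 128, 512]
FEATURES = {
    "rotary": [False, True],
    "packed": [False, True],
    "softcap": [0.0, 50.0],
    "local": [False, True],
}


def full_product_cases():
    """Old style: every base shape crossed with every feature combination."""
    names = list(FEATURES)
    for batch, seq_len in product(BATCH_SIZES, SEQ_LENS):
        for values in product(*FEATURES.values()):
            yield {"batch": batch, "seq_len": seq_len, **dict(zip(names, values))}


def round_robin_cases():
    """New style: one feature combination per base shape, picked by combo_index."""
    combo_index = 0
    for batch, seq_len in product(BATCH_SIZES, SEQ_LENS):
        case = {"batch": batch, "seq_len": seq_len}
        for name, values in FEATURES.items():
            # Cycle through each feature's options as combo_index advances.
            case[name] = values[combo_index % len(values)]
        combo_index += 1
        yield case


if __name__ == "__main__":
    print(len(list(full_product_cases())), "cases with the full product")
    print(len(list(round_robin_cases())), "cases with round-robin selection")
```

In this sketch every option of every feature still appears in at least one case, as long as there are at least as many base shapes as the longest option list. The trade-off is that interactions between features are no longer tested exhaustively.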

References

#26897 - Reduce GQA cpu test combinations

Author

tianleiwu

Parents

0d59f8d9
