auto-round
fb169168 - test(ark): prefill perf at 2K/4K seq len with warmup + averaging

Commit
9 days ago
test(ark): prefill perf at 2K/4K seq len with warmup + averaging

Update test/test_ark/test_moe_model_perf.py:
- Drop the single 128-token prefill prompt; benchmark prefill at seq_len 2048 and 4096 instead, and surface seq_len in the printed table.
- Add an explicit warmup phase (_TIMING_WARMUP=3) before the timed loop so the XPU runtime/JIT/caches are primed.
- Run more timed iterations (_TIMING_REPEATS=5) and report the arithmetic mean (with the slowest sample trimmed) instead of a single median, for steadier numbers across runs.
- Update _bench_one/_format_row/_print_header and the docstring accordingly; FP/ARK/GPTQModel rows now emit one row per seq_len.
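
The timing scheme described in this message (untimed warmup, a fixed number of timed repeats, then the mean with the slowest sample dropped) can be sketched roughly as below. This is only an illustration and not the code in test_moe_model_perf.py: the names `trimmed_mean` and `bench` are hypothetical, the constants are assumed to mirror `_TIMING_WARMUP` and `_TIMING_REPEATS`, and the device synchronization a real XPU benchmark would need around the clock reads is left out.

```python
import time
from statistics import fmean

# Assumed to mirror _TIMING_WARMUP / _TIMING_REPEATS from the commit message.
TIMING_WARMUP = 3
TIMING_REPEATS = 5


def trimmed_mean(samples):
    """Arithmetic mean after dropping the single slowest (largest) sample."""
    if len(samples) < 2:
        # Nothing to trim with fewer than two samples.
        return fmean(samples)
    return fmean(sorted(samples)[:-1])


def bench(fn, warmup=TIMING_WARMUP, repeats=TIMING_REPEATS):
    """Run fn untimed `warmup` times, then time it `repeats` times.

    Returns the trimmed mean of the timed samples in seconds.
    Hypothetical helper: a real accelerator benchmark would also
    synchronize the device before each clock read.
    """
    for _ in range(warmup):
        fn()  # primes runtime, JIT, and caches; result is discarded

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return trimmed_mean(samples)


if __name__ == "__main__":
    # One row per sequence length, as in the updated table layout.
    for seq_len in (2048, 4096):
        elapsed = bench(lambda: sum(range(seq_len)))
        print(f"seq_len={seq_len} mean_s={elapsed:.6f}")
```

Dropping only the slowest sample is a one-sided trim: it discards a single outlier such as a late compilation or cache miss while keeping the remaining four samples in the average.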