auto-round
fb169168 - test(ark): prefill perf at 2K/4K seq len with warmup + averaging

Commit

9 days ago

test(ark): prefill perf at 2K/4K seq len with warmup + averaging

Update test/test_ark/test_moe_model_perf.py:

- Drop the single 128-token prefill prompt; benchmark prefill at seq_len 2048 and 4096 instead, and surface seq_len in the printed table.
- Add an explicit warmup phase (_TIMING_WARMUP=3) before the timed loop so the XPU runtime/JIT/caches are primed.
- Run more timed iterations (_TIMING_REPEATS=5) and report the arithmetic mean (with the slowest sample trimmed) instead of a single median, for steadier numbers across runs.
- Update _bench_one/_format_row/_print_header and the docstring accordingly; FP/ARK/GPTQModel rows now emit one row per seq_len.

References

#1813 - Add moe prefill/ decode with int2/int4/int8 sym /asym and fp8 e4m3 e5m2

Author

Copilot

Parents

55026244
