926fb10a - Add HSTU ragged attention operator (#2453)

Summary: As the title says — adds the `ragged_attention` TritonBench operator, which benchmarks the HSTU Triton ragged attention kernel and its persistent variant.

On H100:

```
$ python run_benchmark.py triton --op ragged_attention
x_val                hstu_triton_ragged_attention-latency    hstu_triton_ragged_attention_persistent-latency
-----------------    ------------------------------------    ------------------------------------------------
(8, 4, 512, 2048)                               0.0141706                                           0.0128713
(8, 4, 512, 2048)                               0.0187315                                           0.0171204
(8, 4, 512, 2048)                               0.0156807                                           0.0155399
(8, 4, 512, 2048)                               0.0165724                                           0.0154679
(8, 4, 512, 2048)                               0.0163886                                           0.0157738
(8, 4, 512, 2048)                               0.0173378                                           0.0155991
(8, 4, 512, 2048)                               0.0164874                                           0.0153128
(8, 4, 512, 2048)                               0.0203275                                           0.0172193
(8, 4, 512, 2048)                               0.0214526                                           0.0185414
(8, 4, 512, 2048)                               0.0172307                                           0.0169625
```

Pull Request resolved: https://github.com/pytorch/benchmark/pull/2453

Reviewed By: manman-ren

Differential Revision: D62513596

Pulled By: xuzhao9

fbshipit-source-id: 154ef0145ca94ecfeb0b075c9dec01b395683ef2
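For readers unfamiliar with the operator, the sketch below illustrates what "ragged" attention means in this context: each batch element has a different sequence length, the sequences are packed back to back along the token dimension, and an offsets tensor marks where each sequence starts and ends, so attention never crosses sequence boundaries. This is a minimal PyTorch reference sketch, not the Triton kernel added in this PR; the function name `ragged_attention_reference`, the argument shapes, and the exact `seq_offsets` convention are assumptions for illustration only.

```
# Illustrative sketch only -- not the Triton kernel from this PR.
import torch
import torch.nn.functional as F

def ragged_attention_reference(q, k, v, seq_offsets):
    """Dense per-sequence reference for ragged (jagged) attention.

    q, k, v:      (total_tokens, num_heads, head_dim); sequences packed back to back
    seq_offsets:  (batch + 1,) int tensor; sequence i spans
                  [seq_offsets[i], seq_offsets[i + 1])
    Returns a tensor with the same shape as q.
    """
    out = torch.empty_like(q)
    scale = q.shape[-1] ** -0.5
    for i in range(seq_offsets.numel() - 1):
        s, e = seq_offsets[i].item(), seq_offsets[i + 1].item()
        # (heads, len, dim) slices for this sequence only: no cross-sequence attention
        qi, ki, vi = (t[s:e].transpose(0, 1) for t in (q, k, v))
        attn = F.softmax(qi @ ki.transpose(-2, -1) * scale, dim=-1)
        out[s:e] = (attn @ vi).transpose(0, 1)
    return out

# Example: 3 sequences of lengths 5, 2, and 7 packed into 14 tokens, 4 heads, dim 64.
seq_offsets = torch.tensor([0, 5, 7, 14])
q = k = v = torch.randn(14, 4, 64)
print(ragged_attention_reference(q, k, v, seq_offsets).shape)  # torch.Size([14, 4, 64])
```

The benchmarked Triton kernels fuse this per-sequence work into a single launch over the packed layout, which is why latency is reported per input configuration rather than per sequence.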