Add HSTU ragged attention operator (#2453)
Summary:
Adds the HSTU ragged attention operator benchmark, covering both the standard and the persistent Triton kernel variants (`hstu_triton_ragged_attention` and `hstu_triton_ragged_attention_persistent`).
On H100:
```
$ python run_benchmark.py triton --op ragged_attention
x_val hstu_triton_ragged_attention-latency hstu_triton_ragged_attention_persistent-latency
----------------- -------------------------------------- -------------------------------------------------
(8, 4, 512, 2048) 0.0141706 0.0128713
(8, 4, 512, 2048) 0.0187315 0.0171204
(8, 4, 512, 2048) 0.0156807 0.0155399
(8, 4, 512, 2048) 0.0165724 0.0154679
(8, 4, 512, 2048) 0.0163886 0.0157738
(8, 4, 512, 2048) 0.0173378 0.0155991
(8, 4, 512, 2048) 0.0164874 0.0153128
(8, 4, 512, 2048) 0.0203275 0.0172193
(8, 4, 512, 2048) 0.0214526 0.0185414
(8, 4, 512, 2048) 0.0172307 0.0169625
```
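For context, ragged (jagged) attention runs over a batch of variable-length sequences packed into a single token-major tensor plus a prefix-sum offsets tensor, instead of padded `[B, T, ...]` tensors. The sketch below is only an illustration of that input layout, not the operator's actual API: the names `ragged_attention_ref` and `seq_offsets` are assumptions, and plain softmax attention stands in for the HSTU formulation.

```
# Minimal sketch of the ragged (jagged) input layout, assuming packed q/k/v of
# shape [total_tokens, num_heads, head_dim] and a [batch + 1] offsets tensor.
# Illustrative only; not the benchmark's kernel or its API.
import torch


def ragged_attention_ref(q, k, v, seq_offsets):
    out = torch.zeros_like(v)
    scale = q.shape[-1] ** -0.5
    for b in range(seq_offsets.numel() - 1):
        s, e = seq_offsets[b].item(), seq_offsets[b + 1].item()
        qb, kb, vb = q[s:e], k[s:e], v[s:e]                  # [len_b, H, D]
        scores = torch.einsum("qhd,khd->hqk", qb, kb) * scale  # [H, len_b, len_b]
        probs = torch.softmax(scores, dim=-1)
        out[s:e] = torch.einsum("hqk,khd->qhd", probs, vb)   # back to [len_b, H, D]
    return out


# Example: 3 sequences of lengths 5, 2, 7; 4 heads; head_dim 64.
lengths = torch.tensor([5, 2, 7])
seq_offsets = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])
total = int(lengths.sum())
q, k, v = (torch.randn(total, 4, 64) for _ in range(3))
out = ragged_attention_ref(q, k, v, seq_offsets)
print(out.shape)  # torch.Size([14, 4, 64])
```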
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2453
Reviewed By: manman-ren
Differential Revision: D62513596
Pulled By: xuzhao9
fbshipit-source-id: 154ef0145ca94ecfeb0b075c9dec01b395683ef2