[jax:benchmark] Add tracing benchmarks for some common operations.
- For now, this covers `jnp.dot`, `jnp.concat`, and `*` (elementwise multiply).
- We also include a simple test to sanity-check that tracing time scales linearly with the number of equations (this is by no means a guarantee).
- Add `no_cache_clear` variants of the benchmarks.
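
As a rough, after-the-fact way to read the linear-scaling claim, the sketch below fits a least-squares line to the cpu/op figures reported for `test_num_multiply_eqns_trace` in this change. It is not the test added here; `fit_line` is a hypothetical helper, and each point is a single sample, so the result is only indicative.

```python
# cpu/op figures (milliseconds) copied from the test_num_multiply_eqns_trace rows.
NUM_EQNS = [1, 128, 256, 384, 512, 640, 768, 896]
TRACE_MS = [16.02, 28.06, 39.44, 51.62, 62.91, 75.83, 90.09, 96.68]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared).

    Hypothetical helper for illustration only, not part of the benchmark code.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(NUM_EQNS, TRACE_MS)
print(f"slope: {slope:.4f} ms per equation")
print(f"intercept: {intercept:.2f} ms")
print(f"R^2: {r_squared:.4f}")
```

On these numbers the fit comes out at roughly 0.09 ms per additional equation with an intercept near 16 ms and R² above 0.99, which is consistent with linear scaling over this range but says nothing about larger programs.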
```
name cpu/op
test_pallas_mqa_splash_attention_trace 30.73m ± ∞ ¹
test_pallas_mqa_splash_attention_trace_no_cache_clear 32.27µ ± ∞ ¹
test_pallas_mqa_splash_attention_lower 36.84m ± ∞ ¹
test_pallas_mqa_splash_attention_lower_no_cache_clear 35.62µ ± ∞ ¹
test_jnp_dot_trace 10.04m ± ∞ ¹
test_jnp_dot_trace_no_cache_clear 140.6µ ± ∞ ¹
test_jnp_concat_trace 16.41m ± ∞ ¹
test_jnp_concat_trace_no_cache_clear 205.5µ ± ∞ ¹
test_num_multiply_eqns_trace/1 16.02m ± ∞ ¹
test_num_multiply_eqns_trace/128 28.06m ± ∞ ¹
test_num_multiply_eqns_trace/256 39.44m ± ∞ ¹
test_num_multiply_eqns_trace/384 51.62m ± ∞ ¹
test_num_multiply_eqns_trace/512 62.91m ± ∞ ¹
test_num_multiply_eqns_trace/640 75.83m ± ∞ ¹
test_num_multiply_eqns_trace/768 90.09m ± ∞ ¹
test_num_multiply_eqns_trace/896 96.68m ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/1 121.2µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/128 123.6µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/256 127.0µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/384 130.9µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/512 134.6µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/640 137.5µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/768 143.4µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/896 143.9µ ± ∞ ¹
geomean 2.024m
¹ need >= 6 samples for confidence interval at level 0.95
name time/op
test_pallas_mqa_splash_attention_trace 30.78m ± ∞ ¹
test_pallas_mqa_splash_attention_trace_no_cache_clear 32.30µ ± ∞ ¹
test_pallas_mqa_splash_attention_lower 37.00m ± ∞ ¹
test_pallas_mqa_splash_attention_lower_no_cache_clear 35.66µ ± ∞ ¹
test_jnp_dot_trace 19.12m ± ∞ ¹
test_jnp_dot_trace_no_cache_clear 140.7µ ± ∞ ¹
test_jnp_concat_trace 21.84m ± ∞ ¹
test_jnp_concat_trace_no_cache_clear 205.6µ ± ∞ ¹
test_num_multiply_eqns_trace/1 24.69m ± ∞ ¹
test_num_multiply_eqns_trace/128 36.81m ± ∞ ¹
test_num_multiply_eqns_trace/256 48.04m ± ∞ ¹
test_num_multiply_eqns_trace/384 60.04m ± ∞ ¹
test_num_multiply_eqns_trace/512 72.00m ± ∞ ¹
test_num_multiply_eqns_trace/640 84.88m ± ∞ ¹
test_num_multiply_eqns_trace/768 98.77m ± ∞ ¹
test_num_multiply_eqns_trace/896 105.4m ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/1 121.2µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/128 123.7µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/256 127.1µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/384 131.0µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/512 134.8µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/640 137.5µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/768 143.5µ ± ∞ ¹
test_num_multiply_eqns_trace_no_cache_clear/896 144.1µ ± ∞ ¹
geomean 2.239m
¹ need >= 6 samples for confidence interval at level 0.95
```
PiperOrigin-RevId: 783855325