Set --xla_latency_hiding_scheduler_rerun to 1
Summary:
This flag reruns the latency hiding scheduler if scheduling with the default
shared memory limit of 95% leads to an OOM. Each rerun uses a limit 0.9x that
of the previous run, and the number of reruns is now set to 1.
The shared memory limit refers to --xla_tpu_scheduler_percent_shared_memory_limit.
A lower shared memory limit means less overlap between communication and
computation, and thus worse performance.
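For illustration, the limits tried under this policy can be worked out with a
small Python sketch. The helper name and its parameters are hypothetical (this
is not XLA code), and it assumes the 0.9x factor is applied multiplicatively
to the previous run's limit, starting from the 95% default. With one rerun,
the second attempt would use 85.5%.

```python
def shared_memory_limits(default_percent: float = 95.0,
                         reruns: int = 1,
                         factor: float = 0.9) -> list[float]:
    """Hypothetical helper: shared memory limit (percent) used on each
    scheduler run, assuming each rerun scales the previous limit by factor."""
    limits = [default_percent]  # first run uses the default limit
    for _ in range(reruns):
        # each rerun lowers the limit relative to the previous run
        limits.append(limits[-1] * factor)
    return limits


print(shared_memory_limits())  # [95.0, 85.5]
```

Raising `reruns` trades more scheduling attempts for a better chance of
fitting in memory, at the cost of progressively less overlap.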
Test Plan:
Tested on Llama 2 7B on V4-32.