perf: use deque for FIFO queues in sequence parallel, superoffload, and compile (#7880)
## Problem
Three files consume FIFO queues with `list.pop(0)`, which is **O(n)**
per removal because every remaining element is shifted left:
1. `ulysses_sp.py`: Micro-batch queue for sequence parallel data
sharding
2. `superoffload_stage3.py`: Parameter buffer for IPG gradient bucketing
3. `compile/backend.py`: Compile pass schedule queue
## Solution
Switch to `collections.deque` with `.popleft()` for **O(1)** front
removal.
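A minimal sketch of the pattern change, for illustration only. The
function names `drain_list` and `drain_deque` are hypothetical and do not
appear in the touched files; only `collections.deque` and `.popleft()`
come from this change.

```python
from collections import deque


def drain_list(items):
    """Old pattern: each pop(0) shifts the remaining elements, O(n) per call."""
    queue = list(items)
    consumed = []
    while queue:
        consumed.append(queue.pop(0))
    return consumed


def drain_deque(items):
    """New pattern: popleft() removes the front element in O(1)."""
    queue = deque(items)
    consumed = []
    while queue:
        consumed.append(queue.popleft())
    return consumed
```

Both loops yield elements in the same FIFO order, so the swap should be
behavior-preserving wherever the queue is only appended to and consumed
from the front. Note that `deque` does not support slicing, so any call
site that slices the queue would need separate handling.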
## Changes
| File | Pattern |
|------|---------|
| `deepspeed/runtime/sequence_parallel/ulysses_sp.py` | `micro_batches` FIFO queue |
| `deepspeed/runtime/superoffload/superoffload_stage3.py` | `params_in_ipg_bucket_buffer` drain loop |
| `deepspeed/compile/backend.py` | `remaining_schedule` step-by-step consumption |
Signed-off-by: g97iulio1609 <giulio97.leone@gmail.com>