fix: use subprocess instead of os.system in data_analyzer.py (#7994)
## Summary
Fix a critical-severity shell command injection issue in
`deepspeed/runtime/data_pipeline/data_sampling/data_analyzer.py`.
## Vulnerability
| Field | Value |
|-------|-------|
| **ID** | V-001 |
| **Severity** | CRITICAL |
| **Scanner** | multi_agent_ai |
| **Rule** | `V-001` |
| **File** | `deepspeed/runtime/data_pipeline/data_sampling/data_analyzer.py:75` |
**Description**: `data_analyzer.py` calls `os.system()` with an
f-string that interpolates the variable `metric_to_sample_fname`
directly into a shell command without any sanitization. This variable is
derived from user-supplied dataset configuration or file paths. Because
`os.system()` passes the whole string to a shell interpreter, any shell
metacharacters in the variable (semicolons, backticks, dollar signs,
pipes, ampersands) are interpreted by the shell, so a crafted value can
run additional, attacker-chosen commands.
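
The general pattern behind the fix can be sketched as follows. This is an illustration only, not the code changed in this PR: the helper `echo_argument_safely` and the hostile file name are hypothetical, and the child command is a stand-in chosen so the example runs on any platform. It shows that when `subprocess.run` receives an argument list and no shell is involved, metacharacters in a value reach the child process as literal text.

```python
import subprocess
import sys


def echo_argument_safely(value: str) -> str:
    """Pass `value` to a child process as a single argv entry, with no shell.

    Hypothetical helper for illustration; it is not part of DeepSpeed.
    """
    result = subprocess.run(
        # Argument list form: each element becomes exactly one argv entry,
        # so nothing in `value` is parsed as shell syntax.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.rstrip("\r\n")


# Hypothetical hostile file name containing shell metacharacters.
hostile_name = "metrics.bin; echo INJECTED"

# The child receives the string unchanged; no second command is run.
print(echo_argument_safely(hostile_name))
```

Had the same string been interpolated into `os.system(f"... {hostile_name}")`, the shell would have treated the text after the semicolon as a second command.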
## Changes
- `deepspeed/runtime/data_pipeline/data_sampling/data_analyzer.py`: replace the `os.system()` call with `subprocess`
## Verification
- [x] Build passes
- [x] Scanner re-scan confirms fix
- [x] LLM code review passed
---
*Automated security fix by [OrbisAI Security](https://orbisappsec.com)*
Signed-off-by: orbisai0security <mediratta01.pally@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>