llvm-project
83f8eee5 - [ELF] Parallelize input file loading (#191690)

15 days ago
[ELF] Parallelize input file loading (#191690)

During `createFiles`, `addFile()` records a `LoadJob` for each non-script input (archive, relocatable, DSO, bitcode, binary) with a state-machine snapshot (`inWholeArchive`, `inLib`, `asNeeded`, `withLOption`, `groupId`) and expands them on worker threads in `loadFiles()`. Linker scripts are still processed inline since their `INPUT()` and `GROUP()` commands recursively call `addFile()`.

Outside `createFiles()`, `loadFiles()` is called with a single job and drained immediately (`deferLoad` is false). Two cases:

- `addDependentLibrary()`: `.deplibs` sections trigger `addFile()` during the serial `doParseFiles()` loop.
- `--just-symbols`: pushes files directly, bypassing `addFile`/`LoadJob`.

Thread-safety:

- A mutex serializes `BitcodeFile` / fatLTO constructors that call `ctx.saver` / `ctx.uniqueSaver`. Zero contention on pure ELF links.
- Thin-archive member buffers accumulate in per-job `SmallVector`s and are merged into `ctx.memoryBuffers` in command-line order.
- `groupId` is pre-claimed during the serial walk and written to each produced file after construction (the `InputFile` constructor no longer reads `nextGroupId`).

Performance (--threads=8):

```
clang-relassert (267 thin archives, 10 .o, 2 .so):
  965 +/- 32 ms -> 924 +/- 24 ms (1.05x, 80 runs)
  (Apple M4) 249.7ms ± 2.5ms -> 221.2ms ± 1.4ms (1.13x, 10 runs)
chromium (532 .a, 3314 .o, 343 .so):
  8.071 +/- 0.472 s -> 7.370 +/- 0.198 s (1.10x, 20 runs)
```

Parallelizing all file kinds (not just archives) matters for .o-dominated workloads like chromium where archive-only parallelization shows no improvement.

Output is byte-identical to the old lld and deterministic across `--threads` values.
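The record-then-expand pattern described above can be illustrated with a minimal, standalone sketch. This is not the code from the patch: `LoadedFile`, `expand`, and `loadFilesSketch` are hypothetical names, the `LoadJob` here carries only a subset of the snapshot fields, and plain `std::thread` stands in for whatever parallel primitives lld actually uses. The point it shows is that each job writes into its own slot and the slots are merged in job order, so the result does not depend on the thread count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical snapshot of the command-line state in effect when the
// input was seen during the serial walk (subset of the real fields).
struct LoadJob {
  std::string path;
  bool inWholeArchive;
  bool asNeeded;
  unsigned groupId; // pre-claimed serially, never read from shared state later
};

// Hypothetical stand-in for a constructed input file.
struct LoadedFile {
  std::string name;
  unsigned groupId;
};

// Stand-in for expanding one input. Assumption for illustration only:
// a --whole-archive input yields two members, anything else yields one.
static std::vector<LoadedFile> expand(const LoadJob &job) {
  std::vector<LoadedFile> out;
  std::size_t members = job.inWholeArchive ? 2 : 1;
  for (std::size_t i = 0; i < members; ++i)
    out.push_back({job.path + "#" + std::to_string(i), job.groupId});
  return out;
}

// Expand all jobs on worker threads, then merge in command-line order.
std::vector<LoadedFile> loadFilesSketch(const std::vector<LoadJob> &jobs,
                                        unsigned numThreads) {
  assert(numThreads > 0 && "need at least one worker");
  // One result slot per job: workers touch disjoint elements, so no lock
  // is needed here.
  std::vector<std::vector<LoadedFile>> perJob(jobs.size());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t)
    workers.emplace_back([&, t] {
      // Static striping: worker t handles jobs t, t + numThreads, ...
      for (std::size_t i = t; i < jobs.size(); i += numThreads)
        perJob[i] = expand(jobs[i]);
    });
  for (std::thread &w : workers)
    w.join();

  // Serial merge in job order keeps the output deterministic.
  std::vector<LoadedFile> files;
  for (const std::vector<LoadedFile> &slot : perJob)
    files.insert(files.end(), slot.begin(), slot.end());
  return files;
}
```

Calling `loadFilesSketch(jobs, 1)` and `loadFilesSketch(jobs, 8)` on the same job list yields the same sequence of files, which mirrors the determinism claim in the message; the per-job slots play the role of the per-job `SmallVector`s that are merged into `ctx.memoryBuffers`.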