[ELF] Separate relative and non-relative dynamic relocations (#187959)
Previously, the flow was:
1. Parallel scan adds relative relocs to per-thread `relocsVec`
2. `mergeRels()` copies all into `relocs`
3. `partitionRels()` uses `stable_partition` to separate relative from
   non-relative relocs
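
As a rough illustration of the old flow, here is a minimal sketch. `Reloc` and `mergeAndPartition` are hypothetical, simplified stand-ins rather than lld's actual types or functions, and placing relative entries first is an assumption of this model:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified relocation record (not lld's DynamicReloc).
struct Reloc {
  uint32_t type;
  uint64_t offset;
};

// Old flow: append every per-thread shard to `relocs`, then partition the
// whole vector so that relative entries come first. Returns how many
// relative entries ended up at the front.
size_t mergeAndPartition(std::vector<std::vector<Reloc>> &relocsVec,
                         std::vector<Reloc> &relocs, uint32_t relativeRel) {
  for (auto &shard : relocsVec) {
    relocs.insert(relocs.end(), shard.begin(), shard.end());
    shard.clear();
  }
  auto mid = std::stable_partition(
      relocs.begin(), relocs.end(),
      [=](const Reloc &r) { return r.type == relativeRel; });
  return static_cast<size_t>(mid - relocs.begin());
}
```

Every entry is copied once into `relocs` and then visited again by the partition pass, which is the cost the new flow avoids.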
Now, relative relocs are routed at `addReloc` time by checking
`reloc.type == relativeRel`. In `mergeRels`, sharded entries are
classified through the same `addReloc` path rather than blindly
appended, because `relocsVec` may contain non-relative entries such as
`R_AARCH64_AUTH_RELATIVE`.
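
The routing can be sketched roughly as follows. This is a simplified model, not the actual lld code: `RelType`, its enumerators, `DynamicReloc`, `RelocSection`, and `checkInvariant` are hypothetical stand-ins, and only the member names `relocs`, `relativeRelocs`, `relocsVec`, `relativeRel`, `addReloc`, and `mergeRels` come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical relocation types; R_AUTH_RELATIVE models a type that is
// related to, but not equal to, the target's plain relative type.
enum RelType : uint32_t { R_RELATIVE = 1, R_GLOB_DAT = 2, R_AUTH_RELATIVE = 3 };

struct DynamicReloc {
  RelType type;
  uint64_t offset;
};

struct RelocSection {
  RelType relativeRel = R_RELATIVE;                 // target's relative type
  std::vector<DynamicReloc> relativeRelocs;         // relative entries only
  std::vector<DynamicReloc> relocs;                 // everything else
  std::vector<std::vector<DynamicReloc>> relocsVec; // per-thread shards

  // Route each entry to its final vector at the moment it is added.
  void addReloc(const DynamicReloc &r) {
    (r.type == relativeRel ? relativeRelocs : relocs).push_back(r);
  }

  // Shards may hold non-relative entries, so classify them through
  // addReloc instead of appending them wholesale to one vector.
  void mergeRels() {
    for (auto &shard : relocsVec) {
      for (const DynamicReloc &r : shard)
        addReloc(r);
      shard.clear();
    }
  }

  // Sanity check for the model: `relocs` never holds a relative entry.
  void checkInvariant() const {
    for (const DynamicReloc &r : relocs)
      assert(r.type != relativeRel);
  }
};
```

With this shape no later partition pass is needed, since each vector holds only its own kind of entry from the start.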
This eliminates the `stable_partition` on the full relocation vector
(543K entries for clang) and avoids copying relative relocations into
`relocs` only to move them out again.
Linking an x86_64 release+assertions build of clang is 1.04x as fast.
`numRelativeRelocs` caches `relativeRelocs.size()` at `finalizeContents`
time for `DT_RELACOUNT`. Using a live `relativeRelocs.size()` would
cause `DynamicSection::writeTo` to emit an extra entry when thunks add
relocs after `.dynamic` is sized, overflowing into adjacent sections.
Tested by ppc64-long-branch-rel14.s.
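
To illustrate why the count is cached, here is a toy model of a section that is sized before it is written. `DynamicModel` and its methods are hypothetical. The model also assumes that the count tag is emitted only when the count is nonzero, which is how a late-added reloc can change the number of entries:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a section whose size is fixed before writing.
struct DynamicModel {
  std::vector<int> relativeRelocs; // stand-in for the relative reloc vector
  size_t numRelativeRelocs = 0;    // snapshot taken at finalize time
  size_t sizedEntries = 0;         // entry count the section was sized for

  // Assumption of this model: one terminator entry, plus a count entry
  // only when the relative reloc count is nonzero.
  static size_t entriesFor(size_t count) { return 1 + (count != 0 ? 1 : 0); }

  void finalizeContents() {
    numRelativeRelocs = relativeRelocs.size();
    sizedEntries = entriesFor(numRelativeRelocs);
  }

  // Writing from the snapshot always matches the size computed earlier.
  size_t writeWithCachedCount() const { return entriesFor(numRelativeRelocs); }

  // Writing from the live vector can disagree if relocs were added late.
  size_t writeWithLiveCount() const { return entriesFor(relativeRelocs.size()); }
};
```

In this model, a reloc added after `finalizeContents()` makes `writeWithLiveCount()` produce one more entry than the section was sized for, while `writeWithCachedCount()` stays within bounds.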