faster `rand(TaskLocalRNG(), 1:n)` by outlining `throw` (#58306)
In #58089, this method took a small performance hit in some contexts. It
turns out that by outlining the unlikely branch which throws on empty
ranges, this hit can be recovered.
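
For illustration, the outlining pattern can be sketched roughly as follows. This is not the actual code changed in this PR: `throw_empty_range` and `sample_range` are hypothetical names, the error message is illustrative, and the sampling itself is simply delegated to the stdlib `rand`.

```julia
using Random

# Hypothetical helper: the error construction and `throw` live in a separate,
# never-inlined function, so the caller's body only contains a plain call on
# the unlikely path.
@noinline throw_empty_range() = throw(ArgumentError("range must be non-empty"))

# Hypothetical sampler showing the pattern (assumption: not the real method
# in the Random stdlib).
function sample_range(rng::AbstractRNG, r::UnitRange{Int})
    isempty(r) && throw_empty_range()  # unlikely branch, outlined
    return rand(rng, r)                # hot path, delegated to the stdlib
end
```

The idea is that with the throwing branch reduced to a single call, the caller stays small and should be easier for the compiler to inline without an explicit `@inline` at each call site.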
In
https://github.com/JuliaLang/julia/pull/50509#issuecomment-2798850590, a
graph of the performance improvement of the "speed-up randperm by using
our current rand(1:n)" PR was posted, but I realized it only held when
calls to `rand(1:n)` were prefixed by `@inline`; without `@inline` it
was overall slower for `TaskLocalRNG()` for very big arrays (but still
faster otherwise).
An alternative to these `@inline` annotations is to outline `throw` as
done here, for benefits equivalent to `@inline` in that `randperm` PR.
Assuming that PR is merged, this PR improves performance by roughly 2x
for `TaskLocalRNG()` (no change for other RNGs):

While at it, I outlined a bunch of other unlikely throwing branches.
After that, #50509 can probably be merged, finally!