speed up `randperm` by using our current `rand(1:n)` (#50509)
And similarly for `randcycle` and `shuffle`.
We had a custom version of range generation for `randperm`, which was
based on the ideas of our previous default range sampler
`SamplerRangeFast` (generate `k`-bit integers using masking and reject
out-of-range ones) and took advantage of the fact that `randperm` needs
to generate `rand(1:i)` for `i = 2:n`.
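
For reference, the masking-and-rejection idea described above can be sketched roughly as follows. This is an illustration only, not the code that was removed: `masked_rand_1_to` and `randperm_masked` are hypothetical names, and the sketch assumes 64-bit draws and a mask that only grows when `i` crosses a power of two.

```julia
using Random

# Hypothetical helper (not from Base): draw a uniform integer in 1:i by
# keeping only the low k bits of a random word and rejecting values >= i.
function masked_rand_1_to(rng::AbstractRNG, i::Int, mask::UInt64)
    while true
        x = rand(rng, UInt64) & mask   # uniform in 0:mask
        x < i && return Int(x) + 1     # accept 0:i-1, shift to 1:i
    end
end

# Hypothetical sketch: inside-out Fisher-Yates that needs rand(1:i) for i = 2:n.
# Because i increases by one each step, the mask is updated incrementally
# instead of being recomputed for every draw.
function randperm_masked(rng::AbstractRNG, n::Int)
    a = Vector{Int}(undef, n)
    n == 0 && return a
    a[1] = 1
    mask = UInt64(1)                   # covers 0:1, enough for i = 2
    for i in 2:n
        if i - 1 > mask                # 0:i-1 no longer fits under the mask
            mask = (mask << 1) | 0x01  # widen the mask by one bit
        end
        j = masked_rand_1_to(rng, i, mask)
        if i != j
            a[i] = a[j]
        end
        a[j] = i
    end
    return a
end
```

Calling `randperm_masked(Random.default_rng(), 10)` should return a permutation of `1:10`; each draw is accepted with probability greater than 1/2, so the rejection loop terminates quickly in expectation.
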
But our current range sampler ("Nearly Divisionless") is usually better
than this hack, and makes these functions more readable. Typically, the
new version is faster for array lengths `< 2^20`, but becomes slightly
slower beyond `2^22`.
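
To illustrate the readability point, here is a minimal sketch of what the functions can look like once every index comes straight from `rand(rng, 1:i)`. The names `randperm_simple` and `shuffle_simple!` are hypothetical, and this is not necessarily the exact code in this change.

```julia
using Random

# Hypothetical sketch: inside-out Fisher-Yates using the default range sampler.
function randperm_simple(rng::AbstractRNG, n::Int)
    a = Vector{Int}(undef, n)
    n == 0 && return a
    a[1] = 1
    for i in 2:n
        j = rand(rng, 1:i)   # uniform index in 1:i
        if i != j
            a[i] = a[j]      # move the displaced element into the new slot
        end
        a[j] = i             # place i at the randomly chosen position
    end
    return a
end

# Hypothetical sketch: the same draw used for an in-place shuffle.
function shuffle_simple!(rng::AbstractRNG, a::Vector)
    for i in length(a):-1:2
        j = rand(rng, 1:i)
        a[i], a[j] = a[j], a[i]   # swap element i with a random earlier one
    end
    return a
end
```

Timings will depend on the RNG and on array length, so any comparison with the masking variant should be measured rather than assumed.
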
Here are some speedups:

The slowdown for big arrays seems fine to me, but I will see if I can
find an easy workaround.
Fix #57771.