Make `findall` faster for AbstractArrays (#37177)
The `findall` fallback is quite slow when the predicate is a small function,
compared with generating a logical index using `broadcast` and calling `findall`
on it to compute the integer indices. The gain is most visible when the predicate
is true for a large proportion of entries, but it is there even when it is false
for all of them.
The drawback of this approach is that it requires allocating a temporary bit
vector of `length(a)/8` bytes (one bit per element), regardless of the number of
returned indices.
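
A minimal sketch of the two strategies described above, for illustration only.
The helper names `findall_generic` and `findall_mask` are hypothetical and are
not the code touched by this commit; the actual fallback in Base may be written
differently.

```julia
# Hypothetical helpers for illustration; not the implementation in Base.

# Element-by-element approach: test each entry and push matching keys.
function findall_generic(f, a::AbstractArray)
    out = Vector{eltype(keys(a))}()
    for (i, x) in pairs(a)
        f(x) && push!(out, i)
    end
    return out
end

# Mask approach: broadcast the predicate into a logical index
# (one bit per element), then let `findall` scan that mask.
findall_mask(f, a::AbstractArray) = findall(f.(a))

a = [1, 5, 2, 8, 3]
findall_generic(x -> x > 2, a)  # [2, 4, 5]
findall_mask(x -> x > 2, a)     # [2, 4, 5]
```

Both return the same indices; the mask version pays for the temporary logical
index up front even when no element matches.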