Detangle linear indexing from non-scalar indexing
This removes the LinearFast special cases from non-scalar indexing. Previously, we were manually hoisting the div/rem sub2ind calculation along the indexed strides, but LLVM seems to be just as capable of performing this optimization in the cases I have tested. Even better, though, this creates a clean separation between the array indexing fallbacks:
* Scalar fallbacks use `ind2sub` and `sub2ind` to convert the indices provided into the number of indices that the custom type must implement.
* Non-scalar fallbacks simply "unwrap" the elements from AbstractArrays and use scalar indexing with the indices that were provided.
* (CartesianIndices are also expanded to individual integers, but that is a smaller detail.)
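
To make the separation concrete, here is a minimal sketch of the two kinds of fallback. It is an illustration only, not the code in Base: `my_sub2ind`, `my_ind2sub`, `Grid`, `scalar_get`, and `nonscalar_get` are hypothetical names invented for this example, and it assumes current Julia syntax (`struct`) and column-major, 1-based indexing.

```julia
# Hypothetical stand-ins for `sub2ind`/`ind2sub` (column-major, 1-based).
function my_sub2ind(dims, subs...)
    ind, stride = 1, 1
    for (d, s) in zip(dims, subs)
        ind += (s - 1) * stride   # offset along this dimension
        stride *= d               # stride of the next dimension
    end
    return ind
end

function my_ind2sub(dims, ind)
    rest = ind - 1
    subs = Int[]
    for d in dims
        push!(subs, rem(rest, d) + 1)  # subscript along this dimension
        rest = div(rest, d)            # carry the remainder to the next one
    end
    return tuple(subs...)
end

# Toy "custom type" that natively implements only two-subscript scalar access.
struct Grid
    data::Vector{Int}
    dims::Tuple{Int,Int}
end

# The one method the custom type itself provides.
scalar_get(g::Grid, i::Int, j::Int) = g.data[my_sub2ind(g.dims, i, j)]

# Scalar fallback: convert a single linear index into the two subscripts
# that the type implements.
scalar_get(g::Grid, i::Int) = scalar_get(g, my_ind2sub(g.dims, i)...)

# Non-scalar fallbacks: unwrap the elements of the index collections and
# call scalar indexing with exactly the indices that were provided.
nonscalar_get(g::Grid, I) = [scalar_get(g, i) for i in I]
nonscalar_get(g::Grid, I, J) = [scalar_get(g, i, j) for i in I, j in J]

g = Grid(collect(1:6), (2, 3))   # column-major layout of [1 3 5; 2 4 6]
```

Note that `nonscalar_get` never converts between linear and subscript indices itself; any conversion happens inside the scalar call, which is the separation described above.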
In all cases that I've tried, I've been unable to measure a performance difference. Indeed, the LLVM IR looks identical in my spot-checks, too.