fix #14524 (IndexError when mask prob is too low) (#14525)
* fix #14524 (IndexError when mask prob is too low)
* fix formatting
* correct documentation, add option for setting min_num_masks
* change the semantic meaning of `mask_prob` in _compute_mask_indices
With this commit, `mask_prob` actually means the probability that each
vector is the start of a masked span of the given length.
* fix check_copies test
* fix documentation so `mask_prob` is described as the `upper bound of overall masking percentage`, revert changes to _compute_mask_indices
* fix typo
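
For illustration only, here is a minimal sketch of why a very low `mask_prob` can yield zero masked spans, and how a minimum span count guards against indexing into an empty selection. It is not the code of `_compute_mask_indices`: the function name `num_masked_spans`, its signature, and the `epsilon` argument are assumptions made for this example.

```python
def num_masked_spans(mask_prob, sequence_length, mask_length,
                     min_num_masks=0, epsilon=0.0):
    """Hypothetical helper: how many spans to mask in one sequence.

    Treats mask_prob as an upper bound on the fraction of masked vectors,
    so the span count is roughly mask_prob * sequence_length / mask_length.
    epsilon stands in for a random value in [0, 1) used for probabilistic
    rounding; it is a plain argument here to keep the sketch deterministic.
    """
    if mask_length < 1:
        raise ValueError("mask_length must be >= 1")

    # With a low mask_prob this truncates to 0, which is the case that
    # breaks any later code assuming at least one span was sampled.
    num_spans = int(mask_prob * sequence_length / mask_length + epsilon)

    # Enforce the requested minimum number of spans.
    num_spans = max(num_spans, min_num_masks)

    # Never request more spans than can fit in the sequence.
    if num_spans * mask_length > sequence_length:
        num_spans = sequence_length // mask_length

    return num_spans


# A low mask_prob gives zero spans unless a minimum is requested.
low = num_masked_spans(mask_prob=0.01, sequence_length=50, mask_length=10)
guarded = num_masked_spans(mask_prob=0.01, sequence_length=50, mask_length=10,
                           min_num_masks=2)
```

Callers that index into the sampled span starts need either a non-zero minimum or an explicit branch for the empty case.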