[SPARK-28081][ML] Handle large vocab counts in word2vec
## What changes were proposed in this pull request?
The word2vec logic fails if a corpus contains a word with count > 1e9. Counting with longs instead lets us handle very large counts more robustly.
This takes over https://github.com/apache/spark/pull/24814
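
As a minimal illustration of why 32-bit counters are fragile at this scale (a sketch only, not the actual word2vec code; the object name and the counts below are hypothetical), accumulating large counts in an `Int` wraps around silently, while a `Long` holds the true value:

```scala
object CountOverflowSketch {
  // Hypothetical per-word counts: each fits in an Int, but their sum does not.
  val counts: Seq[Int] = Seq(1500000000, 1500000000)

  // Int accumulation silently wraps past Int.MaxValue (2147483647).
  val intTotal: Int = counts.sum

  // Widening to Long before summing keeps the true total.
  val longTotal: Long = counts.map(_.toLong).sum

  def main(args: Array[String]): Unit = {
    println(s"Int total:  $intTotal")   // negative after wraparound
    println(s"Long total: $longTotal")  // 3000000000
  }
}
```

Any arithmetic derived from counts (totals, thresholds, sentinel values) is exposed to the same wraparound, which is why widening the counter type is likely the more general fix.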
## How was this patch tested?
Existing tests.
Closes #24893 from srowen/SPARK-28081.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit e96dd82f12f2b6d93860e23f4f98a86c3faf57c5)
Signed-off-by: Sean Owen <sean.owen@databricks.com>