spark
220f29a6 - [SPARK-28081][ML] Handle large vocab counts in word2vec

Commit

6 years ago

[SPARK-28081][ML] Handle large vocab counts in word2vec

## What changes were proposed in this pull request?

The word2vec logic fails if a corpus has a word with count > 1e9. We should handle very large counts more gracefully here by using longs to count.

This takes over https://github.com/apache/spark/pull/24814

## How was this patch tested?

Existing tests.

Closes #24893 from srowen/SPARK-28081.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit e96dd82f12f2b6d93860e23f4f98a86c3faf57c5)
Signed-off-by: Sean Owen <sean.owen@databricks.com>
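The commit message does not say exactly where the failure occurs, so the following is only an illustrative sketch of why 32-bit counters are fragile at this scale on the JVM. It is not the code from the patch. The class name `VocabCountOverflow`, its methods, and the sample counts are all hypothetical; the only point is that sums of counts above 1e9 wrap around in an `int` but not in a `long`.

```java
// Hypothetical illustration, not taken from Spark's Word2Vec source.
public class VocabCountOverflow {

    // Accumulates counts in a 32-bit int. Java int arithmetic wraps
    // silently once the sum exceeds Integer.MAX_VALUE (2,147,483,647).
    static int totalAsInt(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Accumulates the same counts in a 64-bit long, which has ample
    // headroom for sums of counts in the billions.
    static long totalAsLong(int[] counts) {
        long total = 0L;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two made-up words, each seen 1.5 billion times.
        int[] counts = {1_500_000_000, 1_500_000_000};

        System.out.println(totalAsInt(counts));  // prints -1294967296
        System.out.println(totalAsLong(counts)); // prints 3000000000
    }
}
```

The `int` version returns a negative total without raising any error, which is why switching such counters to `long` is the usual remedy.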

Author

srowen

Committer

srowen

Parents

e6b5a5cf
