Improve parallelization of TfIdfVectorizer, Reduce memory consumption (#18539)
### Description
TfIdfVectorizer runs in two steps: it first searches for n-grams in the input,
then weights the results. The second step was not parallelized; this PR
addresses that. Previously, two vectors of the size of the output were
allocated to compute the results. The first one, frequencies, was used as an
intermediate vector between the two steps. This vector is now broken into
multiple small vectors, one per thread, which reduces memory consumption for
batches whose number of rows exceeds the number of threads.
### Motivation and Context
Performance and memory consumption.
For one model, the improvement is +15% faster (4 cores, model size
~6 MB, batch size 100). Below is another benchmark on a machine with 32
cores, with different vocabulary sizes and batch sizes. The tested
TfIdfVectorizer only deals with unigrams and processes sequences of 10
tokens (integers).
