onnxruntime
8b09702b - Enable parallel computation in Clip ops (#14925)

Commit

2 years ago

Enable parallel computation in Clip ops (#14925)

### Description

This PR speeds up Clip operations by replacing their sequential implementation with a parallelized one. The parallelization is achieved by dividing the input data into chunks of size N and using a thread pool to process the chunks in parallel. The chunk size N is set to 16K based on performance evaluation on input tensors of 10^i elements for i in [1 .. 6].

### Motivation and Context

The Clip operation is frequently executed in image processing models. Its implementation can be easily parallelized and therefore sped up when executed on a multi-core machine. On long inputs (>= 100K elements) this PR achieves a speedup of over 2x. On shorter inputs, this PR does not introduce any substantial performance change.
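For illustration, the chunking scheme described above can be sketched in standalone C++. This is not the code from the PR: `ParallelClip`, `kChunkSize`, and the use of plain `std::thread` workers in place of the onnxruntime thread pool are assumptions made to keep the example self-contained. It only shows the general idea of splitting the input into 16K-element chunks and clamping each chunk independently.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical constant mirroring the 16K chunk size mentioned in the description.
constexpr std::size_t kChunkSize = 16 * 1024;

// Clamp every element of `input` into [lo, hi] and write the result to `output`.
// The input is split into chunks of kChunkSize elements; chunks are distributed
// round-robin across plain std::thread workers (an assumption; the PR uses the
// runtime's own thread pool).
void ParallelClip(const std::vector<float>& input, std::vector<float>& output,
                  float lo, float hi,
                  unsigned num_threads = std::thread::hardware_concurrency()) {
  const std::size_t n = input.size();
  output.resize(n);
  const std::size_t num_chunks = (n + kChunkSize - 1) / kChunkSize;

  // Clip one chunk; chunks cover disjoint index ranges, so no locking is needed.
  auto clip_chunk = [&](std::size_t chunk) {
    const std::size_t begin = chunk * kChunkSize;
    const std::size_t end = std::min(begin + kChunkSize, n);
    for (std::size_t i = begin; i < end; ++i) {
      output[i] = std::min(std::max(input[i], lo), hi);
    }
  };

  // Short inputs (at most one chunk) or a single thread: run sequentially.
  if (num_chunks <= 1 || num_threads <= 1) {
    for (std::size_t c = 0; c < num_chunks; ++c) clip_chunk(c);
    return;
  }

  const std::size_t workers = std::min<std::size_t>(num_threads, num_chunks);
  std::vector<std::thread> pool;
  pool.reserve(workers);
  for (std::size_t w = 0; w < workers; ++w) {
    // Worker w handles chunks w, w + workers, w + 2 * workers, ...
    pool.emplace_back([&, w] {
      for (std::size_t c = w; c < num_chunks; c += workers) clip_chunk(c);
    });
  }
  for (auto& t : pool) t.join();
}
```

In this sketch, inputs that fit in a single chunk never spawn a thread, which is consistent with the observation that shorter inputs see no substantial performance change. Spawning threads per call is costlier than reusing a pool, so timings from this sketch would not be comparable to the numbers reported for the PR.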

References

#14925 - Enable parallel computation in Clip ops

Author

sakogan

Parents

2ff7f3e9
