Make the TreeEnsemble operator 5x faster for batches of 100,000 rows (#5965)
* improve processing time by 10x
* extend unit test coverage
* improve the implementation of the multi-target regression case
* improve comments; keep parallelization over trees when there are not enough trees
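The last bullet refers to choosing between parallelizing over trees and parallelizing over rows. The following is a minimal C++ sketch of those two strategies, assuming a toy ensemble of decision stumps. `Stump`, `predict_by_rows`, `predict_by_trees`, `predict` and the batch-size threshold `kMinRowsForRowParallel` are all hypothetical names chosen for illustration. This is not the operator's actual code or dispatch heuristic.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical decision stump standing in for a real tree.
struct Stump {
  std::size_t feature;
  double threshold;
  double left;   // value when x[feature] <= threshold
  double right;  // value otherwise
  double eval(const std::vector<double>& x) const {
    return x[feature] <= threshold ? left : right;
  }
};

using Rows = std::vector<std::vector<double>>;

// Parallel over rows: each thread owns a disjoint range of rows,
// so every output slot is written by exactly one thread (no reduction).
std::vector<double> predict_by_rows(const std::vector<Stump>& trees, const Rows& X,
                                    unsigned n_threads) {
  n_threads = std::max(n_threads, 1u);
  std::vector<double> out(X.size(), 0.0);
  std::vector<std::thread> pool;
  const std::size_t chunk = (X.size() + n_threads - 1) / n_threads;
  for (unsigned t = 0; t < n_threads; ++t) {
    const std::size_t begin = t * chunk, end = std::min(X.size(), begin + chunk);
    if (begin >= end) break;
    pool.emplace_back([&, begin, end] {
      for (std::size_t i = begin; i < end; ++i)
        for (const Stump& s : trees) out[i] += s.eval(X[i]);
    });
  }
  for (auto& th : pool) th.join();
  return out;
}

// Parallel over trees: each thread accumulates its trees into a private
// buffer covering all rows; the buffers are summed after the threads join.
std::vector<double> predict_by_trees(const std::vector<Stump>& trees, const Rows& X,
                                     unsigned n_threads) {
  n_threads = std::max(n_threads, 1u);
  std::vector<std::vector<double>> partial(n_threads, std::vector<double>(X.size(), 0.0));
  std::vector<std::thread> pool;
  const std::size_t chunk = (trees.size() + n_threads - 1) / n_threads;
  for (unsigned t = 0; t < n_threads; ++t) {
    const std::size_t begin = t * chunk, end = std::min(trees.size(), begin + chunk);
    if (begin >= end) break;
    pool.emplace_back([&, t, begin, end] {
      for (std::size_t j = begin; j < end; ++j)
        for (std::size_t i = 0; i < X.size(); ++i) partial[t][i] += trees[j].eval(X[i]);
    });
  }
  for (auto& th : pool) th.join();
  std::vector<double> out(X.size(), 0.0);
  for (const auto& p : partial)
    for (std::size_t i = 0; i < X.size(); ++i) out[i] += p[i];
  return out;
}

// Hypothetical dispatch: large batches split rows, small batches split trees.
std::vector<double> predict(const std::vector<Stump>& trees, const Rows& X,
                            unsigned n_threads) {
  const std::size_t kMinRowsForRowParallel = 128;  // illustrative assumption
  if (X.size() >= kMinRowsForRowParallel) return predict_by_rows(trees, X, n_threads);
  return predict_by_trees(trees, X, n_threads);
}
```

Splitting by rows needs no reduction step, which is why it tends to pay off on large batches. Splitting by trees keeps threads busy when there are only a few rows, at the cost of one partial buffer per thread plus a final sum.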