drill
3599dfd2 - DRILL-6951: Merge row set based mock data source

5 years ago

DRILL-6951: Merge row set based mock data source

The mock data source is used in several tests to generate a large volume of sample data, such as when testing spilling. The mock data source also lets us try new plugin features in a very simple context. During the development of the row set framework, the mock data source was converted to use the new framework to verify functionality. This commit upgrades the mock data source with that work.

The work changes none of the functionality. It does, however, improve memory usage. Batches are limited, by default, to 10 MB in size. The row set framework minimizes internal fragmentation in the largest vector. (Previously, internal fragmentation averaged 25% but could be as high as 50%.)

As it turns out, the hash aggregate tests depended on the internal fragmentation: without it, the hash agg no longer spilled for the same row count. Adjusted the generated row counts to recreate a data volume that caused spilling. One test in particular always failed due to assertions in the hash agg code. These appear to be true bugs and are described in DRILL-7301. After multiple failed attempts to get the test to work, it was disabled until DRILL-7301 is fixed.

Added a new unit test to sanity check the mock data source. (Previously, this functionality was verified only indirectly, through other unit tests.)
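The 25% average and 50% worst case mentioned in the message are what one would expect if each vector buffer is rounded up to the next power of two. The Java sketch below assumes that allocation policy purely for illustration; it is not Drill's allocator code, and `FragmentationSketch` and its methods are hypothetical. It also explains why the hash agg tests needed larger row counts: once the rounding waste disappears, the same number of rows occupies less memory and may no longer cross the spill threshold.

```java
/**
 * Hypothetical sketch, not Drill code: estimates internal fragmentation in a
 * fixed-width vector whose buffer is ASSUMED to be rounded up to the next
 * power of two.
 */
public class FragmentationSketch {

  /** Smallest power of two >= n, for 0 < n <= 2^62. */
  static long nextPowerOfTwo(long n) {
    long highest = Long.highestOneBit(n);
    return highest == n ? n : highest << 1;
  }

  /** Fraction of the (assumed power-of-two) buffer that holds no data. */
  static double wastedFraction(long usedBytes) {
    return 1.0 - (double) usedBytes / nextPowerOfTwo(usedBytes);
  }

  /** Average waste over every row count in [minRows, maxRows], inclusive. */
  static double averageWaste(long minRows, long maxRows, int valueWidth) {
    double total = 0;
    for (long rows = minRows; rows <= maxRows; rows++) {
      total += wastedFraction(rows * valueWidth);
    }
    return total / (maxRows - minRows + 1);
  }

  public static void main(String[] args) {
    final int valueWidth = 8; // assumed: an 8-byte value, such as a BIGINT
    for (long rows : new long[] {4_096, 4_097, 6_000, 8_192}) {
      long used = rows * valueWidth;
      System.out.printf("rows=%d used=%d allocated=%d wasted=%.1f%%%n",
          rows, used, nextPowerOfTwo(used), 100 * wastedFraction(used));
    }
    // Row counts spread evenly between two powers of two waste about 25% on average.
    System.out.printf("average wasted=%.1f%%%n",
        100 * averageWaste(4_097, 8_192, valueWidth));
  }
}
```

Under this assumption, 4,096 rows fill a 32 KB buffer exactly, while 4,097 rows force a 64 KB buffer that is almost 50% empty. Averaged over all row counts between 4,097 and 8,192, the waste comes to 25.0%, matching the figures in the commit message.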

References

#1809 - DRILL-6951: Row set based mock data source

Author

paul-rogers

Committer

arina-ielchiieva

Parents

2224ee10
