DRILL-5323: Test tools for row sets
Provide test tools to create, populate and compare row sets
To simplify tests, we need a TestRowSet concept that wraps a
VectorContainer and provides easy ways to:
- Define a schema for the row set.
- Create a set of vectors that implement the schema.
- Populate the row set with test data via code.
- Add an SV2 to the row set.
- Pass the row set to operator components (such as generated code
blocks).
- Examine the contents of a row set.
- Compare the results of the operation with an expected result set.
- Dispose of the underlying direct memory when work is done.
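The workflow listed above (define a schema, populate rows, compare with
an expected result, release resources) can be sketched roughly as
follows. This is a simplified, heap-only illustration: RowSetSketch and
its methods are hypothetical names invented for this example and are
not the API added by this PR. It does not touch VectorContainer, SV2 or
direct memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a test row set; NOT the class from this PR.
final class RowSetSketch implements AutoCloseable {
  private final List<String> schema;               // column names only
  private final List<Object[]> rows = new ArrayList<>();
  private boolean closed;

  RowSetSketch(String... columnNames) {
    schema = List.of(columnNames);
  }

  // Append one row; the value count must match the schema width.
  RowSetSketch add(Object... values) {
    if (closed) {
      throw new IllegalStateException("row set is closed");
    }
    if (values.length != schema.size()) {
      throw new IllegalArgumentException(
          "expected " + schema.size() + " values, got " + values.length);
    }
    rows.add(values.clone());
    return this;
  }

  int rowCount() {
    return rows.size();
  }

  // True when the schema and every row are equal, in order.
  boolean matches(RowSetSketch expected) {
    if (!schema.equals(expected.schema) || rows.size() != expected.rows.size()) {
      return false;
    }
    for (int i = 0; i < rows.size(); i++) {
      if (!Arrays.equals(rows.get(i), expected.rows.get(i))) {
        return false;
      }
    }
    return true;
  }

  boolean isClosed() {
    return closed;
  }

  // Stands in for releasing the underlying memory.
  @Override
  public void close() {
    rows.clear();
    closed = true;
  }

  public static void main(String[] args) {
    try (RowSetSketch expected = new RowSetSketch("id", "name").add(1, "fred").add(2, "wilma");
         RowSetSketch actual = new RowSetSketch("id", "name").add(1, "fred").add(2, "wilma")) {
      System.out.println(actual.matches(expected));
    }
  }
}
```

Using try-with-resources in the sketch mirrors the "dispose when done"
step, so a test cannot forget to release what it allocated.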
This code builds on that in DRILL-5324 to provide a complete row set
API. See DRILL-5318 for the spec.
Note: this code can be reviewed as-is, but cannot be committed until
after DRILL-5324 is committed: this code has compile-time dependencies
on that code. This PR will be rebased once DRILL-5324 is pulled into
master.
Handles maps and intervals
The row set schema is refined to provide two forms of schema. A
physical schema shows the nested structure of the data with maps
expanding into their contents.
An access schema shows the row “flattened” to include just scalar
(non-map) columns, all at a single level, with dotted names
identifying nested fields. This form makes for very simple access.
Updates the row set schema builder to easily build a schema with maps.
Then, provides tools for reading and writing batches with maps by
presenting the flattened view to the row reader and writer.
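As an illustration of the physical versus access schema idea, the
sketch below flattens a nested column tree into dotted scalar names.
The Column class and the flatten() method are hypothetical and written
only for this example; they are not the schema classes in this PR.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenSchemaSketch {

  // Hypothetical column definition: a scalar, or a map with children.
  static final class Column {
    final String name;
    final String type;            // e.g. "INT", "VARCHAR" or "MAP"
    final List<Column> children;  // empty for scalars

    private Column(String name, String type, List<Column> children) {
      this.name = name;
      this.type = type;
      this.children = children;
    }

    static Column scalar(String name, String type) {
      return new Column(name, type, List.of());
    }

    static Column map(String name, Column... children) {
      return new Column(name, "MAP", List.of(children));
    }

    boolean isMap() {
      return "MAP".equals(type);
    }
  }

  // Produce the flattened view: scalar columns only, dotted names.
  static List<String> flatten(List<Column> physical) {
    List<String> access = new ArrayList<>();
    collect("", physical, access);
    return access;
  }

  private static void collect(String prefix, List<Column> cols, List<String> out) {
    for (Column col : cols) {
      String path = prefix.isEmpty() ? col.name : prefix + "." + col.name;
      if (col.isMap()) {
        collect(path, col.children, out);   // maps contribute only their members
      } else {
        out.add(path + ":" + col.type);
      }
    }
  }

  public static void main(String[] args) {
    List<Column> physical = List.of(
        Column.scalar("a", "INT"),
        Column.map("m",
            Column.scalar("b", "VARCHAR"),
            Column.map("n", Column.scalar("c", "INT"))));
    System.out.println(flatten(physical));  // [a:INT, m.b:VARCHAR, m.n.c:INT]
  }
}
```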
HyperVectors have a very complex structure for maps. The hyper row set
implementation takes a first crack at mapping that structure into the
standardized row set format.
Also provides a handy way to set an INTERVAL column from an int. There
is no good mapping from an int to an interval, so an arbitrary
convention is used. This convention is not generally useful, but is
very handy for quickly generating test data.
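This message does not say which convention is used, so the sketch below
invents one purely for illustration: the int is treated as a count of
hours and split into whole days plus leftover hours. The class name,
the Interval holder and the hours-based rule are all assumptions, not
the code in this PR.

```java
import java.time.Duration;
import java.time.Period;

final class IntervalFromIntSketch {

  // Hypothetical holder for an interval: a calendar part and a time part.
  static final class Interval {
    final Period datePart;
    final Duration timePart;

    Interval(Period datePart, Duration timePart) {
      this.datePart = datePart;
      this.timePart = timePart;
    }
  }

  // ASSUMED convention (illustrative only): value is a number of hours.
  static Interval fromInt(int value) {
    int days = value / 24;
    int hours = value % 24;
    return new Interval(Period.ofDays(days), Duration.ofHours(hours));
  }

  public static void main(String[] args) {
    Interval i = fromInt(50);
    System.out.println(i.datePart.getDays() + " days, " + i.timePart.toHours() + " hours");
  }
}
```

Any deterministic rule would serve equally well; what matters for test
data is that the same int always yields the same interval.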
As before, this is a partial PR. The code here still depends on
DRILL-5324 to provide the column accessors needed by the row reader and
writer.
All this code is getting rather complex, so this commit includes a unit
test of the schema and row set code.
Revisions to support arrays
Arrays require a somewhat different API. Refactored to allow arrays to
appear as a field type.
While refactoring, moved interfaces to more logical locations.
Added more comments.
Rejiggered the row set schema to provide both a physical and flattened
(access) schema, both driven from the original batch schema.
Pushed some accessor and writer classes into the accessor layer.
Added tests for arrays.
Also added more comments where needed.
Moved tests to DRILL-5318
The test classes previously here depend on the new “operator fixture”.
To provide a non-cyclic checkin order, moved the tests to the PR with
the fixtures so that this PR is clear of dependencies. The tests were
reviewed in the context of DRILL-5318.
Also pulls in batch sizer support for map fields, which is required by
the tests.
closes #785