drill
8d725f69 - DRILL-335: Implement Hash Aggregation

Commit

10 years ago

DRILL-335: Implement Hash Aggregation

1. Implementation of the hash aggregation execution operator. This has two main parts: the HashAggTemplate and the HashAggBatch.

2. Implementation of a hash table which is used by the hash aggregation. The hash table has two main parts: the HashTableTemplate and the ChainedHashTable. The hash table internally uses the notion of 'BatchHolder' to keep track of all keys that can fit within one batch of 64K values. New BatchHolder objects are created as needed. Each BatchHolder has its own vector container. The HashAggregate also has a similar structure and it keeps track of the workspace variables. (NOTE: An initial design document for the hash aggregation and hash table was already attached with Drill-335. The document has not yet been updated with the latest implementation ... will try to do that in the near future).

3. Jinfeng's changes to use workspace vectors in the generated code for aggregate functions. Previously, for streaming aggregate we only needed to maintain workspace variables for 1 running group; for hash aggregate we need to maintain them for all groups.

4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is not valid anymore. I modified the template generation code for the aggregate functions so that they conform to the new infrastructure.

5. The original AggTemplate, AggBatch and Aggregator classes have been moved to the corresponding StreamingAggTemplate, StreamingAggBatch and StreamingAggregator in order to differentiate them from hash aggregation. These appear as new files but the code there has not changed.

I have run several tests manually as part of TestHashAggr...these tests use TPC-H data and in particular a relatively large 'Orders' table. However, I have not yet packaged the tests to run as part of JUnit since the location and size of the parquet files need to be figured out. I will continue to work on that.
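The structure described in items 2 and 3 can be pictured with a much-simplified sketch. This is not Drill's code: the class `HashAggSketch`, the nested `Holder` (a stand-in for the idea of a BatchHolder), the tiny `BATCH_SIZE` of 4 (the commit speaks of 64K values per batch), and the choice of a SUM over int keys are all assumptions made for illustration. It shows a chained hash table whose entries live in fixed-capacity holders that are allocated as needed, with one workspace value kept per group rather than for a single running group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of hash aggregation (SUM grouped by an int key).
class HashAggSketch {
    // Assumption: kept tiny so that several holders get created in a small demo.
    static final int BATCH_SIZE = 4;

    // Stand-in for a batch holder: parallel arrays for keys, chain links and workspace sums.
    static final class Holder {
        final int[] keys = new int[BATCH_SIZE];
        final int[] next = new int[BATCH_SIZE];   // global index of next entry in the chain, -1 = end
        final long[] sums = new long[BATCH_SIZE]; // workspace value, one slot per group
        int count = 0;
    }

    private final int[] buckets;                  // head of each chain as a global index, -1 = empty
    private final List<Holder> holders = new ArrayList<>();

    HashAggSketch(int numBuckets) {
        buckets = new int[numBuckets];
        Arrays.fill(buckets, -1);
    }

    private int bucketOf(int key) {
        return Math.floorMod(Integer.hashCode(key), buckets.length);
    }

    // Walk the chain for the key's bucket; return the group's global index or -1.
    private int find(int key) {
        int idx = buckets[bucketOf(key)];
        while (idx != -1) {
            Holder h = holders.get(idx / BATCH_SIZE);
            int offset = idx % BATCH_SIZE;
            if (h.keys[offset] == key) {
                return idx;
            }
            idx = h.next[offset];
        }
        return -1;
    }

    // Append a new group, allocating a fresh holder when the last one is full.
    private int insert(int key) {
        if (holders.isEmpty() || holders.get(holders.size() - 1).count == BATCH_SIZE) {
            holders.add(new Holder());
        }
        int holderIdx = holders.size() - 1;
        Holder h = holders.get(holderIdx);
        int offset = h.count++;
        int idx = holderIdx * BATCH_SIZE + offset;
        int b = bucketOf(key);
        h.keys[offset] = key;
        h.next[offset] = buckets[b]; // link in front of the existing chain
        buckets[b] = idx;
        return idx;
    }

    // Aggregate one input row into its group's workspace slot.
    void add(int key, long value) {
        int idx = find(key);
        if (idx == -1) {
            idx = insert(key);
        }
        holders.get(idx / BATCH_SIZE).sums[idx % BATCH_SIZE] += value;
    }

    long sumFor(int key) {
        int idx = find(key);
        if (idx == -1) {
            throw new NoSuchElementException("no group for key " + key);
        }
        return holders.get(idx / BATCH_SIZE).sums[idx % BATCH_SIZE];
    }

    int holderCount() {
        return holders.size();
    }

    int groupCount() {
        if (holders.isEmpty()) {
            return 0;
        }
        return (holders.size() - 1) * BATCH_SIZE + holders.get(holders.size() - 1).count;
    }
}
```

A streaming aggregate over sorted input would only need one running sum at a time; here every group keeps its own slot, which is why the workspace storage grows holder by holder alongside the keys.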

Author

Aman Sinha

Committer

jacques-n

Parents

693bd353
