drill
f8bbb759 - DRILL-5941: Skip header / footer improvements for Hive storage plugin

Commit

6 years ago

DRILL-5941: Skip header / footer improvements for Hive storage plugin

Overview:
1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941).
2. Apply skip header logic only once, during reader initialization, to avoid checks while reading the data (DRILL-5106).
3. Apply skip footer logic only when the footer count is greater than 0; otherwise default processing is done without buffering data in a queue (DRILL-5106).

Code changes:
1. AbstractReadersInitializer was introduced to factor out common logic during readers initialization. It will have two implementations:
a. Default (each input split group gets its own reader);
b. Empty (for empty tables).
2. AbstractRecordsInspector was introduced to improve performance when the table's footer count is less than or equal to 0. It will have two implementations:
a. Default (records will be processed one by one without buffering);
b. SkipFooter (a queue will be used to buffer the N records that should be skipped at the end of file processing).
3. When a text table has a header / footer, each table file should be read as one unit. When a file is being read as several input splits, they should be grouped. For this purpose the LogicalInputSplit class was introduced, which replaced the InputSplitWrapper class. The new class stores a list of grouped input splits and returns information about the splits at group level. Please note that during planning, input splits are grouped only when data is being read from a text table that has a header / footer; otherwise each input split is treated separately.
4. Allow HiveAbstractReader to have multiple input splits instead of one.

This closes #1030
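The two ideas described above, grouping the splits of one file and buffering N records so the footer can be dropped, can be illustrated with a minimal sketch. This is not the code from this commit: `HeaderFooterSketch`, `Split`, `groupByFile` and `readRecords` are hypothetical names chosen for illustration, and records are modeled as plain strings rather than Hive values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names are Drill or Hive classes.
class HeaderFooterSketch {

  // Hypothetical stand-in for an input split: a byte range within one file.
  static final class Split {
    final String path;
    final long start;
    final long length;

    Split(String path, long start, long length) {
      this.path = path;
      this.start = start;
      this.length = length;
    }
  }

  // Groups splits by file path (first-seen order) so that a single reader
  // can process a whole file and apply header / footer skipping correctly.
  static List<List<Split>> groupByFile(List<Split> splits) {
    Map<String, List<Split>> byPath = new LinkedHashMap<>();
    for (Split split : splits) {
      byPath.computeIfAbsent(split.path, k -> new ArrayList<>()).add(split);
    }
    return new ArrayList<>(byPath.values());
  }

  // Returns the records of one file without the first `header` records
  // and without the last `footer` records.
  static List<String> readRecords(Iterable<String> records, int header, int footer) {
    List<String> out = new ArrayList<>();
    Iterator<String> it = records.iterator();

    // Header is skipped once, up front, so the main loop has no header checks.
    for (int i = 0; i < header && it.hasNext(); i++) {
      it.next();
    }

    // No footer: emit records one by one, without any buffering.
    if (footer <= 0) {
      while (it.hasNext()) {
        out.add(it.next());
      }
      return out;
    }

    // Footer present: hold back the most recent `footer` records in a queue.
    // A record is emitted only once `footer` newer records have been seen,
    // so whatever remains in the queue at end of input is the footer.
    Deque<String> held = new ArrayDeque<>(footer + 1);
    while (it.hasNext()) {
      held.addLast(it.next());
      if (held.size() > footer) {
        out.add(held.removeFirst());
      }
    }
    return out;
  }
}
```

The queue only helps when the reader sees the end of the file, which is why the splits of one file have to be grouped first: a reader that handled a single middle split would otherwise drop records that are not part of the footer.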

Author

arina-ielchiieva

Committer

parthchandra

Parents

36abdd79
