drill
[DRILL-7191 / DRILL-7026]: RM state blob persistence in Zookeeper and Integration of Distributed queue configuration with Planner
#1762
Open

HanumathRao wants to merge 7 commits into apache:master from sohami:DRILL-7026-Final-PR
HanumathRao 6 years ago (edited 6 years ago)

This PR contains the changes that add RM framework support on both the execution and planning sides, tracked by JIRAs DRILL-7191 and DRILL-7026.

  1. Refactored the existing ZK-based queue to accommodate the new Distributed queue for RM. Moved the memory allocation code of the QueryResourceAllocators into utility classes such as ZKQueueMemoryAllocationUtilities and DefaultMemoryAllocationUtilities. Refactored the Parallelizer code to accommodate memory adjustment for operators during the parallelization phase. There are now three implementations of SimpleParallelizer: ZKQueueParallelizer, DistributedQueueParallelizer and DefaultParallelizer, used by the ZK-based RM, Distributed RM and non-RM configurations respectively.

  2. Integrated the planner with RM to select a queue and reduce query-level memory so that it stays within the queue limits. Added handling to ensure that buffered operators get at least their minimum required memory allocation. Based on the memory calculated for each operator within each fragment, its initial and maximum memory allocations are set; the execution layer later consumes these to enforce memory limits.

  3. Introduced a new DrillNode class to deal with issues that arise when a DrillbitEndpoint is looked up in a map using only some of its fields.

  4. Added support for storing a UUID for each Drillbit service instance locally, for use by the planner and execution layer. This UUID uniquely identifies a Drillbit and is used to register Drillbit information in the RM StateBlobs. Introduced a PersistentStore named ZookeeperTransactionalPersistenceStore with transactional capabilities, built on the Zookeeper transactional APIs. It is used to update the RM state blobs, since all updates must happen in a transactional manner. Added the RMStateBlobs definitions and support for their serde to Zookeeper. Added the implementation of DistributedRM and its corresponding QueryRM APIs.

  5. Updated the state management of a query in Foreman so that the same Foreman object can be submitted multiple times. Also introduced two maps that keep track of waiting and running queries. These changes support the async admit protocol that will be needed with Distributed RM.

  6. Added support for serde of optimalMemoryAllocation for each operator in each minor fragment in the QueryProfile. This is needed to verify that the optimal memory calculated by the planner is correct.
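
The memory reduction described in item 2 can be pictured as a proportional scale-down with a per-operator floor. The following is a minimal sketch under stated assumptions, not the logic in this PR: `QueueMemoryScaler`, `scaleToQueueLimit`, the string operator names and the parameter names are all hypothetical, and the actual planner code may distribute memory differently.

```java
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Drill: shrinks operator memory to fit a queue limit.
final class QueueMemoryScaler {

  private QueueMemoryScaler() {
  }

  /**
   * Returns the estimates unchanged when their sum already fits within queueLimit.
   * Otherwise returns reduced allocations whose sum is at most queueLimit, with
   * every operator receiving at least minPerOperator bytes.
   */
  static Map<String, Long> scaleToQueueLimit(Map<String, Long> estimates,
                                             long queueLimit,
                                             long minPerOperator) {
    long total = 0;
    long excessTotal = 0;
    for (long estimate : estimates.values()) {
      total += estimate;
      excessTotal += Math.max(0, estimate - minPerOperator);
    }

    Map<String, Long> allocations = new LinkedHashMap<>(estimates);
    if (total <= queueLimit) {
      return allocations; // already within the queue limit, nothing to reduce
    }

    long minTotal = minPerOperator * estimates.size();
    if (minTotal > queueLimit) {
      throw new IllegalStateException(
          "Queue limit cannot cover the minimum memory of all buffered operators");
    }

    // Give every operator its minimum, then share the remaining memory in
    // proportion to how far each estimate exceeds that minimum. BigInteger
    // avoids long overflow when multiplying two byte counts.
    BigInteger spare = BigInteger.valueOf(queueLimit - minTotal);
    BigInteger divisor = BigInteger.valueOf(excessTotal);
    for (Map.Entry<String, Long> entry : allocations.entrySet()) {
      long excess = Math.max(0, entry.getValue() - minPerOperator);
      long share = spare.multiply(BigInteger.valueOf(excess)).divide(divisor).longValue();
      entry.setValue(minPerOperator + share);
    }
    return allocations;
  }
}
```

For example, with estimates of 600, 300 and 100 bytes, a queue limit of 500 and a minimum of 50, this sketch yields 276, 152 and 70, which sum to 498. Integer division rounds each share down, so the total never exceeds the limit.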

sohami DRILL-7164: KafkaFilterPushdownTest is sometimes failing to pattern m…
097122eb
sohami sohami force pushed from 55e5e15d to 130252a8 6 years ago
HanumathRao DRILL-7193: Integration changes of the Distributed RM queue configura…
7e71cd03
sohami DRILL-7191: RM blobs persistence in Zookeeper for Distributed RM.
1db65a27
HanumathRao DRILL-7193: Integration changes of the Distributed RM queue configura…
2b6a91a2
HanumathRao HanumathRao force pushed from 130252a8 to 02402ed6 6 years ago
sohami sohami force pushed from 02402ed6 to 1d8b4210 6 years ago
sohami sohami force pushed from 3135402f to de9e5f7f 6 years ago
sohami sohami force pushed from de9e5f7f to e9b4fa5a 6 years ago
sohami DRILL-7191: RM blobs persistence in Zookeeper for Distributed RM.
2800c579
HanumathRao DRILL-7193: Integration changes of the Distributed RM queue configura…
1517a87e
sohami sohami force pushed from e9b4fa5a to 1517a87e 6 years ago
sohami sohami changed the title from "DRILL-7191 Distributed state persistence and Integration of Distributed queue configuration with Planner" to "[DRILL-7191 / DRILL-7026]: RM state blob persistence in Zookeeper and Integration of Distributed queue configuration with Planner" 6 years ago
sohami
sohami commented on 2019-05-01
exec/java-exec/src/main/java/org/apache/drill/exec/ops/OpProfileDef.java
+ public long optimalMemoryAllocation;

- public OpProfileDef(int operatorId, int operatorType, int incomingCount) {
+ public OpProfileDef(int operatorId, int operatorType, int incomingCount, long optimalMemoryAllocation) {
sohami 6 years ago

Will every creator of OpProfileDef always pass MaxAllocation for optimalMemoryAllocation?

exec/java-exec/src/main/java/org/apache/drill/exec/ops/OperatorStats.java

  @VisibleForTesting
- public OperatorStats(int operatorId, int operatorType, int inputCount, BufferAllocator allocator) {
+ public OperatorStats(int operatorId, int operatorType, int inputCount, BufferAllocator allocator, long initialAllocation) {
sohami 6 years ago

I suggest renaming initialAllocation to optimalMemoryAllocation.

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/MemoryCalculator.java
- private final Map<DrillbitEndpoint, List<Pair<PhysicalOperator, Long>>> bufferedOperators;
+ private final Map<DrillNode, List<Pair<PhysicalOperator, Long>>> bufferedOperators;
  private final QueryContext queryContext;
+ private final long MINIMUM_MEMORY_FOR_BUFFER_OPERS;
sohami 6 years ago

Upper-case variable names should be used only for constants; please change this one to lower case.

exec/java-exec/src/main/java/org/apache/drill/common/DrillNode.java
+ endpoint.getUserPort() == otherEndpoint.getUserPort() &&
+ endpoint.getControlPort() == otherEndpoint.getControlPort() &&
+ endpoint.getDataPort() == otherEndpoint.getDataPort() &&
+ endpoint.getVersion().equals(otherEndpoint.getVersion());
sohami 6 years ago

It looks like all the fields in DrillbitEndpoint are optional, so we should check whether a field is present before calling equals on it, as is done for hashCode() below. Alternatively, refer to equals in the generated file for DrillbitEndpoint.
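
The presence check requested here could look roughly like the sketch below. It is self-contained and therefore uses a hypothetical `EndpointStub` in place of the generated DrillbitEndpoint class, with only three of the fields; `NodeKey` is likewise a hypothetical stand-in for DrillNode. The assumption is that each optional field exposes a `hasX()` accessor alongside `getX()`, and that `getX()` returns a default value when the field is unset.

```java
import java.util.Objects;

// Hypothetical stand-in for a generated message whose fields are all optional.
final class EndpointStub {
  private final String address;   // null means "not set"
  private final Integer userPort; // null means "not set"
  private final String version;   // null means "not set"

  EndpointStub(String address, Integer userPort, String version) {
    this.address = address;
    this.userPort = userPort;
    this.version = version;
  }

  boolean hasAddress() { return address != null; }
  String getAddress() { return address == null ? "" : address; }
  boolean hasUserPort() { return userPort != null; }
  int getUserPort() { return userPort == null ? 0 : userPort; }
  boolean hasVersion() { return version != null; }
  String getVersion() { return version == null ? "" : version; }
}

// Hypothetical map key wrapper: compares presence first, then values.
final class NodeKey {
  private final EndpointStub endpoint;

  NodeKey(EndpointStub endpoint) {
    this.endpoint = Objects.requireNonNull(endpoint);
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof NodeKey)) {
      return false;
    }
    EndpointStub a = endpoint;
    EndpointStub b = ((NodeKey) other).endpoint;
    // A field only matches if both sides agree on presence and, when set, on value.
    return a.hasAddress() == b.hasAddress()
        && (!a.hasAddress() || a.getAddress().equals(b.getAddress()))
        && a.hasUserPort() == b.hasUserPort()
        && (!a.hasUserPort() || a.getUserPort() == b.getUserPort())
        && a.hasVersion() == b.hasVersion()
        && (!a.hasVersion() || a.getVersion().equals(b.getVersion()));
  }

  @Override
  public int hashCode() {
    // Only fields that are set contribute, which keeps hashCode consistent with equals.
    int hash = 1;
    if (endpoint.hasAddress()) {
      hash = 31 * hash + endpoint.getAddress().hashCode();
    }
    if (endpoint.hasUserPort()) {
      hash = 31 * hash + Integer.hashCode(endpoint.getUserPort());
    }
    if (endpoint.hasVersion()) {
      hash = 31 * hash + endpoint.getVersion().hashCode();
    }
    return hash;
  }
}
```

With this shape, an endpoint whose user port is unset does not compare equal to one whose user port is explicitly 0, even though both getters return 0.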

exec/java-exec/src/main/java/org/apache/drill/common/DrillNode.java
+ .append(endpoint.getAddress())
+ .append("endpoint user port: ")
+ .append(endpoint.getUserPort()).toString();
+ }
sohami 6 years ago

Check whether the field is present before accessing it.

exec/java-exec/src/main/java/org/apache/drill/exec/util/memory/ZKQueueMemoryAllocationUtilities.java
+ maxAllocPerNode = Math.min(maxAllocPerNode, perQueryMemory);
+ return maxAllocPerNode;
+ }
sohami 6 years ago

A lot of the code here and in DefaultMemoryAllocationUtilities is duplicated. Maybe create a separate MemoryAllocationUtilities class to hold the common code. Also, the same BufferedOpFinder is defined in the Fragment class as well; it would be good to define it in one place only and reuse it wherever needed.

exec/java-exec/src/main/java/org/apache/drill/exec/resourcemgr/NodeResources.java

- private final long memoryInBytes;
+ private long memoryInBytes;
+
+ private long numVirtualCpu;
sohami 6 years ago

Please change numVirtualCpu back to the int type.

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/MemoryCalculator.java
  }

  // Helper method to compute the minor fragment count per drillbit. This method returns
  // a map with key as DrillbitEndpoint and value as the width (i.e #minorFragments)
sohami 6 years ago

This should say "key as DrillNode".

exec/java-exec/src/main/java/org/apache/drill/exec/work/user/PlanSplitter.java
+ queryId, queryContext.getOnlineEndpointUUIDs(), rootFragment,
  queryContext.getSession(), queryContext.getQueryContextInfo());
- planner.visitPhysicalPlan(queryWorkUnit);
+ // planner.visitPhysicalPlan(queryWorkUnit);
sohami 6 years ago

Remove this commented-out line.

protocol/src/main/protobuf/BitControl.proto
  optional string options_json = 15;
  optional QueryContextInformation context = 16;
  repeated Collector collector = 17;
+ optional string endpointUUID = 18;
sohami 6 years ago

I suggest renaming this to assignedEndpointUUID, since a PlanFragment has two endpoints: assignedEndpoint and foremanEndpoint.

HanumathRao HanumathRao force pushed from 38e9a737 to 51cf5a49 6 years ago
HanumathRao Addressing Review comments.
a3f8b366
HanumathRao HanumathRao force pushed from 51cf5a49 to a3f8b366 6 years ago
HanumathRao
HanumathRao commented on 2019-05-13