-
Notifications
You must be signed in to change notification settings - Fork 1.1k
With clause enhance #16353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
With clause enhance #16353
Conversation
8f4461d to
863c4c0
Compare
caa8971 to
002f19a
Compare
3e6aab7 to
1c1e3c7
Compare
1d2fbc6 to
0ce2a5d
Compare
896c61b to
4f1233b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances the WITH clause by adding support for CTE (Common Table Expression) materialization. The main objective is to optimize query performance by allowing CTEs to be materialized (executed once and cached) rather than being re-evaluated each time they're referenced. The implementation introduces a MATERIALIZED keyword, adds infrastructure for storing and reading CTE results, and integrates materialization into the query planning and execution pipeline.
Key changes include:
- Added
MATERIALIZEDkeyword support in SQL grammar for CTE declarations - Implemented CTE data storage infrastructure with memory buffering and configurable limits
- Created CteScanNode and CteScanOperator for reading materialized CTE results
- Enhanced query analysis and planning to handle CTE materialization
- Added support for EXPLAIN and EXPLAIN ANALYZE with CTE queries
Reviewed changes
Copilot reviewed 68 out of 68 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| RelationalSql.g4 | Added MATERIALIZED? keyword to namedQuery grammar rule |
| IoTDBConstant.java | Added constants for CTE query labels and folder naming |
| iotdb-system.properties.template | Added configuration properties for CTE buffer size and max rows |
| IoTDBConfig.java | Added CTE buffer configuration fields and getters/setters |
| IoTDBDescriptor.java | Added configuration loading logic for CTE buffer settings |
| WithQuery.java | Added materialized flag field to track CTE materialization intent |
| Query.java | Added materialized and done flags to track materialization state |
| ComparisonExpression.java | Made left/right fields mutable with setters for predicate reconstruction |
| LongLiteral.java | Changed parse method visibility to public for external access |
| CteDataStore.java | New class implementing CTE data storage with memory buffer and reference counting |
| CteDataReader.java | New interface defining contract for reading CTE data |
| MemoryReader.java | New class implementing in-memory CTE data reading |
| CteScanNode.java | New plan node representing a scan over materialized CTE data |
| CteScanOperator.java | New operator for executing CTE scans at runtime |
| CteMaterializer.java | Core logic for executing CTE queries and materializing results |
| PredicateWithUncorrelatedScalarSubqueryReconstructor.java | New class to optimize predicates by pre-executing uncorrelated scalar subqueries |
| Analysis.java | Added With clause tracking and named query map getter |
| StatementAnalyzer.java | Enhanced to set explain types and track CTE tables across scopes |
| Analyzer.java | Added CTE datastore registration from parent queries |
| Scope.java | Added table tracking to propagate CTE references through query scopes |
| ExpressionAnalyzer.java | Added subquery table tracking for CTE dependency resolution |
| RelationPlanner.java | Added materialized CTE processing logic with CteScanNode generation |
| SubqueryPlanner.java | Integrated uncorrelated scalar subquery folding for predicate pushdown |
| TableLogicalPlanner.java | Added CTE materialization invocation before query planning |
| Various optimizer files | Added CteScanNode handling in optimization rules |
| TableDistributedPlanner.java | Added explain result handling for CTE subqueries |
| AddExchangeNodes.java | Added CteScanNode distribution logic |
| TableOperatorGenerator.java | Added CteScanOperator generation |
| MPPQueryContext.java | Added CTE tracking fields and methods for materialization metadata |
| ExplainType.java | New enum to distinguish NONE, EXPLAIN, and EXPLAIN_ANALYZE modes |
| QueryExecution.java | Added CTE cleanup in resource release |
| ExplainAnalyzeOperator.java | Enhanced to merge CTE and main query explain analyze results |
| FragmentInstanceExecution.java | Updated statistics filtering to handle CTE virtual regions |
| FragmentInstanceStatisticsDrawer.java | Fixed fragment instance counting to exclude null data regions |
| DistributedQueryPlan.java | Added planText field for storing explain results |
| PlanVisitor.java | Added visitCteScan method |
| PlanGraphPrinter.java | Added CteScan and CorrelatedJoin rendering |
| TableModelStatementMemorySourceVisitor.java | Added CTE explain result merging logic |
| ConvertPredicateToTimeFilterVisitor.java | Enhanced to handle DoubleLiteral in time filters |
| Test files | Comprehensive test coverage including unit tests, integration tests, and matcher classes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Pattern pattern = Pattern.compile(regex); | ||
| Matcher matcher = pattern.matcher(line); | ||
| if (matcher.find()) { | ||
| return Long.parseLong(matcher.group(1)); |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential uncaught 'java.lang.NumberFormatException'.
| return Long.parseLong(matcher.group(1)); | |
| try { | |
| return Long.parseLong(matcher.group(1)); | |
| } catch (NumberFormatException e) { | |
| return 0; | |
| } |
| } | ||
|
|
||
| public Map<Query, List<Identifier>> getSubQueryTables() { | ||
| return subQueryTables; |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getSubQueryTables exposes the internal representation stored in field subQueryTables. The value may be modified after this call to getSubQueryTables.
getSubQueryTables exposes the internal representation stored in field subQueryTables. The value may be modified after this call to getSubQueryTables.
| return subQueryTables; | |
| return Collections.unmodifiableMap(subQueryTables); |
| } | ||
|
|
||
| public Map<NodeRef<Table>, CteDataStore> getCteDataStores() { | ||
| return cteDataStores; |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getCteDataStores exposes the internal representation stored in field cteDataStores. The value may be modified after this call to getCteDataStores.
| return cteDataStores; | |
| return Collections.unmodifiableMap(cteDataStores); |
| return builder().like(this).build(); | ||
| } | ||
|
|
||
| public List<Identifier> getTables() { |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getTables exposes the internal representation stored in field tables. The value may be modified after this call to getTables.
getTables exposes the internal representation stored in field tables. The value may be modified after this call to getTables.
208f2e9 to
2841855
Compare
7ac38e5 to
8c558c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 67 out of 67 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // The buffer for cte materialization. | ||
| long cteBufferSizeInBytes = | ||
| Long.parseLong( | ||
| properties.getProperty( | ||
| "cte_buffer_size_in_bytes", Long.toString(conf.getCteBufferSize()))); | ||
| if (cteBufferSizeInBytes > 0) { | ||
| conf.setCteBufferSize(cteBufferSizeInBytes); | ||
| } | ||
|
|
||
| // max number of rows for cte materialization | ||
| int maxRowsInCteBuffer = | ||
| Integer.parseInt( | ||
| properties.getProperty( | ||
| "max_rows_in_cte_buffer", Integer.toString(conf.getMaxRowsInCteBuffer()))); | ||
| if (maxRowsInCteBuffer > 0) { | ||
| conf.setMaxRowsInCteBuffer(maxRowsInCteBuffer); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add these in loadHotModifiedProps function, these two configurations should be hot loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| # will be used. | ||
| # effectiveMode: hot_reload | ||
| # Datatype: long | ||
| cte_buffer_size_in_bytes=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| cte_buffer_size_in_bytes=0 | |
| # Privilege: SYSTEM | |
| # unit: byte | |
| cte_buffer_size_in_bytes=131072 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| # Max rows for CTE materialization | ||
| # effectiveMode: hot_reload | ||
| # Datatype: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Datatype: int | |
| # Datatype: int | |
| # Privilege: SYSTEM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| if (drivers == null || drivers.isEmpty()) { | ||
| return false; | ||
| } | ||
| // Check if any driver contains ExplainAnalyzeOperator | ||
| return drivers.stream() | ||
| .anyMatch( | ||
| driver -> | ||
| driver.getDriverContext().getOperatorContexts().stream() | ||
| .anyMatch( | ||
| operatorContext -> | ||
| ExplainAnalyzeOperator.class | ||
| .getSimpleName() | ||
| .equals(operatorContext.getOperatorType()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be inited as a field while generating this FragmentInstanceExecution in LocalExecutionPlanner instead of doing this judgement each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| dataReader.close(); | ||
| } | ||
| } catch (Exception e) { | ||
| LOGGER.error("Fail to close fileChannel", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LOGGER.error("Fail to close fileChannel", e); | |
| LOGGER.error("Fail to close CteDataReader", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| (tableRef, query) -> { | ||
| Table table = tableRef.getNode(); | ||
| if (query.isMaterialized()) { | ||
| if (!query.isDone()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may need another if to check whether this cte has already failed last time? if so we can just return even if this cte is not done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought even if it failed last time, maybe it's worthy try another time in case it's network problem.
| PredicateWithUncorrelatedScalarSubqueryReconstructor.getInstance() | ||
| .clearShadowExpression(predicate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't clear shadow expression here, the predicate expression is still being used afterwards. We can only clear them while front end finished.
|
|
||
| import org.apache.tsfile.read.common.block.TsBlock; | ||
|
|
||
| public interface CteDataReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public interface CteDataReader { | |
| public interface CteDataReader extends Accountable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| * | ||
| * @return the bytes used by this CteDataReader | ||
| */ | ||
| long bytesUsed(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| long bytesUsed(); | |
| long ramBytesUsed(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| @Override | ||
| public long bytesUsed() { | ||
| return INSTANCE_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle the memory footprint right.
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/common/MPPQueryContext.java # Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/PlanNodeType.java # Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/PlanNodeType.java
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/common/MPPQueryContext.java
…ze predicate pushdown
# Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/Coordinator.java # Conflicts: # iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/Coordinator.java
4eb6aac to
fe47b8f
Compare


Description
Content1 ...
Content2 ...
Content3 ...
This PR has:
for an unfamiliar reader.
for code coverage.
Key changed/added classes (or packages if there are too many classes) in this PR