Skip to content

Conversation

@shizy818
Copy link
Contributor

@shizy818 shizy818 commented Sep 5, 2025

Description

Content1 ...

Content2 ...

Content3 ...


This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

@shizy818 shizy818 force-pushed the with_clause_enhance branch 2 times, most recently from caa8971 to 002f19a Compare September 29, 2025 06:07
@shizy818 shizy818 changed the title With clause enhance (Review Only Now!) With clause enhance Sep 30, 2025
@shizy818 shizy818 force-pushed the with_clause_enhance branch from 1d2fbc6 to 0ce2a5d Compare October 21, 2025 06:23
@shizy818 shizy818 force-pushed the with_clause_enhance branch 2 times, most recently from 896c61b to 4f1233b Compare December 2, 2025 01:25
@JackieTien97 JackieTien97 requested a review from Copilot December 8, 2025 01:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the WITH clause by adding support for CTE (Common Table Expression) materialization. The main objective is to optimize query performance by allowing CTEs to be materialized (executed once and cached) rather than being re-evaluated each time they're referenced. The implementation introduces a MATERIALIZED keyword, adds infrastructure for storing and reading CTE results, and integrates materialization into the query planning and execution pipeline.

Key changes include:

  • Added MATERIALIZED keyword support in SQL grammar for CTE declarations
  • Implemented CTE data storage infrastructure with memory buffering and configurable limits
  • Created CteScanNode and CteScanOperator for reading materialized CTE results
  • Enhanced query analysis and planning to handle CTE materialization
  • Added support for EXPLAIN and EXPLAIN ANALYZE with CTE queries

Reviewed changes

Copilot reviewed 68 out of 68 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
RelationalSql.g4 Added MATERIALIZED? keyword to namedQuery grammar rule
IoTDBConstant.java Added constants for CTE query labels and folder naming
iotdb-system.properties.template Added configuration properties for CTE buffer size and max rows
IoTDBConfig.java Added CTE buffer configuration fields and getters/setters
IoTDBDescriptor.java Added configuration loading logic for CTE buffer settings
WithQuery.java Added materialized flag field to track CTE materialization intent
Query.java Added materialized and done flags to track materialization state
ComparisonExpression.java Made left/right fields mutable with setters for predicate reconstruction
LongLiteral.java Changed parse method visibility to public for external access
CteDataStore.java New class implementing CTE data storage with memory buffer and reference counting
CteDataReader.java New interface defining contract for reading CTE data
MemoryReader.java New class implementing in-memory CTE data reading
CteScanNode.java New plan node representing a scan over materialized CTE data
CteScanOperator.java New operator for executing CTE scans at runtime
CteMaterializer.java Core logic for executing CTE queries and materializing results
PredicateWithUncorrelatedScalarSubqueryReconstructor.java New class to optimize predicates by pre-executing uncorrelated scalar subqueries
Analysis.java Added With clause tracking and named query map getter
StatementAnalyzer.java Enhanced to set explain types and track CTE tables across scopes
Analyzer.java Added CTE datastore registration from parent queries
Scope.java Added table tracking to propagate CTE references through query scopes
ExpressionAnalyzer.java Added subquery table tracking for CTE dependency resolution
RelationPlanner.java Added materialized CTE processing logic with CteScanNode generation
SubqueryPlanner.java Integrated uncorrelated scalar subquery folding for predicate pushdown
TableLogicalPlanner.java Added CTE materialization invocation before query planning
Various optimizer files Added CteScanNode handling in optimization rules
TableDistributedPlanner.java Added explain result handling for CTE subqueries
AddExchangeNodes.java Added CteScanNode distribution logic
TableOperatorGenerator.java Added CteScanOperator generation
MPPQueryContext.java Added CTE tracking fields and methods for materialization metadata
ExplainType.java New enum to distinguish NONE, EXPLAIN, and EXPLAIN_ANALYZE modes
QueryExecution.java Added CTE cleanup in resource release
ExplainAnalyzeOperator.java Enhanced to merge CTE and main query explain analyze results
FragmentInstanceExecution.java Updated statistics filtering to handle CTE virtual regions
FragmentInstanceStatisticsDrawer.java Fixed fragment instance counting to exclude null data regions
DistributedQueryPlan.java Added planText field for storing explain results
PlanVisitor.java Added visitCteScan method
PlanGraphPrinter.java Added CteScan and CorrelatedJoin rendering
TableModelStatementMemorySourceVisitor.java Added CTE explain result merging logic
ConvertPredicateToTimeFilterVisitor.java Enhanced to handle DoubleLiteral in time filters
Test files Comprehensive test coverage including unit tests, integration tests, and matcher classes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
return Long.parseLong(matcher.group(1));
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential uncaught 'java.lang.NumberFormatException'.

Suggested change
return Long.parseLong(matcher.group(1));
try {
return Long.parseLong(matcher.group(1));
} catch (NumberFormatException e) {
return 0;
}

Copilot uses AI. Check for mistakes.
}

public Map<Query, List<Identifier>> getSubQueryTables() {
return subQueryTables;
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSubQueryTables exposes the internal representation stored in field subQueryTables. The value may be modified after this call to getSubQueryTables.
getSubQueryTables exposes the internal representation stored in field subQueryTables. The value may be modified after this call to getSubQueryTables.

Suggested change
return subQueryTables;
return Collections.unmodifiableMap(subQueryTables);

Copilot uses AI. Check for mistakes.
}

public Map<NodeRef<Table>, CteDataStore> getCteDataStores() {
return cteDataStores;
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCteDataStores exposes the internal representation stored in field cteDataStores. The value may be modified after this call to getCteDataStores.

Suggested change
return cteDataStores;
return Collections.unmodifiableMap(cteDataStores);

Copilot uses AI. Check for mistakes.
return builder().like(this).build();
}

public List<Identifier> getTables() {
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getTables exposes the internal representation stored in field tables. The value may be modified after this call to getTables.
getTables exposes the internal representation stored in field tables. The value may be modified after this call to getTables.

Copilot uses AI. Check for mistakes.
@shizy818
Copy link
Contributor Author

shizy818 commented Dec 11, 2025

Sonar Coverage Report

Master branch: test coverage is 36.0%
图片

with_clause_enhance branch: test coverage for new code is 80%
图片

@shizy818 shizy818 force-pushed the with_clause_enhance branch from 208f2e9 to 2841855 Compare December 11, 2025 07:05
@shizy818 shizy818 force-pushed the with_clause_enhance branch 2 times, most recently from 7ac38e5 to 8c558c4 Compare December 23, 2025 01:31
@JackieTien97 JackieTien97 requested a review from Copilot December 23, 2025 11:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 67 out of 67 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1060 to +1077
// The buffer for cte materialization.
long cteBufferSizeInBytes =
Long.parseLong(
properties.getProperty(
"cte_buffer_size_in_bytes", Long.toString(conf.getCteBufferSize())));
if (cteBufferSizeInBytes > 0) {
conf.setCteBufferSize(cteBufferSizeInBytes);
}

// max number of rows for cte materialization
int maxRowsInCteBuffer =
Integer.parseInt(
properties.getProperty(
"max_rows_in_cte_buffer", Integer.toString(conf.getMaxRowsInCteBuffer())));
if (maxRowsInCteBuffer > 0) {
conf.setMaxRowsInCteBuffer(maxRowsInCteBuffer);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add these in loadHotModifiedProps function, these two configurations should be hot loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# will be used.
# effectiveMode: hot_reload
# Datatype: long
cte_buffer_size_in_bytes=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cte_buffer_size_in_bytes=0
# Privilege: SYSTEM
# unit: byte
cte_buffer_size_in_bytes=131072

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


# Max rows for CTE materialization
# effectiveMode: hot_reload
# Datatype: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Datatype: int
# Datatype: int
# Privilege: SYSTEM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +146 to +164
if (drivers == null || drivers.isEmpty()) {
return false;
}
// Check if any driver contains ExplainAnalyzeOperator
return drivers.stream()
.anyMatch(
driver ->
driver.getDriverContext().getOperatorContexts().stream()
.anyMatch(
operatorContext ->
ExplainAnalyzeOperator.class
.getSimpleName()
.equals(operatorContext.getOperatorType())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be inited as a field while generating this FragmentInstanceExecution in LocalExecutionPlanner instead of doing this judgement each time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

dataReader.close();
}
} catch (Exception e) {
LOGGER.error("Fail to close fileChannel", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOGGER.error("Fail to close fileChannel", e);
LOGGER.error("Fail to close CteDataReader", e);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

(tableRef, query) -> {
Table table = tableRef.getNode();
if (query.isMaterialized()) {
if (!query.isDone()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may need another if to check whether this cte has already failed last time? if so we can just return even if this cte is not done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought even if it failed last time, maybe it's worthy try another time in case it's network problem.

Comment on lines 771 to 772
PredicateWithUncorrelatedScalarSubqueryReconstructor.getInstance()
.clearShadowExpression(predicate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't clear shadow expression here, the predicate expression is still being used afterwards. We can only clear them while front end finished.


import org.apache.tsfile.read.common.block.TsBlock;

public interface CteDataReader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public interface CteDataReader {
public interface CteDataReader extends Accountable {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

*
* @return the bytes used by this CteDataReader
*/
long bytesUsed();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
long bytesUsed();
long ramBytesUsed();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


@Override
public long bytesUsed() {
return INSTANCE_SIZE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle the memory footprint right.

shizy818 and others added 11 commits December 26, 2025 14:28
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/common/MPPQueryContext.java

# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/PlanNodeType.java

# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/PlanNodeType.java
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/common/MPPQueryContext.java
# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/Coordinator.java

# Conflicts:
#	iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/Coordinator.java
@shizy818 shizy818 force-pushed the with_clause_enhance branch from 4eb6aac to fe47b8f Compare December 26, 2025 06:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants