
[fix](parquet) fix parquet reader lazy materialization cannot filter. #60474

Open
hubgeter wants to merge 2 commits into apache:master from hubgeter:fix_parquet_lazy_mat

hubgeter commented Feb 3, 2026

What problem does this PR solve?

Related PR: #60197
Problem Summary:

This PR fixes an issue in PR #60197 where Parquet reader lazy materialization no longer takes effect, caused by the removal of feature #59053.
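
For readers unfamiliar with the term, here is a minimal sketch of what lazy materialization means in a columnar reader: read the predicate columns first, evaluate the filter, and only then materialize the remaining columns for the rows that survived. This is illustrative only and is not taken from the Doris code; `lazy_read`, the in-memory `rows` dict, and all parameter names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Column = List[int]


def lazy_read(
    row_group: Dict[str, Column],
    predicate_cols: Sequence[str],
    lazy_cols: Sequence[str],
    predicate: Callable[[Dict[str, int]], bool],
) -> Dict[str, Column]:
    """Hypothetical two-phase read of one in-memory 'row group'."""
    # Phase 1: look only at the predicate columns and evaluate the filter.
    num_rows = len(row_group[predicate_cols[0]])
    selected = [
        i
        for i in range(num_rows)
        if predicate({c: row_group[c][i] for c in predicate_cols})
    ]
    # Phase 2: materialize every requested column only for surviving rows.
    out = {c: [row_group[c][i] for i in selected] for c in predicate_cols}
    for c in lazy_cols:
        out[c] = [row_group[c][i] for i in selected]
    return out


# Toy data standing in for decoded Parquet column chunks (an assumption).
rows = {"id": [1, 2, 3, 4], "payload": [10, 20, 30, 40]}
result = lazy_read(rows, ["id"], ["payload"], lambda r: r["id"] % 2 == 0)
print(result)  # {'id': [2, 4], 'payload': [20, 40]}
```

If the filter in phase 1 is skipped or its result is ignored, phase 2 ends up materializing every row, which is the kind of regression the title ("cannot filter") appears to describe.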

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

hubgeter commented Feb 3, 2026

run buildall

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 32903 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a7121de8611f8b463ed69a18635a85aaab1840b2, data reload: false

------ Round 1 ----------------------------------
q1	17701	5320	5076	5076
q2	2067	310	198	198
q3	10210	1294	805	805
q4	10201	838	318	318
q5	8063	2224	1911	1911
q6	212	182	148	148
q7	869	756	629	629
q8	9297	1436	1299	1299
q9	5280	4864	4835	4835
q10	6854	1959	1562	1562
q11	509	283	274	274
q12	399	384	224	224
q13	17821	4144	3204	3204
q14	235	248	216	216
q15	925	831	810	810
q16	699	675	629	629
q17	764	789	514	514
q18	6751	6792	7078	6792
q19	1425	1095	770	770
q20	425	386	244	244
q21	3164	2415	2154	2154
q22	372	327	291	291
Total cold run time: 104243 ms
Total hot run time: 32903 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5591	5559	5565	5559
q2	286	351	267	267
q3	2361	2834	2534	2534
q4	1431	1942	1437	1437
q5	4761	4573	4514	4514
q6	236	188	139	139
q7	2071	1918	1899	1899
q8	2543	2430	2400	2400
q9	7504	7524	7572	7524
q10	2767	2803	2417	2417
q11	538	449	433	433
q12	636	686	548	548
q13	3591	4031	3202	3202
q14	265	301	276	276
q15	844	808	793	793
q16	649	690	641	641
q17	1096	1275	1394	1275
q18	7401	7219	7326	7219
q19	899	853	820	820
q20	1966	2068	1868	1868
q21	4584	4319	4062	4062
q22	588	536	531	531
Total cold run time: 52608 ms
Total hot run time: 50358 ms

@doris-robot

ClickBench: Total hot run time: 28.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a7121de8611f8b463ed69a18635a85aaab1840b2, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.05	0.05
query3	0.26	0.08	0.09
query4	1.61	0.11	0.11
query5	0.27	0.25	0.25
query6	1.18	0.71	0.67
query7	0.03	0.03	0.03
query8	0.05	0.04	0.04
query9	0.57	0.51	0.49
query10	0.53	0.54	0.56
query11	0.14	0.10	0.09
query12	0.14	0.10	0.10
query13	0.63	0.62	0.62
query14	1.08	1.06	1.05
query15	0.88	0.86	0.87
query16	0.39	0.40	0.40
query17	1.14	1.11	1.15
query18	0.23	0.23	0.22
query19	2.09	1.93	1.96
query20	0.02	0.02	0.02
query21	15.41	0.29	0.14
query22	5.01	0.06	0.06
query23	15.71	0.29	0.10
query24	0.98	0.69	0.57
query25	0.13	0.06	0.06
query26	0.15	0.15	0.14
query27	0.06	0.05	0.09
query28	4.98	1.18	0.96
query29	12.57	3.96	3.24
query30	0.28	0.14	0.11
query31	2.81	0.64	0.41
query32	3.23	0.60	0.50
query33	3.21	3.25	3.24
query34	16.17	5.38	4.69
query35	4.79	4.85	4.80
query36	0.68	0.51	0.49
query37	0.11	0.07	0.07
query38	0.08	0.04	0.04
query39	0.05	0.04	0.03
query40	0.19	0.16	0.16
query41	0.10	0.03	0.02
query42	0.05	0.03	0.03
query43	0.06	0.04	0.04
Total cold run time: 98.19 s
Total hot run time: 28.66 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/5) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.58% (19372/36840)
Line Coverage 36.05% (179936/499125)
Region Coverage 32.44% (139601/430372)
Branch Coverage 33.41% (60407/180816)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.35% (26481/36104)
Line Coverage 56.51% (281375/497927)
Region Coverage 54.19% (235612/434786)
Branch Coverage 55.91% (101505/181544)

hubgeter commented Feb 4, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 31660 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f2658fc32571a48f291d61f3ed35527c2366fd30, data reload: false

------ Round 1 ----------------------------------
q1	17634	5272	5049	5049
q2	1976	339	198	198
q3	10224	1300	751	751
q4	10216	877	310	310
q5	7532	2164	1905	1905
q6	195	186	150	150
q7	887	718	597	597
q8	9255	1423	1045	1045
q9	5155	4813	4813	4813
q10	6816	1948	1555	1555
q11	532	287	268	268
q12	333	376	226	226
q13	17788	4052	3257	3257
q14	247	246	218	218
q15	945	830	814	814
q16	660	659	629	629
q17	633	784	502	502
q18	6635	6438	6394	6394
q19	1222	1021	608	608
q20	392	340	240	240
q21	2546	1968	1861	1861
q22	349	303	270	270
Total cold run time: 102172 ms
Total hot run time: 31660 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5298	5276	5274	5274
q2	257	340	265	265
q3	2245	2752	2409	2409
q4	1384	1753	1321	1321
q5	4369	4304	4497	4304
q6	213	178	142	142
q7	2181	2221	1932	1932
q8	2605	2498	2441	2441
q9	7598	7461	7629	7461
q10	2845	3005	2615	2615
q11	565	478	431	431
q12	659	792	600	600
q13	4009	4448	3487	3487
q14	284	323	302	302
q15	891	807	942	807
q16	763	713	671	671
q17	1123	1367	1349	1349
q18	8163	7832	8216	7832
q19	862	792	783	783
q20	2120	2216	2119	2119
q21	4609	4235	4109	4109
q22	531	522	502	502
Total cold run time: 53574 ms
Total hot run time: 51156 ms

@doris-robot

ClickBench: Total hot run time: 28.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f2658fc32571a48f291d61f3ed35527c2366fd30, data reload: false

query1	0.05	0.04	0.04
query2	0.10	0.05	0.05
query3	0.25	0.08	0.09
query4	1.60	0.10	0.11
query5	0.27	0.24	0.23
query6	1.16	0.68	0.67
query7	0.03	0.02	0.03
query8	0.05	0.04	0.04
query9	0.58	0.49	0.49
query10	0.55	0.54	0.53
query11	0.13	0.09	0.10
query12	0.14	0.10	0.11
query13	0.62	0.61	0.61
query14	1.06	1.06	1.05
query15	0.87	0.86	0.88
query16	0.38	0.39	0.39
query17	1.19	1.16	1.10
query18	0.23	0.21	0.20
query19	2.08	2.03	1.94
query20	0.02	0.02	0.01
query21	15.40	0.27	0.15
query22	5.10	0.06	0.05
query23	15.86	0.29	0.11
query24	1.42	0.24	0.50
query25	0.11	0.09	0.06
query26	0.14	0.14	0.13
query27	0.09	0.06	0.05
query28	4.35	1.19	0.97
query29	12.60	3.86	3.13
query30	0.29	0.14	0.11
query31	2.81	0.66	0.40
query32	3.23	0.60	0.49
query33	3.21	3.24	3.28
query34	15.86	5.40	4.72
query35	4.83	4.86	4.76
query36	0.65	0.50	0.49
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.20	0.16	0.15
query41	0.09	0.03	0.04
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 97.91 s
Total hot run time: 28.07 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/5) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.56% (19370/36854)
Line Coverage 36.05% (179964/499184)
Region Coverage 32.42% (139547/430399)
Branch Coverage 33.43% (60434/180794)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.36% (26496/36118)
Line Coverage 56.53% (281515/497989)
Region Coverage 54.33% (236236/434815)
Branch Coverage 55.95% (101561/181524)

3 participants