[refactor](paimon) Per-catalog Paimon metadata cache with two-level table+snapshot structure #60478

suxiaogang223 · 2026-02-03T09:52:13Z

What problem does this PR solve?

Summary

Refactor Paimon metadata cache from a single global instance to per-catalog instances,
introduce a two-level Table+Snapshot cache structure, and unify TTL resolution logic
across Iceberg/Paimon/Schema caches.

Motivation

The previous design shared a single PaimonMetadataCache instance and a single global
snapshotCache across all Paimon catalogs. This caused the following problems:

- Catalogs could not configure cache TTL independently.
- Cache keys had to carry catalogId for isolation, and invalidation required scanning
  all keys and filtering.
- PaimonExternalTable eagerly fetched the Table object at construction time, incurring
  remote calls even when the table was never accessed afterwards.

Changes

Per-catalog cache instantiation

- Introduce CatalogScopedCacheMgr<T>, a generic catalog-keyed cache manager backed by
  ConcurrentHashMap.computeIfAbsent.
- PaimonMetadataCacheMgr now extends CatalogScopedCacheMgr<PaimonMetadataCache>;
  each catalog owns an independent PaimonMetadataCache.
- Migrate Iceberg's icebergCacheMap (previously hand-rolled double-checked locking)
  in ExternalMetaCacheMgr to the same CatalogScopedCacheMgr.
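
A minimal sketch of the catalog-keyed manager idea described above. This is not the Doris source: only the name CatalogScopedCacheMgr comes from this PR, while the factory parameter, getCache and removeCache are hypothetical names chosen for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Illustrative sketch only; method names and the factory parameter are assumptions.
final class CatalogScopedCacheSketch {

    /** Holds one lazily created cache instance of type T per catalog id. */
    static class CatalogScopedCacheMgr<T> {
        private final ConcurrentHashMap<Long, T> caches = new ConcurrentHashMap<>();
        private final LongFunction<T> factory;

        CatalogScopedCacheMgr(LongFunction<T> factory) {
            this.factory = factory;
        }

        /** Returns the catalog's cache, creating it atomically on first use. */
        T getCache(long catalogId) {
            // computeIfAbsent runs the factory at most once per key, so no
            // hand-rolled double-checked locking is needed here.
            return caches.computeIfAbsent(catalogId, id -> factory.apply(id));
        }

        /** Drops the whole cache of one catalog; returns it, or null if absent. */
        T removeCache(long catalogId) {
            return caches.remove(catalogId);
        }
    }

    public static void main(String[] args) {
        CatalogScopedCacheMgr<StringBuilder> mgr =
                new CatalogScopedCacheMgr<StringBuilder>(id -> new StringBuilder("cache-" + id));
        System.out.println(mgr.getCache(1L) == mgr.getCache(1L)); // same instance is reused
        System.out.println(mgr.getCache(1L) == mgr.getCache(2L)); // different catalogs are isolated
    }
}
```

Because each catalog owns its own instance, dropping a catalog's cache is a single map removal rather than a scan over keys that carry a catalogId.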

Two-level table + snapshot cache

- Replace the single snapshotCache with tableCache
  (key: PaimonTableCacheKey, value: PaimonTableCacheValue).
- PaimonTableCacheValue holds the Paimon Table object and lazy-loads
  PaimonSnapshotCacheValue via double-checked locking.
- The Table object is managed by a Caffeine LoadingCache and is subject to TTL/maxSize.
  When the TTL expires, Caffeine creates a new PaimonTableCacheValue, and the snapshot
  is lazily reloaded on the next access.
- Normal queries hit tableCache directly; MTMV queries go through the explicit
  snapshot path; branch queries call the Paimon catalog directly (branches have
  independent schemas, so they are not suitable for the main table cache).
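
The lazy snapshot load can be sketched roughly as follows. This is an assumption-laden illustration, not the actual PaimonTableCacheValue: the generic parameters T and S stand in for the Paimon Table and PaimonSnapshotCacheValue, and the Supplier-based loader is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch only; TableCacheValue is a stand-in for PaimonTableCacheValue.
final class TwoLevelCacheSketch {

    /** First level holds the table; the snapshot is loaded on first access. */
    static class TableCacheValue<T, S> {
        private final T table;
        private final Supplier<S> snapshotLoader;
        // volatile is required for double-checked locking to be safe.
        private volatile S snapshot;

        TableCacheValue(T table, Supplier<S> snapshotLoader) {
            this.table = table;
            this.snapshotLoader = snapshotLoader;
        }

        T getTable() {
            return table;
        }

        S getSnapshot() {
            S local = snapshot;               // first check, without the lock
            if (local == null) {
                synchronized (this) {
                    local = snapshot;         // second check, under the lock
                    if (local == null) {
                        local = snapshotLoader.get();
                        snapshot = local;
                    }
                }
            }
            return local;
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        TableCacheValue<String, String> value = new TableCacheValue<String, String>(
                "table-handle", () -> "snapshot-" + loads.incrementAndGet());
        System.out.println(value.getSnapshot());
        System.out.println(value.getSnapshot());
        System.out.println("loads = " + loads.get()); // the loader ran only once
    }
}
```

When the outer cache entry expires and a new value object is created, its snapshot field starts out null again, which matches the reload-on-next-access behavior described above.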

Unified TTL resolution

Extract ExternalCatalog.resolveCacheTtlSpec() to centralize TTL property parsing:
- null → use the global default (external_cache_expire_time_seconds_after_access)
- -1 → no expiry (expireAfterAccess is not set on the Caffeine builder)
- 0 → cache disabled (maxSize=0, so Caffeine evicts entries immediately)
- >0 → used as the expireAfterAccess value, in seconds

The same resolution is applied to IcebergMetadataCache, PaimonMetadataCache, and
ExternalSchemaCache.

Add the paimon.table.meta.cache.ttl-second catalog property, validated in
checkProperties(). ALTER CATALOG SET triggers cache reinitialization via
notifyPropertiesUpdated(), consistent with Iceberg's existing pattern.
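
The four cases above could be resolved along these lines. This is a hypothetical sketch, not the body of resolveCacheTtlSpec(): the CacheSpec shape, the resolve signature, and the rejection of values below -1 are all assumptions made for illustration.

```java
// Illustrative sketch only; CacheSpec and resolve() are hypothetical names.
final class CacheTtlSketch {

    /** Resolved cache settings derived from a raw TTL property value. */
    static final class CacheSpec {
        final boolean enabled;        // false means maxSize=0 (cache disabled)
        final boolean expires;        // false means expireAfterAccess is not set
        final long ttlSeconds;        // meaningful only when expires is true

        CacheSpec(boolean enabled, boolean expires, long ttlSeconds) {
            this.enabled = enabled;
            this.expires = expires;
            this.ttlSeconds = ttlSeconds;
        }
    }

    static CacheSpec resolve(String rawTtl, long globalDefaultSeconds) {
        if (rawTtl == null) {
            // Property not set: fall back to the global default.
            return new CacheSpec(true, true, globalDefaultSeconds);
        }
        long ttl = Long.parseLong(rawTtl.trim());
        if (ttl == -1) {
            return new CacheSpec(true, false, 0);    // never expire
        }
        if (ttl == 0) {
            return new CacheSpec(false, false, 0);   // cache disabled
        }
        if (ttl > 0) {
            return new CacheSpec(true, true, ttl);   // explicit TTL
        }
        // Assumption: anything below -1 is treated as invalid.
        throw new IllegalArgumentException("invalid cache ttl: " + rawTtl);
    }

    public static void main(String[] args) {
        CacheSpec spec = resolve("600", 86400);
        System.out.println(spec.enabled + " " + spec.expires + " " + spec.ttlSeconds);
    }
}
```

Keeping the parsing in one place means each cache only has to translate the resolved spec into builder calls, rather than reinterpreting the raw property itself.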

Lazy table access in PaimonExternalTable

- Remove the eagerly-loaded paimonTable field from the constructor.
- All Table object access now goes through PaimonUtils →
  PaimonMetadataCache.tableCache, making it lazy and cache-aware.
- Introduce PaimonUtils as the centralized static accessor for Paimon cache
  operations, simplifying call sites.
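
A rough sketch of the lazy access pattern, under stated assumptions: TableCache and ExternalTable below are simplified stand-ins (a ConcurrentHashMap replaces the Caffeine LoadingCache, and a String replaces the Paimon Table), and none of the names are taken from the Doris code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; all class and method names here are hypothetical.
final class LazyTableAccessSketch {

    /** Stand-in for the per-catalog table cache. */
    static class TableCache {
        final AtomicInteger remoteLoads = new AtomicInteger();
        private final ConcurrentHashMap<String, String> tables = new ConcurrentHashMap<>();

        String get(String tableName) {
            return tables.computeIfAbsent(tableName, name -> {
                remoteLoads.incrementAndGet();   // simulates one remote catalog call
                return "table-handle:" + name;
            });
        }
    }

    /** Keeps no table field; every access is resolved through the cache. */
    static class ExternalTable {
        private final String name;
        private final TableCache cache;

        ExternalTable(String name, TableCache cache) {
            this.name = name;      // construction performs no remote call
            this.cache = cache;
        }

        String getTable() {
            return cache.get(name);
        }
    }

    public static void main(String[] args) {
        TableCache cache = new TableCache();
        ExternalTable table = new ExternalTable("orders", cache);
        System.out.println("loads after construction: " + cache.remoteLoads.get());
        table.getTable();
        table.getTable();
        System.out.println("loads after two accesses: " + cache.remoteLoads.get());
    }
}
```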

Iceberg invalidation fast path

IcebergMetadataCache.invalidateTableCache() now attempts a direct key lookup
(getIfPresent) first. On a hit, the entry is invalidated immediately; on a miss, it
falls back to a full cache scan that matches by local name. This avoids iterating
over the whole cache on the common path.
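
The lookup-then-scan order can be illustrated with a small stand-in cache. This sketch uses a ConcurrentHashMap instead of Caffeine, and the Cache class, its invalidate signature, and the fallback predicate are hypothetical; it only shows the control flow, not the real key structure.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Illustrative sketch only; the real cache is a Caffeine cache with its own key type.
final class InvalidationFastPathSketch {

    static class Cache<K, V> {
        private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

        void put(K key, V value) {
            entries.put(key, value);
        }

        int size() {
            return entries.size();
        }

        /**
         * Tries an exact-key removal first and scans only when that misses.
         * Returns true when the fast path removed an entry.
         */
        boolean invalidate(K exactKey, Predicate<K> fallbackMatcher) {
            if (entries.remove(exactKey) != null) {
                return true;                          // fast path: O(1) removal
            }
            entries.keySet().removeIf(fallbackMatcher); // slow path: full scan
            return false;
        }
    }

    public static void main(String[] args) {
        Cache<String, String> cache = new Cache<String, String>();
        cache.put("db.t1", "meta1");
        cache.put("DB.T2", "meta2");
        System.out.println(cache.invalidate("db.t1", k -> k.equalsIgnoreCase("db.t1")));
        System.out.println(cache.invalidate("db.t2", k -> k.equalsIgnoreCase("db.t2")));
        System.out.println(cache.size());
    }
}
```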

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-02-03T09:52:21Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

suxiaogang223 · 2026-02-03T11:10:14Z

run external

hello-stephen · 2026-02-03T14:40:15Z

FE Regression Coverage Report

Increment line coverage 76.32% (145/190) 🎉
Increment coverage report
Complete coverage report

suxiaogang223 · 2026-02-04T10:52:23Z

run external

hello-stephen · 2026-02-04T14:08:41Z

FE Regression Coverage Report

Increment line coverage 79.93% (235/294) 🎉
Increment coverage report
Complete coverage report

fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalCatalog.java

suxiaogang223 · 2026-02-10T15:41:45Z

run buildall

doris-robot · 2026-02-10T16:18:19Z

TPC-H: Total hot run time: 29989 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 78e8af4fe701c1a8add7bb8d5da9b23b3a4c8098, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17621	4427	4269	4269
q2	2059	353	230	230
q3	10157	1277	734	734
q4	10189	765	314	314
q5	7564	2201	1934	1934
q6	191	176	145	145
q7	874	750	596	596
q8	9267	1385	1073	1073
q9	4715	4617	4577	4577
q10	6777	1934	1535	1535
q11	470	277	245	245
q12	340	373	231	231
q13	17791	4076	3223	3223
q14	238	230	225	225
q15	912	809	793	793
q16	684	679	625	625
q17	695	856	492	492
q18	6450	5939	5694	5694
q19	1227	999	622	622
q20	514	506	391	391
q21	2522	1809	1806	1806
q22	326	277	235	235
Total cold run time: 101583 ms
Total hot run time: 29989 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4347	4356	4343	4343
q2	254	345	259	259
q3	2032	2708	2235	2235
q4	1393	1729	1282	1282
q5	4293	4249	4467	4249
q6	199	176	141	141
q7	1832	1793	1708	1708
q8	2474	2762	2440	2440
q9	7651	7640	7396	7396
q10	2872	3025	2597	2597
q11	516	446	415	415
q12	693	744	599	599
q13	3953	4351	3666	3666
q14	336	323	274	274
q15	860	798	860	798
q16	692	758	694	694
q17	1187	1445	1394	1394
q18	8445	7924	7803	7803
q19	936	876	887	876
q20	2058	2149	2226	2149
q21	4858	4348	4211	4211
q22	484	444	421	421
Total cold run time: 52365 ms
Total hot run time: 49950 ms

doris-robot · 2026-02-10T16:35:00Z

ClickBench: Total hot run time: 28.18 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 78e8af4fe701c1a8add7bb8d5da9b23b3a4c8098, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.25	0.08	0.08
query4	1.60	0.11	0.11
query5	0.27	0.25	0.27
query6	1.16	0.66	0.66
query7	0.03	0.03	0.02
query8	0.05	0.04	0.04
query9	0.57	0.50	0.50
query10	0.55	0.55	0.54
query11	0.14	0.09	0.09
query12	0.14	0.10	0.11
query13	0.63	0.61	0.63
query14	1.07	1.06	1.05
query15	0.86	0.86	0.89
query16	0.39	0.40	0.38
query17	1.20	1.07	1.15
query18	0.23	0.21	0.21
query19	2.05	1.98	2.05
query20	0.02	0.02	0.02
query21	15.45	0.28	0.15
query22	5.06	0.05	0.06
query23	15.85	0.29	0.12
query24	2.40	0.66	0.18
query25	0.08	0.08	0.09
query26	0.14	0.14	0.13
query27	0.08	0.07	0.06
query28	4.43	1.15	0.97
query29	12.57	3.91	3.16
query30	0.28	0.14	0.12
query31	2.82	0.65	0.40
query32	3.23	0.59	0.49
query33	3.32	3.29	3.21
query34	16.32	5.43	4.76
query35	4.82	4.77	4.80
query36	0.66	0.49	0.48
query37	0.12	0.08	0.07
query38	0.08	0.05	0.04
query39	0.05	0.03	0.03
query40	0.20	0.16	0.16
query41	0.08	0.04	0.03
query42	0.04	0.02	0.02
query43	0.05	0.04	0.03
Total cold run time: 99.49 s
Total hot run time: 28.18 s

hello-stephen · 2026-02-10T20:06:37Z

FE Regression Coverage Report

Increment line coverage 52.63% (150/285) 🎉
Increment coverage report
Complete coverage report

github-actions · 2026-02-11T01:51:54Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-02-11T01:51:56Z

PR approved by anyone and no changes requested.

suxiaogang223 added 7 commits February 3, 2026 15:10

refact paimon meta cache

1d4f64a

Use meta cache for PaimonExternalTable

2925c76

Add PaimonUtils cache access

82d404b

Unify catalog-scoped metadata cache managers

01b85cf

Optimize table cache invalidation

db90c23

Unify external cache TTL semantics

bed712a

Add paimon meta cache regression test

85523f0

fix

3ae5e47

suxiaogang223 added 3 commits February 4, 2026 15:23

Refactor external meta cache managers

121db66

Unify external cache spec parsing

e2e548a

default enable io manifest cache if the meta.cache.manifest is enabled

e802d99

morningman added the dev/4.1.x label Feb 4, 2026

morningman reviewed Feb 10, 2026

fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalCatalog.java (outdated)

morningman added dev/4.0.x and removed dev/4.1.x labels Feb 10, 2026

Unify schema cache TTL handling with CacheSpec

78e8af4

morningman approved these changes Feb 11, 2026

github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 11, 2026

github-actions bot added the reviewed label Feb 11, 2026

CalvinKirs approved these changes Feb 11, 2026

morningman merged commit 2c85148 into apache:master Feb 11, 2026
30 checks passed

morningman added the kind/need-document label Feb 11, 2026

github-actions bot added the dev/4.0.x-conflict label Feb 11, 2026

suxiaogang223 mentioned this pull request Feb 11, 2026

[Refactor][Tracking] Unified external table metadata cache framework #60686

Open

5 participants
