@suxiaogang223
Contributor

What problem does this PR solve?

Summary

Refactor Paimon metadata cache from a single global instance to per-catalog instances,
introduce a two-level Table+Snapshot cache structure, and unify TTL resolution logic
across Iceberg/Paimon/Schema caches.

Motivation

The previous design shared a single PaimonMetadataCache instance and a single global
snapshotCache across all Paimon catalogs. This design had several problems:

  • Different catalogs could not independently configure cache TTL.
  • Cache keys had to carry catalogId for isolation; invalidation required scanning all
    keys and filtering.
  • PaimonExternalTable eagerly fetched the Table object at construction time, incurring
    remote calls even when the table was never subsequently accessed.

Changes

Per-catalog cache instantiation

  • Introduce CatalogScopedCacheMgr<T>, a generic catalog-keyed cache manager backed by
    ConcurrentHashMap.computeIfAbsent.
  • PaimonMetadataCacheMgr now extends CatalogScopedCacheMgr<PaimonMetadataCache>;
    each catalog owns an independent PaimonMetadataCache.
  • Migrate Iceberg's icebergCacheMap (previously hand-rolled double-checked locking)
    in ExternalMetaCacheMgr to the same CatalogScopedCacheMgr.
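
To illustrate the catalog-keyed manager described above, here is a minimal sketch of the computeIfAbsent pattern. The class name CatalogScopedCacheMgrSketch, the factory parameter, and the method names getCache/removeCache are assumptions made for illustration; the actual CatalogScopedCacheMgr<T> in this PR may expose a different API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a catalog-keyed cache manager; not the PR's actual class.
class CatalogScopedCacheMgrSketch<T> {
    // One cache instance per catalog id.
    private final Map<Long, T> cacheMap = new ConcurrentHashMap<>();
    // Builds the cache for a catalog the first time it is requested.
    private final Function<Long, T> cacheFactory;

    CatalogScopedCacheMgrSketch(Function<Long, T> cacheFactory) {
        this.cacheFactory = cacheFactory;
    }

    // computeIfAbsent runs the factory at most once per key, so no
    // hand-rolled double-checked locking is needed here.
    T getCache(long catalogId) {
        return cacheMap.computeIfAbsent(catalogId, cacheFactory);
    }

    // Dropping a catalog's cache is a single map removal, with no key scan.
    T removeCache(long catalogId) {
        return cacheMap.remove(catalogId);
    }
}
```

Because each catalog owns its own instance, cache keys no longer need to carry a catalog id, and removing a catalog's entries does not require filtering a shared key space.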

Two-level table + snapshot cache

  • Replace the single snapshotCache with tableCache
    (key: PaimonTableCacheKey, value: PaimonTableCacheValue).
  • PaimonTableCacheValue holds the Paimon Table object and lazy-loads
    PaimonSnapshotCacheValue via double-checked locking.
    • The Table object is managed by a Caffeine LoadingCache and is subject to TTL/maxSize.
      When the TTL expires, Caffeine creates a new PaimonTableCacheValue, and the snapshot
      is loaded lazily again on the next access.
    • Normal queries hit tableCache directly; MTMV queries go through the explicit
      snapshot path; branch queries call the Paimon catalog directly (branches have
      independent schemas, not suitable for the main table cache).
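
The lazy snapshot loading described above can be sketched as follows. This is an assumption-laden illustration, not the PR's PaimonTableCacheValue: the class name TableCacheValueSketch, the generic type parameters, and the Supplier-based loader are hypothetical, and the surrounding Caffeine LoadingCache is left out so the example stays self-contained.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a table-level cache value that lazily loads its snapshot.
class TableCacheValueSketch<T, S> {
    private final T table;                   // loaded eagerly when the cache entry is created
    private final Supplier<S> snapshotLoader; // assumed to return a non-null snapshot
    private volatile S snapshot;              // volatile is required for safe double-checked locking

    TableCacheValueSketch(T table, Supplier<S> snapshotLoader) {
        this.table = table;
        this.snapshotLoader = snapshotLoader;
    }

    T getTable() {
        return table;
    }

    S getSnapshot() {
        S local = snapshot;                  // first check, without locking
        if (local == null) {
            synchronized (this) {
                local = snapshot;            // second check, under the lock
                if (local == null) {
                    local = snapshotLoader.get();
                    snapshot = local;
                }
            }
        }
        return local;
    }
}
```

In this sketch the snapshot lives only as long as its enclosing value, so when the cache replaces an expired entry, the new value starts with no snapshot and loads it on first use.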

Unified TTL resolution

  • Extract ExternalCatalog.resolveCacheTtlSpec() to centralize TTL property parsing:
    • null → use global default (external_cache_expire_time_seconds_after_access)
    • -1 → no expiry (Caffeine does not set expireAfterAccess)
    • 0 → disable cache (maxSize=0, Caffeine evicts immediately)
    • >0 → use as expireAfterAccess
  • Applied uniformly to IcebergMetadataCache, PaimonMetadataCache, and
    ExternalSchemaCache.
  • Add paimon.table.meta.cache.ttl-second catalog property with validation in
    checkProperties(). ALTER CATALOG SET triggers cache reinitialization via
    notifyPropertiesUpdated(), consistent with Iceberg's existing pattern.
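
The four TTL cases listed above can be sketched as a small resolver. The class CacheTtlSpecSketch, its fields, and the resolve signature are hypothetical and do not reflect the actual return type of ExternalCatalog.resolveCacheTtlSpec(). Rejecting values below -1 is also an assumption, since the description only says that checkProperties() validates the property.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the TTL resolution rules; not the PR's actual type.
final class CacheTtlSpecSketch {
    final boolean cacheEnabled;                  // false means maxSize=0, so entries are evicted immediately
    final OptionalLong expireAfterAccessSeconds; // empty means expireAfterAccess is not set

    private CacheTtlSpecSketch(boolean cacheEnabled, OptionalLong expireAfterAccessSeconds) {
        this.cacheEnabled = cacheEnabled;
        this.expireAfterAccessSeconds = expireAfterAccessSeconds;
    }

    // rawValue is the catalog property value, or null when the property is absent.
    static CacheTtlSpecSketch resolve(String rawValue, long globalDefaultSeconds) {
        if (rawValue == null) {
            // null: fall back to the global default
            return new CacheTtlSpecSketch(true, OptionalLong.of(globalDefaultSeconds));
        }
        long ttl = Long.parseLong(rawValue.trim());
        if (ttl == -1) {
            // -1: cache enabled, entries never expire
            return new CacheTtlSpecSketch(true, OptionalLong.empty());
        }
        if (ttl == 0) {
            // 0: cache disabled
            return new CacheTtlSpecSketch(false, OptionalLong.empty());
        }
        if (ttl > 0) {
            // >0: use the value as expireAfterAccess
            return new CacheTtlSpecSketch(true, OptionalLong.of(ttl));
        }
        // Assumption: anything below -1 is treated as invalid.
        throw new IllegalArgumentException("invalid cache TTL: " + rawValue);
    }
}
```

A cache builder would then set expireAfterAccess only when expireAfterAccessSeconds is present, and use a maximum size of 0 when cacheEnabled is false.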

Lazy table access in PaimonExternalTable

  • Remove the eagerly-loaded paimonTable field from the constructor.
  • All Table object access now goes through PaimonUtils →
    PaimonMetadataCache.tableCache, making it lazy and cache-aware.
  • Introduce PaimonUtils as the centralized static accessor for Paimon cache
    operations, simplifying call sites.

Iceberg invalidation fast path

  • IcebergMetadataCache.invalidateTableCache() now attempts a direct key lookup
    (getIfPresent) first. On hit, invalidate immediately; on miss, fall back to full
    cache scan matching by local name. Avoids unnecessary iteration on the common path.
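
The fast path above can be sketched generically. A ConcurrentMap stands in for the Caffeine cache here, and the class InvalidateFastPathSketch, the method name, and the predicate parameter are all hypothetical; the real IcebergMetadataCache.invalidateTableCache() works on its own key type and on Caffeine's getIfPresent/invalidate calls.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

// Hypothetical sketch of "direct lookup first, full scan as fallback".
final class InvalidateFastPathSketch {
    private InvalidateFastPathSketch() {
    }

    // Returns the number of entries removed.
    static <K, V> int invalidate(ConcurrentMap<K, V> cache, K exactKey, Predicate<K> matchesLocalName) {
        // Fast path: the exact key is present, so remove it without iterating.
        if (cache.remove(exactKey) != null) {
            return 1;
        }
        // Slow path: scan all keys and remove those matching by local name.
        int removed = 0;
        for (Iterator<K> it = cache.keySet().iterator(); it.hasNext();) {
            if (matchesLocalName.test(it.next())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

The scan remains as a fallback so that entries stored under a key that differs from the constructed lookup key are still invalidated.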

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run external

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 76.32% (145/190) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run external

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 79.93% (235/294) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 29989 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 78e8af4fe701c1a8add7bb8d5da9b23b3a4c8098, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17621	4427	4269	4269
q2	2059	353	230	230
q3	10157	1277	734	734
q4	10189	765	314	314
q5	7564	2201	1934	1934
q6	191	176	145	145
q7	874	750	596	596
q8	9267	1385	1073	1073
q9	4715	4617	4577	4577
q10	6777	1934	1535	1535
q11	470	277	245	245
q12	340	373	231	231
q13	17791	4076	3223	3223
q14	238	230	225	225
q15	912	809	793	793
q16	684	679	625	625
q17	695	856	492	492
q18	6450	5939	5694	5694
q19	1227	999	622	622
q20	514	506	391	391
q21	2522	1809	1806	1806
q22	326	277	235	235
Total cold run time: 101583 ms
Total hot run time: 29989 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4347	4356	4343	4343
q2	254	345	259	259
q3	2032	2708	2235	2235
q4	1393	1729	1282	1282
q5	4293	4249	4467	4249
q6	199	176	141	141
q7	1832	1793	1708	1708
q8	2474	2762	2440	2440
q9	7651	7640	7396	7396
q10	2872	3025	2597	2597
q11	516	446	415	415
q12	693	744	599	599
q13	3953	4351	3666	3666
q14	336	323	274	274
q15	860	798	860	798
q16	692	758	694	694
q17	1187	1445	1394	1394
q18	8445	7924	7803	7803
q19	936	876	887	876
q20	2058	2149	2226	2149
q21	4858	4348	4211	4211
q22	484	444	421	421
Total cold run time: 52365 ms
Total hot run time: 49950 ms

@doris-robot

ClickBench: Total hot run time: 28.18 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 78e8af4fe701c1a8add7bb8d5da9b23b3a4c8098, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.25	0.08	0.08
query4	1.60	0.11	0.11
query5	0.27	0.25	0.27
query6	1.16	0.66	0.66
query7	0.03	0.03	0.02
query8	0.05	0.04	0.04
query9	0.57	0.50	0.50
query10	0.55	0.55	0.54
query11	0.14	0.09	0.09
query12	0.14	0.10	0.11
query13	0.63	0.61	0.63
query14	1.07	1.06	1.05
query15	0.86	0.86	0.89
query16	0.39	0.40	0.38
query17	1.20	1.07	1.15
query18	0.23	0.21	0.21
query19	2.05	1.98	2.05
query20	0.02	0.02	0.02
query21	15.45	0.28	0.15
query22	5.06	0.05	0.06
query23	15.85	0.29	0.12
query24	2.40	0.66	0.18
query25	0.08	0.08	0.09
query26	0.14	0.14	0.13
query27	0.08	0.07	0.06
query28	4.43	1.15	0.97
query29	12.57	3.91	3.16
query30	0.28	0.14	0.12
query31	2.82	0.65	0.40
query32	3.23	0.59	0.49
query33	3.32	3.29	3.21
query34	16.32	5.43	4.76
query35	4.82	4.77	4.80
query36	0.66	0.49	0.48
query37	0.12	0.08	0.07
query38	0.08	0.05	0.04
query39	0.05	0.03	0.03
query40	0.20	0.16	0.16
query41	0.08	0.04	0.03
query42	0.04	0.02	0.02
query43	0.05	0.04	0.03
Total cold run time: 99.49 s
Total hot run time: 28.18 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 52.63% (150/285) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 11, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 2c85148 into apache:master Feb 11, 2026
30 checks passed