Distributed Search Indexing with Push Notifications #24939
Conversation
TypeScript types have been updated based on the JSON schema changes in the PR.
🛡️ TRIVY SCAN RESULT 🛡️
Target:

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (4)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/extended_sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/lineage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_data_aut.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.json
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /ingestion/pipelines/sample_usage_aut.yaml
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target:

| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.12.7 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-core | CVE-2025-52999 | 🚨 HIGH | 2.13.4 | 2.15.0 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42003 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4.2 |
| com.fasterxml.jackson.core:jackson-databind | CVE-2022-42004 | 🚨 HIGH | 2.12.7 | 2.12.7.1, 2.13.4 |
| com.google.code.gson:gson | CVE-2022-25647 | 🚨 HIGH | 2.2.4 | 2.8.9 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.3.0 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.3.0 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.3.0 | 3.25.5, 4.27.5, 4.28.2 |
| com.google.protobuf:protobuf-java | CVE-2021-22569 | 🚨 HIGH | 3.7.1 | 3.16.1, 3.18.2, 3.19.2 |
| com.google.protobuf:protobuf-java | CVE-2022-3509 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2022-3510 | 🚨 HIGH | 3.7.1 | 3.16.3, 3.19.6, 3.20.3, 3.21.7 |
| com.google.protobuf:protobuf-java | CVE-2024-7254 | 🚨 HIGH | 3.7.1 | 3.25.5, 4.27.5, 4.28.2 |
| com.nimbusds:nimbus-jose-jwt | CVE-2023-52428 | 🚨 HIGH | 9.8.1 | 9.37.2 |
| com.squareup.okhttp3:okhttp | CVE-2021-0341 | 🚨 HIGH | 3.12.12 | 4.9.2 |
| commons-beanutils:commons-beanutils | CVE-2025-48734 | 🚨 HIGH | 1.9.4 | 1.11.0 |
| commons-io:commons-io | CVE-2024-47554 | 🚨 HIGH | 2.8.0 | 2.14.0 |
| dnsjava:dnsjava | CVE-2024-25638 | 🚨 HIGH | 2.1.7 | 3.6.0 |
| io.netty:netty-codec-http2 | CVE-2025-55163 | 🚨 HIGH | 4.1.96.Final | 4.2.4.Final, 4.1.124.Final |
| io.netty:netty-codec-http2 | GHSA-xpw8-rcwv-8f8p | 🚨 HIGH | 4.1.96.Final | 4.1.100.Final |
| io.netty:netty-handler | CVE-2025-24970 | 🚨 HIGH | 4.1.96.Final | 4.1.118.Final |
| net.minidev:json-smart | CVE-2021-31684 | 🚨 HIGH | 1.3.2 | 1.3.3, 2.4.4 |
| net.minidev:json-smart | CVE-2023-1370 | 🚨 HIGH | 1.3.2 | 2.4.9 |
| org.apache.avro:avro | CVE-2024-47561 | 🔥 CRITICAL | 1.7.7 | 1.11.4 |
| org.apache.avro:avro | CVE-2023-39410 | 🚨 HIGH | 1.7.7 | 1.11.3 |
| org.apache.derby:derby | CVE-2022-46337 | 🔥 CRITICAL | 10.14.2.0 | 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0 |
| org.apache.ivy:ivy | CVE-2022-46751 | 🚨 HIGH | 2.5.1 | 2.5.2 |
| org.apache.mesos:mesos | CVE-2018-1330 | 🚨 HIGH | 1.4.3 | 1.6.0 |
| org.apache.thrift:libthrift | CVE-2019-0205 | 🚨 HIGH | 0.12.0 | 0.13.0 |
| org.apache.thrift:libthrift | CVE-2020-13949 | 🚨 HIGH | 0.12.0 | 0.14.0 |
| org.apache.zookeeper:zookeeper | CVE-2023-44981 | 🔥 CRITICAL | 3.6.3 | 3.7.2, 3.8.3, 3.9.1 |
| org.eclipse.jetty:jetty-server | CVE-2024-13009 | 🚨 HIGH | 9.4.56.v20240826 | 9.4.57.v20241219 |
| org.lz4:lz4-java | CVE-2025-12183 | 🚨 HIGH | 1.8.0 | 1.8.1 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: Node.js
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: Python
Vulnerabilities (9)
| Package | Vulnerability ID | Severity | Installed Version | Fixed Version |
|---|---|---|---|---|
| Werkzeug | CVE-2024-34069 | 🚨 HIGH | 2.2.3 | 3.0.3 |
| aiohttp | CVE-2025-69223 | 🚨 HIGH | 3.12.12 | 3.13.3 |
| aiohttp | CVE-2025-69223 | 🚨 HIGH | 3.13.2 | 3.13.3 |
| deepdiff | CVE-2025-58367 | 🔥 CRITICAL | 7.0.1 | 8.6.1 |
| ray | CVE-2025-62593 | 🔥 CRITICAL | 2.47.1 | 2.52.0 |
| starlette | CVE-2025-62727 | 🚨 HIGH | 0.48.0 | 0.49.1 |
| urllib3 | CVE-2025-66418 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2025-66471 | 🚨 HIGH | 1.26.20 | 2.6.0 |
| urllib3 | CVE-2026-21441 | 🚨 HIGH | 1.26.20 | 2.6.3 |
🛡️ TRIVY SCAN RESULT 🛡️
Target: /etc/ssl/private/ssl-cert-snakeoil.key
No Vulnerabilities Found
🛡️ TRIVY SCAN RESULT 🛡️
Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO
No Vulnerabilities Found
```java
CountDownLatch completionLatch = new CountDownLatch(1);
ScheduledExecutorService monitor =
    Executors.newSingleThreadScheduledExecutor(
        Thread.ofPlatform().name("distributed-monitor").factory());
```
We can use a virtual thread here.
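A minimal sketch of that suggestion, assuming Java 21+ (where `Thread.ofVirtual()` is available); only the thread factory changes:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Same monitor as above, but backed by a virtual thread instead of a
// dedicated platform thread (requires Java 21+).
ScheduledExecutorService monitor =
    Executors.newSingleThreadScheduledExecutor(
        Thread.ofVirtual().name("distributed-monitor").factory());
```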
🔍 CI failure analysis for 95cffb0: Integration tests fail due to distributed indexing job state persistence (RELATED to PR). Playwright tests fail with browser closures and element visibility issues (UNRELATED to PR).
Issue
Multiple CI job failures:
Root Cause Analysis
1. Integration Test Failures (RELATED)
Concurrency Control Issue in Distributed Search Indexing
This failure is directly related to the distributed search indexing changes.
Why This Happens: The refactored code (commit 95cffb0, "Refactor Code for Single Server and Multiple Server") introduced database-backed job state tracking:
Files modified affecting this:
Consistency: Both PostgreSQL/OpenSearch and MySQL/Elasticsearch exhibit the same failure, confirming this is deterministic.
2. Playwright Test Failures (UNRELATED)
Failures in job 59770462844 (1 test):
Failures in job 59770462842 (3 tests):
Error Patterns:
Analysis: These Playwright failures are UNRELATED to the distributed search indexing PR:
Details
Integration Test Impact: Test isolation problem - distributed job state persists in the database across test runs, causing subsequent triggers to fail.
Playwright Impact: These are likely pre-existing flaky tests or infrastructure issues unrelated to this PR's backend changes.
Additional Context: The Playwright failures show classic symptoms of UI test flakiness (timing issues, browser state management) rather than functional regressions from backend changes.
Code Review
👍 Approved with suggestions
Well-architected distributed search indexing system with comprehensive test coverage. Two earlier minor concerns remain valid: the check-then-act in-flight counting and the inconsistent server-ID generation approaches.
Distributed Search Indexing - Design Document
Overview
This document describes the distributed search indexing capability in OpenMetadata, enabling multiple server instances to collaboratively process large-scale indexing jobs. This is essential for deployments with millions of entities where single-server indexing becomes a bottleneck.
Solution Architecture
High-Level Design
```mermaid
flowchart TB
    subgraph Cluster["OpenMetadata Cluster"]
        S1["Server 1<br/>(Triggering Server)"]
        S2["Server 2<br/>(Participant)"]
        SN["Server N<br/>(Participant)"]
    end
    subgraph Notification["Job Notification Layer"]
        direction LR
        Redis["Redis Pub/Sub<br/>(instant notification)"]
        Polling["Database Polling<br/>(30s fallback)"]
    end
    subgraph Persistence["Persistence Layer"]
        DB[(MySQL/PostgreSQL<br/>Jobs + Partitions)]
        ES[(Elasticsearch/<br/>OpenSearch)]
    end
    S1 -->|"1. Create job"| DB
    S1 -->|"2. Notify start"| Redis
    Redis -->|"3. Broadcast"| S2
    Redis -->|"3. Broadcast"| SN
    S1 & S2 & SN -->|"4. Claim partitions<br/>(atomic)"| DB
    S1 & S2 & SN -->|"5. Index entities"| ES
    S1 & S2 & SN -->|"6. Update stats"| DB
```
Component Architecture
```mermaid
flowchart TB
    subgraph Executor["DistributedSearchIndexExecutor"]
        E1["Triggered by Quartz<br/>(on ONE server via JDBC clustering)"]
        E2["Creates job and partitions"]
        E3["Notifies other servers"]
        E4["Processes partitions"]
        E5["Aggregates results"]
    end
    subgraph Notifier["DistributedJobNotifier<br/><<interface>>"]
        N1["notifyJobStarted()"]
        N2["notifyJobCompleted()"]
        N3["onJobStarted(callback)"]
    end
    subgraph Implementations["Notifier Implementations"]
        R["RedisJobNotifier<br/>- Zero latency<br/>- Pub/Sub channels"]
        P["PollingJobNotifier<br/>- 30s idle interval<br/>- 1s active interval"]
    end
    subgraph Participant["DistributedJobParticipant"]
        P1["Runs on ALL servers<br/>(Dropwizard Managed)"]
        P2["Listens for notifications"]
        P3["Claims & processes partitions"]
    end
    subgraph Coordinator["DistributedSearchIndexCoordinator"]
        C1["Manages job lifecycle"]
        C2["Atomic partition claiming<br/>(FOR UPDATE SKIP LOCKED)"]
        C3["Stats aggregation"]
    end
    Executor --> Notifier
    Notifier --> R
    Notifier --> P
    Participant --> Notifier
    Participant --> Coordinator
    Executor --> Coordinator
```
Job Execution Flow
Sequence Diagram
```mermaid
sequenceDiagram
    participant Q as Quartz Scheduler
    participant S1 as Server 1 (Trigger)
    participant N as Notifier
    participant S2 as Server 2
    participant DB as Database
    participant ES as Elasticsearch
    Q->>S1: Fire trigger (JDBC clustering ensures single server)
    rect rgb(240, 248, 255)
        Note over S1,DB: Job Initialization
        S1->>DB: Create job record (status=RUNNING)
        S1->>DB: Create partition records (status=PENDING)
        S1->>N: notifyJobStarted(jobId)
    end
    N->>S2: Broadcast job notification
    S2->>DB: Get job details
    S2->>DB: Check pending partitions
    rect rgb(240, 255, 240)
        Note over S1,ES: Parallel Processing
        par Server 1 Processing
            loop While partitions available
                S1->>DB: Claim partition (FOR UPDATE SKIP LOCKED)
                S1->>DB: Read entities for partition range
                S1->>ES: Bulk index entities
                S1->>DB: Update partition stats
            end
        and Server 2 Processing
            loop While partitions available
                S2->>DB: Claim partition (FOR UPDATE SKIP LOCKED)
                S2->>DB: Read entities for partition range
                S2->>ES: Bulk index entities
                S2->>DB: Update partition stats
            end
        end
    end
    rect rgb(255, 248, 240)
        Note over S1,DB: Job Completion
        S1->>DB: Aggregate stats from all partitions
        S1->>DB: Update job status (COMPLETED)
        S1->>N: notifyJobCompleted(jobId)
    end
```
Partition-Based Work Distribution
Partition Creation Strategy
```mermaid
flowchart TB
    subgraph Input["Input Parameters"]
        I1["Entity types: [table, dashboard, pipeline]"]
        I2["Batch size: 1000"]
        I3["Total records: 100,000 tables"]
    end
    subgraph Calculation["Partition Calculator"]
        C1["partitions = ceil(records / batchSize)"]
        C2["100 partitions for tables"]
    end
    subgraph Output["Created Partitions"]
        P0["Partition 0<br/>entity: table<br/>range: 0-999"]
        P1["Partition 1<br/>entity: table<br/>range: 1000-1999"]
        P2["Partition 2<br/>entity: table<br/>range: 2000-2999"]
        Pn["...<br/>Partition 99<br/>range: 99000-99999"]
    end
    Input --> Calculation --> Output
```
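To make the calculation concrete, here is a minimal sketch of the partitioning math; the `Partition` record and method shape are illustrative stand-ins, not the PR's actual types.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionMathSketch {
  // Illustrative stand-in for the PR's partition type.
  record Partition(String entityType, int index, long rangeStart, long rangeEnd) {}

  static List<Partition> createPartitions(String entityType, long totalRecords, int batchSize) {
    // partitions = ceil(records / batchSize): 100,000 tables / 1,000 => 100 partitions
    int count = (int) Math.ceilDiv(totalRecords, batchSize); // Math.ceilDiv needs Java 18+
    List<Partition> partitions = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      long start = (long) i * batchSize;                        // partition 1 starts at 1000
      long end = Math.min(start + batchSize, totalRecords) - 1; // inclusive end, e.g. 1999
      partitions.add(new Partition(entityType, i, start, end));
    }
    return partitions;
  }
}
```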
Atomic Partition Claiming
```mermaid
flowchart LR
    subgraph Server1["Server 1"]
        S1A["Request partition"]
    end
    subgraph Server2["Server 2"]
        S2A["Request partition"]
    end
    subgraph Database["Database"]
        Q["SELECT ... FOR UPDATE SKIP LOCKED"]
        P1["Partition 1: PENDING"]
        P2["Partition 2: PENDING"]
        P3["Partition 3: PENDING"]
    end
    S1A -->|"Claim"| Q
    S2A -->|"Claim"| Q
    Q -->|"Lock & Return"| P1
    Q -->|"Skip locked, Return"| P2
    style P1 fill:#90EE90
    style P2 fill:#90EE90
```
SQL Implementation:
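The PR's DAO code is not reproduced here; the following plain-JDBC sketch shows the shape of the claim, with table and column names taken from the schema section below (the real implementation goes through the server's DAOs).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

class PartitionClaimSketch {
  static Optional<String> claimNextPartition(Connection conn, String jobId, String serverId)
      throws SQLException {
    conn.setAutoCommit(false);
    String select =
        "SELECT id FROM search_index_partition "
            + "WHERE job_id = ? AND status = 'PENDING' "
            + "ORDER BY partition_index LIMIT 1 FOR UPDATE SKIP LOCKED";
    try (PreparedStatement ps = conn.prepareStatement(select)) {
      ps.setString(1, jobId);
      try (ResultSet rs = ps.executeQuery()) {
        if (!rs.next()) {
          conn.commit();
          return Optional.empty(); // nothing left to claim
        }
        String partitionId = rs.getString("id");
        try (PreparedStatement upd = conn.prepareStatement(
            "UPDATE search_index_partition SET status = 'PROCESSING', "
                + "claimed_by = ?, claimed_at = CURRENT_TIMESTAMP WHERE id = ?")) {
          upd.setString(1, serverId);
          upd.setString(2, partitionId);
          upd.executeUpdate();
        }
        conn.commit(); // releases the row lock; concurrent claimers skipped this row
        return Optional.of(partitionId);
      }
    }
  }
}
```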
Guarantees:
- `SKIP LOCKED` allows concurrent claims without waiting
Job Notification Mechanisms
Redis Pub/Sub (When Redis is configured)
```mermaid
flowchart LR
    subgraph Publisher["Server 1 (Publisher)"]
        P1["Job Started"]
    end
    subgraph Redis["Redis"]
        C1["Channel:<br/>om:distributed-jobs:start"]
        C2["Channel:<br/>om:distributed-jobs:complete"]
    end
    subgraph Subscribers["Subscribers"]
        S2["Server 2"]
        S3["Server 3"]
        SN["Server N"]
    end
    P1 -->|"PUBLISH"| C1
    C1 -->|"~0ms"| S2
    C1 -->|"~0ms"| S3
    C1 -->|"~0ms"| SN
```
Message Format: `{jobId}|{jobType}|{sourceServerId}`
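A sketch of publishing and parsing that payload with Lettuce (the client the RedisJobNotifier builds on, per the class diagram below); the channel name comes from the diagram above, while the handler body and helper names are illustrative.

```java
import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.pubsub.RedisPubSubAdapter;
import io.lettuce.core.pubsub.StatefulRedisPubSubConnection;
import java.util.UUID;

class RedisNotifySketch {
  static final String START_CHANNEL = "om:distributed-jobs:start";

  static void publishStart(RedisClient client, UUID jobId, String jobType, String serverId) {
    try (StatefulRedisConnection<String, String> conn = client.connect()) {
      conn.sync().publish(START_CHANNEL, jobId + "|" + jobType + "|" + serverId);
    }
  }

  static void subscribeStart(RedisClient client, String myServerId) {
    StatefulRedisPubSubConnection<String, String> conn = client.connectPubSub();
    conn.addListener(new RedisPubSubAdapter<String, String>() {
      @Override
      public void message(String channel, String message) {
        String[] parts = message.split("\\|", 3); // {jobId}|{jobType}|{sourceServerId}
        if (!parts[2].equals(myServerId)) {       // ignore our own broadcast
          UUID jobId = UUID.fromString(parts[0]);
          // hand off to the participant, e.g. onJobDiscovered(jobId)
        }
      }
    });
    conn.sync().subscribe(START_CHANNEL);
  }
}
```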
Characteristics:
Database Polling (Fallback)
```mermaid
flowchart TB
    subgraph Polling["PollingJobNotifier"]
        direction TB
        P1["Poll Interval:<br/>IDLE: 30 seconds<br/>ACTIVE: 1 second"]
        P2["Query: SELECT id FROM search_index_job<br/>WHERE status = 'RUNNING'"]
    end
    subgraph Comparison["Polling Overhead Comparison"]
        Old["Old Design:<br/>1s always = 86,400 queries/day"]
        New["New Design:<br/>30s idle = 2,880 queries/day<br/>(96% reduction)"]
    end
    Polling --> Comparison
```
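A sketch of the adaptive cadence, assuming a self-rescheduling task; `pollOnce()` stands in for the `SELECT id FROM search_index_job WHERE status = 'RUNNING'` query shown in the diagram.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class PollingCadenceSketch {
  static final long IDLE_POLL_INTERVAL_MS = 30_000;  // no job running
  static final long ACTIVE_POLL_INTERVAL_MS = 1_000; // partitions in flight

  final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
  final AtomicBoolean participating = new AtomicBoolean(false);

  void scheduleNextPoll() {
    long delay = participating.get() ? ACTIVE_POLL_INTERVAL_MS : IDLE_POLL_INTERVAL_MS;
    scheduler.schedule(this::pollOnce, delay, TimeUnit.MILLISECONDS);
  }

  void pollOnce() {
    try {
      // Query running job ids; on a new job, invoke the participant's callback
      // and set participating = true so the next poll arrives in 1s, not 30s.
    } finally {
      scheduleNextPoll(); // self-rescheduling keeps the interval adaptive
    }
  }
}
```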
Notifier Selection (Factory Pattern)
```mermaid
flowchart TD
    Start["DistributedJobNotifierFactory.create()"]
    Check{"Redis configured?<br/>(cacheConfig.provider == redis)"}
    Redis["Return RedisJobNotifier"]
    Polling["Return PollingJobNotifier"]
    Log1["Log: Using redis-pubsub notifier"]
    Log2["Log: Using database-polling notifier<br/>(30s discovery delay)"]
    Start --> Check
    Check -->|"Yes"| Redis --> Log1
    Check -->|"No"| Polling --> Log2
```
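A self-contained sketch of that decision; the types here are simplified stand-ins (the real factory signature, per the class diagram below, is `create(CacheConfig, CollectionDAO, String)`).

```java
// Simplified stand-ins for the PR's types, just to show the selection logic.
interface DistributedJobNotifier {}
record RedisNotifierStub() implements DistributedJobNotifier {}
record PollingNotifierStub() implements DistributedJobNotifier {}
record CacheConfig(String provider) {}

class NotifierFactorySketch {
  static DistributedJobNotifier create(CacheConfig cacheConfig) {
    if (cacheConfig != null && "redis".equalsIgnoreCase(cacheConfig.provider())) {
      return new RedisNotifierStub();  // zero-latency pub/sub path
    }
    return new PollingNotifierStub();  // fallback: up to 30s discovery delay
  }
}
```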
Database Schema
Entity Relationship Diagram
```mermaid
erDiagram
    SEARCH_INDEX_JOB ||--o{ SEARCH_INDEX_PARTITION : contains
    SEARCH_INDEX_JOB {
        varchar id PK
        varchar status "PENDING|RUNNING|COMPLETED|FAILED|STOPPED"
        json job_configuration
        bigint total_records
        bigint processed_records
        bigint success_records
        bigint failed_records
        timestamp created_at
        timestamp started_at
        timestamp completed_at
        varchar triggered_by
        text failure_details
    }
    SEARCH_INDEX_PARTITION {
        varchar id PK
        varchar job_id FK
        varchar entity_type
        int partition_index
        bigint range_start
        bigint range_end
        varchar status "PENDING|PROCESSING|COMPLETED|FAILED"
        varchar claimed_by
        timestamp claimed_at
        bigint processed_count
        bigint success_count
        bigint failed_count
        timestamp started_at
        timestamp completed_at
        text last_error
        int retry_count
    }
```
Key Indexes
Server Statistics Tracking
```mermaid
flowchart TB
    subgraph Partitions["Completed Partitions"]
        P1["Partition 1<br/>claimed_by: server-1<br/>processed: 1000"]
        P2["Partition 2<br/>claimed_by: server-2<br/>processed: 1000"]
        P3["Partition 3<br/>claimed_by: server-1<br/>processed: 1000"]
        P4["Partition 4<br/>claimed_by: server-2<br/>processed: 1000"]
    end
    subgraph Aggregation["Stats Aggregation Query"]
        Q["SELECT claimed_by, COUNT(*), SUM(processed_count)...<br/>GROUP BY claimed_by"]
    end
    subgraph Result["Per-Server Stats"]
        R1["server-1: 2 partitions, 2000 records"]
        R2["server-2: 2 partitions, 2000 records"]
    end
    Partitions --> Aggregation --> Result
```
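A JDBC sketch of the aggregation query from the diagram; table and column names follow the schema section above, and the output format mirrors the per-server stats shown.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ServerStatsSketch {
  static void printServerStats(Connection conn, String jobId) throws SQLException {
    String sql =
        "SELECT claimed_by, COUNT(*) AS partitions, SUM(processed_count) AS records "
            + "FROM search_index_partition "
            + "WHERE job_id = ? AND status = 'COMPLETED' "
            + "GROUP BY claimed_by";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, jobId);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) { // e.g. "server-1: 2 partitions, 2000 records"
          System.out.printf("%s: %d partitions, %d records%n",
              rs.getString("claimed_by"), rs.getLong("partitions"), rs.getLong("records"));
        }
      }
    }
  }
}
```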
UI Display:
Configuration
Application Configuration (openmetadata.yaml)
Search Indexing App Configuration
Performance Characteristics
Scaling Behavior
```mermaid
xychart-beta
    title "Indexing Time vs Server Count (1M Records)"
    x-axis ["1 Server", "2 Servers", "4 Servers", "8 Servers"]
    y-axis "Time (minutes)" 0 --> 25
    bar [20, 10, 5, 2.5]
```
Note: Actual times depend on entity complexity, ES/OS cluster performance, and network latency.
When to Use Distributed Indexing
Error Handling & Recovery
Partition Failure Handling
```mermaid
flowchart TD
    Start["Partition Processing"]
    Process["Process entities"]
    Error{"Error occurred?"}
    Success["Mark COMPLETED<br/>Update stats"]
    Fail["Mark FAILED<br/>Record error"]
    Crash["Server Crash"]
    Stale{"Partition stale?<br/>(claimed_at > threshold)"}
    Reset["Reset to PENDING"]
    Reclaim["Another server claims"]
    Start --> Process --> Error
    Error -->|"No"| Success
    Error -->|"Yes"| Fail
    Crash --> Stale
    Stale -->|"Yes"| Reset --> Reclaim
    Stale -->|"No"| Wait["Wait for timeout"]
```
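A sketch of the reset step, using a PostgreSQL-flavored interval expression; the staleness threshold and the exact predicate are assumptions, not the PR's code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class StaleResetSketch {
  // Partitions stuck in PROCESSING past the claim threshold go back to
  // PENDING so a live server can reclaim them.
  static int resetStalePartitions(Connection conn, String jobId, int staleMinutes)
      throws SQLException {
    String sql =
        "UPDATE search_index_partition "
            + "SET status = 'PENDING', claimed_by = NULL "
            + "WHERE job_id = ? AND status = 'PROCESSING' "
            + "AND claimed_at < (CURRENT_TIMESTAMP - INTERVAL '1 minute' * ?)"; // PostgreSQL syntax
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, jobId);
      ps.setInt(2, staleMinutes);
      return ps.executeUpdate(); // number of partitions made reclaimable
    }
  }
}
```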
Job-Level Failure Handling
```mermaid
flowchart TD
    Check["Check partition results"]
    Threshold{"Failed > threshold?"}
    Complete["Job: COMPLETED"]
    CompleteErrors["Job: COMPLETED_WITH_ERRORS"]
    Failed["Job: FAILED"]
    Check --> Threshold
    Threshold -->|"0% failed"| Complete
    Threshold -->|"< 10% failed"| CompleteErrors
    Threshold -->|"> 10% failed"| Failed
```
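The thresholds in the flowchart reduce to a small decision, sketched here with an illustrative enum (the PR's actual status values live in the job schema):

```java
class JobStatusSketch {
  enum JobStatus { COMPLETED, COMPLETED_WITH_ERRORS, FAILED }

  // 0% failures completes cleanly, under 10% completes with errors,
  // otherwise the job is marked failed.
  static JobStatus finalStatus(long failedRecords, long totalRecords) {
    if (failedRecords == 0) return JobStatus.COMPLETED;
    double failureRatio = (double) failedRecords / totalRecords;
    return failureRatio < 0.10 ? JobStatus.COMPLETED_WITH_ERRORS : JobStatus.FAILED;
  }
}
```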
Class Diagram
```mermaid
classDiagram
    class DistributedJobNotifier {
        <<interface>>
        +notifyJobStarted(UUID jobId, String jobType)
        +notifyJobCompleted(UUID jobId)
        +onJobStarted(Consumer~UUID~ callback)
        +start()
        +stop()
        +isRunning() boolean
        +getType() String
    }
    class RedisJobNotifier {
        -CacheConfig.Redis redisConfig
        -RedisClient redisClient
        -StatefulRedisPubSubConnection subConnection
        -StatefulRedisConnection pubConnection
        +start()
        +stop()
        +notifyJobStarted()
        +notifyJobCompleted()
    }
    class PollingJobNotifier {
        -CollectionDAO collectionDAO
        -ScheduledExecutorService scheduler
        -long IDLE_POLL_INTERVAL_MS = 30000
        -long ACTIVE_POLL_INTERVAL_MS = 1000
        +start()
        +stop()
        +setParticipating(boolean)
    }
    class DistributedJobNotifierFactory {
        +create(CacheConfig, CollectionDAO, String)$ DistributedJobNotifier
    }
    class DistributedJobParticipant {
        -CollectionDAO collectionDAO
        -SearchRepository searchRepository
        -DistributedSearchIndexCoordinator coordinator
        -DistributedJobNotifier notifier
        -AtomicBoolean participating
        +start()
        +stop()
        -onJobDiscovered(UUID jobId)
        -joinAndProcessJob(SearchIndexJob job)
        -processJobPartitions(SearchIndexJob job)
    }
    class DistributedSearchIndexCoordinator {
        -CollectionDAO collectionDAO
        +createJob(config) SearchIndexJob
        +createPartitions(job, entities)
        +claimNextPartition(jobId) Optional~Partition~
        +updatePartitionStats(partition, stats)
        +completeJob(jobId)
        +getServerStats(jobId) List~ServerStats~
    }
    DistributedJobNotifier <|.. RedisJobNotifier
    DistributedJobNotifier <|.. PollingJobNotifier
    DistributedJobNotifierFactory ..> DistributedJobNotifier : creates
    DistributedJobParticipant --> DistributedJobNotifier
    DistributedJobParticipant --> DistributedSearchIndexCoordinator
```
Files Changed
New Files
- DistributedJobNotifier.java
- RedisJobNotifier.java
- PollingJobNotifier.java
- DistributedJobNotifierFactory.java
Modified Files
- DistributedJobParticipant.java
- DistributedSearchIndexExecutor.java
- OpenMetadataApplication.java
- CollectionDAO.java: getRunningJobIds() for lightweight polling
Testing
Future Enhancements