Skip to content

Add optional country field to run and EDLA schemas#96

Closed
Copilot wants to merge 2 commits intomasterfrom
copilot/add-country-field-to-schemas
Closed

Add optional country field to run and EDLA schemas#96
Copilot wants to merge 2 commits intomasterfrom
copilot/add-country-field-to-schemas

Conversation

Copy link
Contributor

Copilot AI commented Dec 9, 2025

Release Notes:

  • Optional country field added into EDLA schemas

Adds an optional country field to both run and EDLA (dlchange) schemas to support regional data tracking.

Schema Changes

  • topic_runs.json: Added country field to jobs.items.properties (positioned after catalog_id)
  • topic_dlchange.json: Added country field to root properties (positioned after timestamp_event)

Both fields are optional string types with no default constraint in the schema definitions.

Database Writer Updates

Updated PostgreSQL insert operations to include the new column:

# EDLA writes - country inserted after timestamp_event
message.get("country", "")

# Run job writes - country inserted after catalog_id  
job.get("country", "")

Default value of empty string when field is absent maintains backward compatibility with existing events.

Test Updates

Modified test assertions to validate the additional parameter position in SQL insert statements. Tests verify both presence of the field and correct default empty string behavior.

Original prompt

Add country field to run and EDLA schemas

Background

There is a new requirement to add a country field to both the run and EDLA schemas. This field should be optional with an empty string as the default value.

Changes Required

1. Update conf/topic_runs.json

Add the country field inside the jobs array items properties (between existing properties, following the pattern in the schema):

"country": {
    "type": "string",
    "description": "The country the data is related to."
}

Important:

  • Add this field in the jobs.items.properties object
  • Do NOT add country to the jobs.items.required array (it should remain optional)
  • Place it logically after catalog_id property

2. Update conf/topic_dlchange.json

Add the country field in the main properties object:

"country": {
    "type": "string",
    "description": "The country the data is related to."
}

Important:

  • Add this field in the root properties object
  • Do NOT add country to the required array (it should remain optional)
  • Place it logically after timestamp_event property

3. Update src/writers/writer_postgres.py

In the postgres_edla_write function:

  • Add country column to the INSERT statement (after timestamp_event, before catalog_id)
  • Add the corresponding value in the VALUES tuple using: message.get("country", "")
  • Maintain proper SQL formatting and alignment

In the postgres_run_write function:

  • Add country column to the INSERT statement for the jobs table (after catalog_id, before status)
  • Add the corresponding value when inserting job data using: job.get("country", "")
  • Maintain proper SQL formatting and alignment

4. Coding Standards

  • Follow the existing code style and patterns in the project
  • Ensure proper indentation and formatting (Black-compatible)
  • Use double quotes for strings consistently
  • Maintain proper SQL formatting with aligned columns
  • Ensure all changes pass: black, pylint, pytest, and mypy
  • Follow the exact same pattern as other optional fields (like location, format_options, additional_info)

5. Key Requirements

  • The country field must be OPTIONAL - use .get("country", "") with empty string as default
  • Do NOT add to required fields in JSON schemas
  • Keep the same formatting and structure as existing fields
  • Follow the exact SQL column order and formatting style used in the existing code
  • Check the BlobSha values:
    • conf/topic_runs.json BlobSha: 6239cd873d8e995768b8b8ef4e80fa745a5b460c
    • conf/topic_dlchange.json BlobSha: 7cd7c225994b24154b3e40d096a6c38a4656f77a
    • src/writers/writer_postgres.py BlobSha: ecce37761cdace5b3b8479b1250d9c76b2fe2ed4

Reference

This implements the schema definitions from Issue #79.

This pull request was created as a result of the following prompt from Copilot chat.

Add country field to run and EDLA schemas

Background

There is a new requirement to add a country field to both the run and EDLA schemas. This field should be optional with an empty string as the default value.

Changes Required

1. Update conf/topic_runs.json

Add the country field inside the jobs array items properties (between existing properties, following the pattern in the schema):

"country": {
    "type": "string",
    "description": "The country the data is related to."
}

Important:

  • Add this field in the jobs.items.properties object
  • Do NOT add country to the jobs.items.required array (it should remain optional)
  • Place it logically after catalog_id property

2. Update conf/topic_dlchange.json

Add the country field in the main properties object:

"country": {
    "type": "string",
    "description": "The country the data is related to."
}

Important:

  • Add this field in the root properties object
  • Do NOT add country to the required array (it should remain optional)
  • Place it logically after timestamp_event property

3. Update src/writers/writer_postgres.py

In the postgres_edla_write function:

  • Add country column to the INSERT statement (after timestamp_event, before catalog_id)
  • Add the corresponding value in the VALUES tuple using: message.get("country", "")
  • Maintain proper SQL formatting and alignment

In the postgres_run_write function:

  • Add country column to the INSERT statement for the jobs table (after catalog_id, before status)
  • Add the corresponding value when inserting job data using: job.get("country", "")
  • Maintain proper SQL formatting and alignment

4. Coding Standards

  • Follow the existing code style and patterns in the project
  • Ensure proper indentation and formatting (Black-compatible)
  • Use double quotes for strings consistently
  • Maintain proper SQL formatting with aligned columns
  • Ensure all changes pass: black, pylint, pytest, and mypy
  • Follow the exact same pattern as other optional fields (like location, format_options, additional_info)

5. Key Requirements

  • The country field must be OPTIONAL - use .get("country", "") with empty string as default
  • Do NOT add to required fields in JSON schemas
  • Keep the same formatting and structure as existing fields
  • Follow the exact SQL column order and formatting style used in the existing code
  • Check the BlobSha values:
    • conf/topic_runs.json BlobSha: 6239cd873d8e995768b8b8ef4e80fa745a5b460c
    • conf/topic_dlchange.json BlobSha: 7cd7c225994b24154b3e40d096a6c38a4656f77a
    • src/writers/writer_postgres.py BlobSha: ecce37761cdace5b3b8479b1250d9c76b2fe2ed4

Reference

This implements the schema definitions from Issue #79.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Summary by CodeRabbit

  • New Features
    • Added country field to data schemas to track geographic information associated with recorded events and job runs.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Walkthrough

A new optional "country" field (string type) is added to topic schemas (dlchange and runs), with corresponding updates to the PostgreSQL writer to handle the field during database inserts, and test expectations adjusted to account for the additional parameter.

Changes

Cohort / File(s) Summary
Configuration Schema Updates
conf/topic_dlchange.json, conf/topic_runs.json
Added new optional string property "country" with description "The country the data is related to." in both topic schemas—in dlchange within top-level properties, and in runs within jobs.items.properties.
PostgreSQL Writer Implementation
src/writers/writer_postgres.py
Extended edla\_write and run jobs table insertion logic to include the country field; country values extracted via .get("country","") with default empty string fallback.
Test Updates
tests/writers/test_writer_postgres.py
Adjusted test assertions to reflect addition of country parameter; parameter lists increased from 12 to 13 elements, with country at index 6 in edla\_write tests and index 2 in run\_write tests; subsequent field indices shifted accordingly.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Straightforward, homogeneous changes: single optional field added consistently across configurations, implementation, and tests
  • Parameter index shifts in tests follow a predictable, repetitive pattern
  • No logic complexity or control flow changes; default empty-string handling is standard

Poem

🐰 A country field hops into place,
From schema to Postgres it takes its space,
With indices shifted and defaults so keen,
The sweetest refactor we've ever seen! 🌍

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding an optional country field to the run and EDLA schemas, which is clearly reflected across all modified files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch copilot/add-country-field-to-schemas

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

- Added country field to conf/topic_runs.json (jobs.items.properties)
- Added country field to conf/topic_dlchange.json (root properties)
- Updated postgres_edla_write to include country column (after timestamp_event)
- Updated postgres_run_write to include country column in jobs table (after catalog_id)
- Updated tests to validate new country field with empty string default
- All tests passing, black/pylint/mypy checks passing

Co-authored-by: tmikula-dev <72911271+tmikula-dev@users.noreply.github.com>
Copilot AI changed the title [WIP] Add country field to run and EDLA schemas Add optional country field to run and EDLA schemas Dec 9, 2025
Copilot AI requested a review from tmikula-dev December 9, 2025 10:36
@github-actions
Copy link

github-actions bot commented Dec 9, 2025

Trivy has completed a full security repository scan ✅ You can find the analysis results for this PR branch on this overview.
Below is the summary of the findings:

TRIVY CRITICAL HIGH MEDIUM LOW TOTAL
vulnerability 0 0 0 0 0
secret 0 6 0 0 6
misconfiguration 0 0 1 10 11
license 0 0 0 0 0
➡️ Total 0 6 1 10 17

@tmikula-dev
Copy link
Collaborator

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@tmikula-dev tmikula-dev linked an issue Dec 9, 2025 that may be closed by this pull request
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/writers/test_writer_postgres.py (2)

41-66: Consider adding test coverage for country value insertion.

The test verifies the default empty string behavior when country is absent, but there's no test that provides an actual country value (e.g., "US" or "ZA") to verify it's correctly inserted.

Add a test case with country provided:

def test_postgres_edla_write_with_country():
    cur = MockCursor()
    message = {
        "event_id": "e3",
        "tenant_id": "t3",
        "source_app": "app",
        "source_app_version": "1.0.0",
        "environment": "dev",
        "timestamp_event": 333,
        "country": "ZA",
        "catalog_id": "db.tbl",
        "operation": "append",
        "format": "parquet",
    }
    writer_postgres.postgres_edla_write(cur, "table_a", message)
    _sql, params = cur.executions[0]
    assert params[6] == "ZA"  # country value provided

91-122: Test coverage: Consider verifying country value insertion for jobs.

Similar to the dlchange test, this verifies the default empty string but doesn't test when country is actually provided in the jobs array.

Consider adding a job with country value:

"jobs": [
    {
        "catalog_id": "c1",
        "country": "ZA",
        "status": "succeeded",
        "timestamp_start": 1100,
        "timestamp_end": 1200
    },
]

Then verify assert job1_params[2] == "ZA".

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b0922f and 497a1f6.

📒 Files selected for processing (4)
  • conf/topic_dlchange.json (1 hunks)
  • conf/topic_runs.json (1 hunks)
  • src/writers/writer_postgres.py (5 hunks)
  • tests/writers/test_writer_postgres.py (3 hunks)
🔇 Additional comments (5)
conf/topic_runs.json (1)

46-49: LGTM! Optional country field added correctly.

The country field is properly defined as an optional string with a clear description. It's correctly excluded from the required array, maintaining backward compatibility.

conf/topic_dlchange.json (1)

28-31: LGTM! Schema consistency maintained.

The country field definition matches the one in topic_runs.json, ensuring consistency across schemas. Proper placement and optional nature maintained.

src/writers/writer_postgres.py (2)

187-222: Verify jobs table schema includes country column.

The jobs table INSERT now includes the country column. Ensure that the public_cps_za_runs_jobs table has been updated with the country column, otherwise these inserts will fail.


89-138: Ensure the target database table schema includes the country column before deployment.

The INSERT statement references a country column (line 99). Since this repository does not manage database migrations, verify that the target public_cps_za_dlchange table has been updated with the country column via your database migration tooling, otherwise inserts will fail with a column-not-found error.

tests/writers/test_writer_postgres.py (1)

69-88: LGTM! Test assertions updated correctly.

The parameter indices are correctly adjusted to account for the country field at position 6, and the test properly verifies the empty string default.

@tmikula-dev
Copy link
Collaborator

@copilot Why are changes made only for writer_postgres.py? There are other two writers. What were your decisions to implement it only somewhere?

Copy link
Contributor Author

Copilot AI commented Dec 9, 2025

@copilot Why are changes made only for writer_postgres.py? There are other two writers. What were your decisions to implement it only somewhere?

The EventBridge and Kafka writers serialize the entire message object as JSON, so they automatically include the country field without code changes:

  • writer_eventbridge.py (line 82): "Detail": json.dumps(message) - sends complete message
  • writer_kafka.py (line 108): value=json.dumps(message).encode("utf-8") - sends complete message

Only writer_postgres.py requires explicit column definitions in INSERT statements, which is why it needed updates to add the country column.

@tmikula-dev
Copy link
Collaborator

This PR is closed without a merge. It was created for learning purposes with agentic copilot mode. The solution have not satisfied the standard and will be created separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants