-
Notifications
You must be signed in to change notification settings - Fork 325
MCP server filtering improvements #5654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Converted file retrieval methods in PhabricatorPatch and related interfaces to async, replacing synchronous HTTP calls with httpx.AsyncClient.
Trusted users are now: - Mozilla Corporation employees, OR - Users with editbugs privilege who have been active within last year This prevents dormant accounts with editbugs from being considered trusted.
Refactor metadata filtering into a SanitizedBug class that inherits from Bug and overrides properties to return sanitized values when no trusted user has commented. When there's no trusted comment, the following are redacted: - Bug title/summary - Reporter name and email - Assignee name and email The trust check uses cached_property to avoid redundant API calls. Bug.get() now returns SanitizedBug by default for security.
Refactor metadata filtering into a SanitizedPhabricatorPatch class that inherits from PhabricatorPatch and overrides properties to return sanitized values when no trusted user has commented. When there's no trusted comment, the following are redacted: - Revision title - Author name and email - Summary and test plan - Diff author (if different from revision author) - Stack graph titles The trust check uses cached_property to avoid redundant API calls. PhabricatorReviewData.get_patch_by_id() now returns SanitizedPhabricatorPatch.
This is now the full list from BMO
Automatically trust comments created before 2022 as prompt injection was not a concern at that time. This applies to both comment content validation and metadata redaction policies.
Ensure collapsed/admin tagged comments are excluded from all trust validation logic, not just the timeline output: - Skip when finding last trusted comment - Skip when building trusted user cache - Skip when checking for any trusted comment
Add comprehensive test coverage for: - Admin-tagged comments being completely disregarded - Pre-2022 comments being automatically trusted - All collapsed tags (spam, abuse, nsfw, etc.) filtering These tests validate the security policy changes implemented in previous commits.
suhaibmujahid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! It looks good to me, with a few comments.
| current_phid = revision["phid"] | ||
| patch_map = { | ||
| phid: ( | ||
| self | ||
| if phid == current_phid | ||
| else PhabricatorPatch(revision_phid=phid) | ||
| ) | ||
| for phid in stack_graph.keys() | ||
| } | ||
| current_phid = self._revision_metadata["phid"] | ||
| revision_ids = {} | ||
| for phid in self.stack_graph.keys(): | ||
| if phid == current_phid: | ||
| revision_ids[phid] = self.revision_id | ||
| else: | ||
| revision_ids[phid] = PhabricatorPatch( | ||
| revision_phid=phid | ||
| ).revision_id | ||
|
|
||
| for phid, dependencies in self.stack_graph.items(): | ||
| from_id = f"D{revision_ids[phid]}" | ||
| stack_title = self.get_stack_patch_title(phid) | ||
|
|
||
| for phid, dependencies in stack_graph.items(): | ||
| from_patch = patch_map[phid] | ||
| from_id = f"D{from_patch.revision_id}" | ||
| if phid == current_phid: | ||
| md_lines.append( | ||
| f" {from_id}[{from_patch.patch_title} - CURRENT]" | ||
| ) | ||
| md_lines.append(f" {from_id}[{stack_title} - CURRENT]") | ||
| md_lines.append(f" style {from_id} fill:#105823") | ||
| else: | ||
| md_lines.append(f" {from_id}[{from_patch.patch_title}]") | ||
| md_lines.append(f" {from_id}[{stack_title}]") | ||
|
|
||
| for dep_phid in dependencies: | ||
| dep_id = f"D{patch_map[dep_phid].revision_id}" | ||
| dep_id = f"D{revision_ids[dep_phid]}" | ||
| md_lines.append(f" {from_id} --> {dep_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the goal of the change here. The title should be sanitized automatically by using the patch_title property without any changes to this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the change here can be reversed. The information is already sanitized, unless I'm missing something.
| ) | ||
|
|
||
| def user_handler(user, data): | ||
| email = user.get("name", "").lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might trust all the @mozilla.com emails:
if email.endswith("@mozilla.com"):
data[email] = TrueThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided against this because checking emails is very hard (because unicode is hard, etc.), and checking group is fast (it can be batched -- same duration to check one or 30, I measured this against live BMO). I had this in the first version of the first patch iirc but took it out after a quick chat with Luke.
If we decide we want this, we'll want to use the industry standard package to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember Luke's comment on the first PR, but didn't grasp the issue. Why is email validation complex here? What edge cases does email.endswith("@mozilla.com") miss that we need to handle?"
We agreed not to use the last_seen_date activity data since it's not useful for determining trust.
All Mozilla Corporation members are inherently in the editbugs group, so checking both is redundant.
Changed Bug.get from @staticmethod to @classmethod to use cls instead of hardcoding the return type. Modified MCP server to use SanitizedBug directly so sanitization only happens at the API boundary.
The title is already displayed in the page header and is automatically sanitized via the patch_title property.
Modified MCP server to use SanitizedPhabricatorPatch instead of PhabricatorPatch so sanitization only happens at the API boundary.
This method is not used by the MCP server, so sanitization should not be applied here. Sanitization is handled at the MCP server boundary.
The _display suffix is a new pattern not in master. Removing it to keep the changes minimal and avoid introducing new naming conventions.
- Remove mozilla-corporation group check (only editbugs matters) - Remove activity date checks (not useful for trust determination)
suhaibmujahid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. It looks good. I just left a few minor comments.
Splitting the changes into scoped commits made the review much easier. I really appreciated it.
| current_phid = revision["phid"] | ||
| patch_map = { | ||
| phid: ( | ||
| self | ||
| if phid == current_phid | ||
| else PhabricatorPatch(revision_phid=phid) | ||
| ) | ||
| for phid in stack_graph.keys() | ||
| } | ||
| current_phid = self._revision_metadata["phid"] | ||
| revision_ids = {} | ||
| for phid in self.stack_graph.keys(): | ||
| if phid == current_phid: | ||
| revision_ids[phid] = self.revision_id | ||
| else: | ||
| revision_ids[phid] = PhabricatorPatch( | ||
| revision_phid=phid | ||
| ).revision_id | ||
|
|
||
| for phid, dependencies in self.stack_graph.items(): | ||
| from_id = f"D{revision_ids[phid]}" | ||
| stack_title = self.get_stack_patch_title(phid) | ||
|
|
||
| for phid, dependencies in stack_graph.items(): | ||
| from_patch = patch_map[phid] | ||
| from_id = f"D{from_patch.revision_id}" | ||
| if phid == current_phid: | ||
| md_lines.append( | ||
| f" {from_id}[{from_patch.patch_title} - CURRENT]" | ||
| ) | ||
| md_lines.append(f" {from_id}[{stack_title} - CURRENT]") | ||
| md_lines.append(f" style {from_id} fill:#105823") | ||
| else: | ||
| md_lines.append(f" {from_id}[{from_patch.patch_title}]") | ||
| md_lines.append(f" {from_id}[{stack_title}]") | ||
|
|
||
| for dep_phid in dependencies: | ||
| dep_id = f"D{patch_map[dep_phid].revision_id}" | ||
| dep_id = f"D{revision_ids[dep_phid]}" | ||
| md_lines.append(f" {from_id} --> {dep_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the change here can be reversed. The information is already sanitized, unless I'm missing something.
| ) | ||
|
|
||
| def user_handler(user, data): | ||
| email = user.get("name", "").lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember Luke's comment on the first PR, but didn't grasp the issue. Why is email validation complex here? What edge cases does email.endswith("@mozilla.com") miss that we need to handle?"
| return self._metadata.get("assigned_to_detail", {}) | ||
|
|
||
| @property | ||
| def reporter(self) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name reporter is unclear here; is it an email or a name? We could have both in two separate properties.
| return f"- **Reporter**: {name} ({email})" | ||
|
|
||
| @property | ||
| def assignee(self) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return f"- **Assignee**: {name} ({email})" | ||
|
|
||
| @property | ||
| def timeline_comments(self) -> list[dict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def timeline_comments(self) -> list[dict]: | |
| def comments(self) -> list[dict]: |
| return self._metadata.get("comments", []) | ||
|
|
||
| @property | ||
| def timeline_history(self) -> list[dict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def timeline_history(self) -> list[dict]: | |
| def history(self) -> list[dict]: |
| "real_name", detail.get("nick", detail.get("email", "Unknown")) | ||
| ) | ||
| email = detail.get("email", "No email") | ||
| return f"- **Reporter**: {name} ({email})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The markdown formatting does not belong here. It should be moved bug_to_markdown().
| "real_name", detail.get("nick", detail.get("email", "Unknown")) | ||
| ) | ||
| email = detail.get("email", "No email") | ||
| return f"- **Assignee**: {name} ({email})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implements the policy as discussed on the 2026-02-05.