Add Human PPI text dataset and test split#8
suencgo wants to merge 1 commit into InternScience:main from
Conversation
Summary of Changes

Hello @suencgo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a new dataset for evaluating models on Human Protein-Protein Interaction (PPI) text classification. It provides the infrastructure needed to load, prompt, and evaluate models on this task, using a straightforward binary classification approach. The changes enable the system to handle a new type of biological text data with a tailored evaluation mechanism.

Highlights
Changelog
Activity
Code Review
This pull request adds a new text dataset, Human PPI, for protein-protein interaction prediction, including the dataset class, evaluation logic, and utility functions for parsing model outputs. While the changes are well-structured, two critical security concerns have been identified: an insecure deserialization vulnerability in the evaluation utility that could lead to remote code execution if a malicious prediction file is loaded, and a path traversal vulnerability in the dataset loading logic that could allow reading arbitrary files with a .tsv extension. Additionally, I've provided suggestions to improve code clarity and maintainability by removing wildcard imports, refactoring a parsing function for simplicity, and using more idiomatic pandas operations for data manipulation. Addressing these security vulnerabilities and code suggestions will enhance the overall quality and safety of this addition.
    - Compute the overall accuracy and return a one-row DataFrame:
      columns: ['Total', 'Correct', 'Accuracy']
    """
    data = load(eval_file)
The evaluate_human_ppi_binary function calls the load utility function with the eval_file argument. The load function (defined in scieval/smp/file.py) determines how to process a file based on its extension. If the file extension is .pkl, it uses pickle.load() to deserialize the content. Furthermore, load supports URLs and will automatically download a file if a URL is provided. An attacker can exploit this by providing a URL to a malicious pickle file as the eval_file, leading to arbitrary code execution on the user's system when the evaluation is run. This is a classic insecure deserialization vulnerability.
To remediate this, avoid using pickle.load() for untrusted input. Use safer serialization formats like JSON or safetensors. If pickle must be used, ensure the file comes from a trusted source and validate its integrity. Additionally, consider disabling automatic URL downloading in the load utility or restricting it to a set of trusted domains.
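One way to act on this advice is to validate the evaluation file before it ever reaches `load`. The sketch below is illustrative, not code from this PR: the `check_eval_file` helper and its allow-list are assumptions about what the evaluation pipeline actually needs.

```python
import os

# Extensions we are willing to deserialize; .pkl is deliberately absent
# because pickle.load() can execute arbitrary code. (Illustrative allow-list.)
ALLOWED_EVAL_EXTS = {".tsv", ".csv", ".json", ".xlsx"}


def check_eval_file(eval_file: str) -> str:
    """Reject URLs and unsafe extensions before handing the path to a loader."""
    if eval_file.startswith(("http://", "https://")):
        raise ValueError("Remote eval files are not allowed; download and verify manually.")
    ext = os.path.splitext(eval_file)[1].lower()
    if ext not in ALLOWED_EVAL_EXTS:
        raise ValueError(f"Refusing to load {ext!r}; allowed: {sorted(ALLOWED_EVAL_EXTS)}")
    return eval_file
```

A guard like this keeps the existing `load` utility unchanged while ensuring that pickle deserialization and silent URL downloads can never be triggered from the evaluation entry point.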
    Do not download from a remote source; read the local tsv directly
    under LMUDataRoot:
        <LMUDataRoot>/human_ppi_test.tsv
    """
    data_path = osp.join(LMUDataRoot(), f"{dataset}.tsv")
The load_data method in HumanPPIDataset constructs a file path by joining LMUDataRoot() with a user-supplied dataset name and appending .tsv. There is no validation or sanitization of the dataset name to prevent path traversal sequences like ../. An attacker could potentially use this to read arbitrary .tsv files on the system that are outside the intended data directory.
To remediate this, sanitize the dataset input by ensuring it does not contain path traversal characters (e.g., .., /, \). Alternatively, validate that the resulting data_path is still within the expected LMUDataRoot() directory.
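The second remediation option can be sketched as a small containment check. `resolve_dataset_path` is a hypothetical helper, not part of this PR; it assumes the data root and dataset name are both plain strings.

```python
import os


def resolve_dataset_path(root: str, dataset: str) -> str:
    """Join root and dataset name, refusing any path that escapes root."""
    real_root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(real_root, f"{dataset}.tsv"))
    # realpath() collapses '..' segments and symlinks, so a traversal attempt
    # like '../../etc/passwd' resolves outside root and is rejected here.
    if os.path.commonpath([real_root, candidate]) != real_root:
        raise ValueError(f"Dataset name {dataset!r} escapes the data root")
    return candidate
```

Checking the resolved path (rather than scanning the input for `..` substrings) is more robust, since it also catches traversal via symlinks or platform-specific separators.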
from .text_base import TextBaseDataset
from .utils.human_ppi import evaluate_human_ppi_binary
from ..smp import *
Using wildcard imports (from ... import *) is generally discouraged as it can lead to namespace pollution and make it difficult to trace where functions and classes originate. It's better to explicitly import the names you need. In this case, you seem to be using osp, LMUDataRoot, and load.
- from ..smp import *
+ from ..smp import LMUDataRoot, load, osp
import re
from typing import Any

from ...smp import *
Using wildcard imports (from ... import *) is generally discouraged as it can lead to namespace pollution and make it difficult to trace where functions and classes originate. It's better to explicitly import the names you need. In this case, you seem to be using load, dump, and pd.
- from ...smp import *
+ from ...smp import load, dump, pd
def parse_human_ppi_output(output_text: str) -> int:
    """
    Parse the model output and decide between Yes (1) and No (0).

    The logic largely mirrors the local test script `test_human_ppi.py`:
    - First, check whether the output starts with Yes./No.
    - Then, check whether yes/no appears within the first few characters.
    - Then, look for an explicit Yes/No word in the first 200 characters.
    - If the text contains affirmative terms such as interaction/binding,
      combine that with a check for negation words.

    Returns:
        1 if the prediction is Yes (a PPI exists)
        0 if the prediction is No (no PPI, or undetermined)
    """
    output_text = (output_text or "").strip()
    if not output_text:
        return 0

    # 1) Check the very beginning of the output
    if re.match(r"^(Yes|yes|YES)[\s.,;:!?]", output_text):
        return 1
    if re.match(r"^(No|no|NO)[\s.,;:!?]", output_text):
        return 0

    # 2) Check the first few characters
    first_words = output_text[:50].lower()
    if first_words.startswith("yes"):
        return 1
    if first_words.startswith("no"):
        return 0

    # 3) Search for an explicit Yes/No in the first 200 characters
    head = output_text[:200]
    if re.search(r"\b(No|no|NO)\b", head):
        return 0
    if re.search(r"\b(Yes|yes|YES)\b", head):
        return 1

    # 4) Check for interaction/binding-related wording
    if re.search(r"\b(interaction|binding|interact|bind|will interact|can interact)\b", output_text, re.IGNORECASE):
        # If an explicit negation word is present, treat it as 0
        if re.search(
            r"\b(not|no|unlikely|cannot|doesn't|don't|won't|no interaction|unlikely to)\b",
            output_text,
            re.IGNORECASE,
        ):
            return 0
        return 1

    # Default to No when the output cannot be classified
    return 0
The logic for parsing the model's text output is a bit complex and can be simplified for better readability and robustness. Using re.IGNORECASE can simplify the regular expressions, and combining the initial checks can reduce redundancy. This refactoring preserves the original logic while making it cleaner.
def parse_human_ppi_output(output_text: str) -> int:
    """
    Parse the model output and decide between Yes (1) and No (0).

    The logic follows the local test script `test_human_ppi.py`, simplified:
    - First, check whether the output starts with yes/no.
    - Then, look for an explicit yes/no word in the first 200 characters.
    - If the text contains terms such as interaction/binding, combine that
      with a check for negation words.

    Returns:
        1 if the prediction is Yes (a PPI exists)
        0 if the prediction is No (no PPI, or undetermined)
    """
    output_text = (output_text or "").strip()
    if not output_text:
        return 0
    # 1) Check for "yes" or "no" at the beginning of the string, using word boundaries.
    if re.match(r"^yes\b", output_text, re.IGNORECASE):
        return 1
    if re.match(r"^no\b", output_text, re.IGNORECASE):
        return 0
    # 2) Search for explicit Yes/No words in the first 200 characters.
    # The order is important: 'no' is checked first.
    head = output_text[:200]
    if re.search(r"\bno\b", head, re.IGNORECASE):
        return 0
    if re.search(r"\byes\b", head, re.IGNORECASE):
        return 1
    # 3) Check for interaction/binding related expressions.
    if re.search(r"\b(interaction|binding|interact|bind|will interact|can interact)\b", output_text, re.IGNORECASE):
        # If it contains obvious negative words, it's a 0.
        if re.search(
            r"\b(not|no|unlikely|cannot|doesn't|don't|won't|no interaction|unlikely to)\b",
            output_text,
            re.IGNORECASE,
        ):
            return 0
        return 1
    # Default to No if unable to determine.
    return 0

# Keep only indices that appear in meta
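As a quick sanity check on the simplified parsing rules, the standalone sketch below re-states the first three checks (it is a condensed illustration, not an import from this PR):

```python
import re


def parse_yes_no(text: str) -> int:
    """Condensed Yes/No parsing rules, for illustration only."""
    text = (text or "").strip()
    if not text:
        return 0
    # Leading yes/no wins outright.
    if re.match(r"^yes\b", text, re.IGNORECASE):
        return 1
    if re.match(r"^no\b", text, re.IGNORECASE):
        return 0
    # Otherwise look for an explicit yes/no word early on; 'no' first.
    head = text[:200]
    if re.search(r"\bno\b", head, re.IGNORECASE):
        return 0
    if re.search(r"\byes\b", head, re.IGNORECASE):
        return 1
    return 0


print(parse_yes_no("Yes, these proteins interact."))   # 1
print(parse_yes_no("No interaction is expected."))     # 0
print(parse_yes_no("I think the answer is yes."))      # 1
```

Note the single-backslash `\b` word boundaries: in a raw string, `\\b` would match a literal backslash followed by `b`, which is why the doubled form in the bot's suggestion would never match real model output.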
data = data[data["index"].isin(meta["index"])]

# Parse each prediction
preds = [parse_human_ppi_output(str(x)) for x in data["prediction"]]
data["pred_label"] = preds

# Merge in the ground-truth labels
idx2label = {int(i): int(c) for i, c in zip(meta["index"], meta["category"])}
data["true_label"] = [idx2label[int(i)] for i in data["index"]]

data["correct"] = data["pred_label"] == data["true_label"]
The current implementation for merging true labels is correct, but it can be made more concise and efficient by using pandas.merge. This is more idiomatic for pandas operations and can be more performant on larger datasets.
# Keep only indices that appear in meta, and merge in the true labels
data = pd.merge(data, meta[['index', 'category']], on='index', how='inner')
data.rename(columns={'category': 'true_label'}, inplace=True)
# Parse each prediction
data['pred_label'] = [parse_human_ppi_output(str(x)) for x in data['prediction']]
data['correct'] = data['pred_label'] == data['true_label']
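For reference, the one-row result DataFrame described in the docstring (columns `['Total', 'Correct', 'Accuracy']`) could be assembled from the merged frame as sketched below; the tiny `data` frame here is fabricated purely for illustration.

```python
import pandas as pd

# Hypothetical merged frame; in the real pipeline this comes from the eval file.
data = pd.DataFrame({
    "pred_label": [1, 0, 1, 0],
    "true_label": [1, 0, 0, 0],
})
data["correct"] = data["pred_label"] == data["true_label"]

# One-row summary: total rows, correct predictions, and overall accuracy.
total = len(data)
correct = int(data["correct"].sum())
result = pd.DataFrame(
    [{"Total": total, "Correct": correct, "Accuracy": correct / total if total else 0.0}]
)
print(result)  # Total=4, Correct=3, Accuracy=0.75
```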
No description provided.