Fix: Storing built-in feature bins in program + Fix: Using llm_feedback_weight in final score #401
base: main
Changes from all commits
c742512
3e9c4cc
c53c195
74f7b0f
```diff
@@ -861,6 +861,7 @@ def _calculate_feature_coords(self, program: Program) -> List[int]:
                 # Use code length as complexity measure
                 complexity = len(program.code)
                 bin_idx = self._calculate_complexity_bin(complexity)
+                program.complexity = bin_idx  # Store complexity bin in program
                 coords.append(bin_idx)
             elif dim == "diversity":
                 # Use cached diversity calculation with reference set
@@ -869,6 +870,7 @@ def _calculate_feature_coords(self, program: Program) -> List[int]:
                 else:
                     diversity = self._get_cached_diversity(program)
                 bin_idx = self._calculate_diversity_bin(diversity)
+                program.diversity = bin_idx  # Store diversity bin in program
                 coords.append(bin_idx)
             elif dim == "score":
                 # Use average of numeric metrics
```

Comment on lines 867 to 874
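The change above writes the computed bin index back onto the program object. A minimal standalone sketch of that idea, assuming a hypothetical `Program` dataclass and binning helper (only the names `Program`, `complexity`, and the code-length measure come from the diff; the bin count and range are placeholders):

```python
from dataclasses import dataclass


@dataclass
class Program:
    code: str
    complexity: float = 0.0  # after this PR, holds the complexity bin index
    diversity: float = 0.0   # after this PR, holds the diversity bin index


def calculate_complexity_bin(complexity: int, num_bins: int = 10,
                             max_complexity: int = 1000) -> int:
    """Map a raw code-length complexity onto one of num_bins bins."""
    return min(int(complexity / max_complexity * num_bins), num_bins - 1)


program = Program(code="def f(x):\n    return x * 2\n")
complexity = len(program.code)               # code length as complexity measure
bin_idx = calculate_complexity_bin(complexity)
program.complexity = bin_idx                 # store the bin, as the PR does
```

The diversity branch follows the same pattern with its own binning helper.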
```diff
@@ -208,9 +208,10 @@ async def evaluate_program(
             if "combined_score" in eval_result.metrics:
                 # Original combined_score is just accuracy
                 accuracy = eval_result.metrics["combined_score"]
-                # Combine with LLM average (70% accuracy, 30% LLM quality)
+                # Combine accuracy with LLM average using dynamic weighting:
+                # (1 - llm_feedback_weight) * accuracy + llm_feedback_weight * LLM quality
                 eval_result.metrics["combined_score"] = (
-                    accuracy * 0.7 + llm_average * 0.3
+                    accuracy * (1 - self.config.llm_feedback_weight)
+                    + llm_average * self.config.llm_feedback_weight
                 )

                 # Store artifacts if enabled and present
```

Comment on lines 213 to 215
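The new weighting generalizes the old hard-coded 70/30 split: `llm_feedback_weight` comes from the diff, while the free function below is a stand-in for the config-driven call site:

```python
def combine_scores(accuracy: float, llm_average: float,
                   llm_feedback_weight: float) -> float:
    """Blend accuracy with LLM quality. A weight of 0 reproduces pure
    accuracy; the previously hard-coded behavior corresponds to 0.3."""
    return accuracy * (1 - llm_feedback_weight) + llm_average * llm_feedback_weight


# The old fixed 70/30 split:
old = combine_scores(0.8, 0.6, 0.3)   # 0.8*0.7 + 0.6*0.3, about 0.74
# With a configurable weight of 0.5:
new = combine_scores(0.8, 0.6, 0.5)   # about 0.7
```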
Assigning the bin index into Program.complexity/Program.diversity is semantically ambiguous (the dataclass defines these as derived feature values, currently typed as float). Consider either casting to float for consistency, or introducing explicit fields like complexity_bin/diversity_bin to avoid confusing bins with raw feature values.
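The reviewer's second option could look like the sketch below; the field names `complexity_bin`/`diversity_bin` come from the comment itself, while the rest of the dataclass is an assumption:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Program:
    code: str
    complexity: float = 0.0               # raw derived feature value
    diversity: float = 0.0                # raw derived feature value
    complexity_bin: Optional[int] = None  # explicit bin index, kept separate
    diversity_bin: Optional[int] = None   # from the raw feature values


p = Program(code="print('hi')")
p.complexity = 11.0      # raw feature (e.g. code length)
p.complexity_bin = 2     # bin index stored in its own field
```

Separate fields keep the float-typed feature values intact for any downstream consumers while making the bin index explicit.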
@copilot open a new pull request to apply changes based on this feedback