@allisoneer
Contributor

allisoneer commented May 19, 2025

Issues and Status

ENG-1217: 4xx Error Detection (In Progress)

  • Problem: Task controller treats HTTP 4xx errors as terminal failures; should retry with backoff
  • Current behavior:
    • HTTP 4xx errors → terminal "Failed" state → no retries
  • Fix: Make all HTTP errors retryable and eliminate the terminal state for these errors
  • Progress:
    • ✅ Test cases created in task_controller_error_test.go
    • ⏳ Need to implement changes to error handling logic
    • ✅ Improved error detection in LangchainGo client to extract status codes

ENG-1304: Double-sending to LLM (Fixed)

  • Problem: Multiple LLM requests sent due to race condition
  • Root cause: Task transitions to ReadyForLLM then immediately requeues
  • Solution: Add intermediate state to ensure one-time sending
  • Progress:
    • ✅ Added SendContextWindowToLLM intermediate state
    • ✅ Updated controller logic to handle the new state transition
    • ✅ Updated tests to verify the fix

ENG-1233: Tool Result Collection (Fixed)

  • Problem: Controller doesn't validate tool call results properly
  • Issues: Missing/empty results still considered "complete"
  • Solution: Add result validation, handle edge cases
  • Progress:
    • ✅ Added validation for empty tool call results
    • ✅ Added counter for missing results with requeue logic
    • ✅ Updated status messages and logging
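
A rough sketch of the validation described above, assuming a hypothetical flattened `toolCall` struct and hypothetical phase names (the real controller reads `tc.Status.Result`). It tracks completion and missing results separately, so a call that finished with an empty result triggers a requeue instead of counting as complete.

```go
package main

import "fmt"

// toolCall is a hypothetical, flattened view of a ToolCall resource.
type toolCall struct {
	Name   string
	Phase  string // hypothetical values: "Pending", "Succeeded", "Failed"
	Result string
}

// summarize reports whether every call has finished and how many finished
// calls carry an empty result.
func summarize(calls []toolCall) (allCompleted bool, missing int) {
	allCompleted = true
	for _, tc := range calls {
		finished := tc.Phase == "Succeeded" || tc.Phase == "Failed"
		if !finished {
			allCompleted = false
			continue
		}
		if tc.Result == "" {
			missing++
		}
	}
	return allCompleted, missing
}

func main() {
	calls := []toolCall{
		{Name: "fetch", Phase: "Succeeded", Result: "42"},
		{Name: "lookup", Phase: "Succeeded", Result: ""},
	}
	allCompleted, missing := summarize(calls)
	if !allCompleted || missing > 0 {
		fmt.Printf("requeue: allCompleted=%v missing=%d\n", allCompleted, missing)
		return
	}
	fmt.Println("all tool call results collected")
}
```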

ENG-1299: Tool Use Ordering (Fixed)

  • Problem: Tool calls and results not properly ordered for Anthropic
  • Solution: Ensure each tool_use is immediately followed by its tool_result
  • Progress:
    • ✅ Implemented proper ordering of tool calls and results
    • ✅ Added sorting of tool calls to match assistant message order
    • ✅ Created a new context window with interleaved assistant/tool messages
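
The ordering rule can be illustrated with a short sketch. The `message` and `toolResult` types and the `interleave` function are hypothetical and much simpler than the real context window types: results are looked up by tool call ID and emitted in the order the assistant issued the calls, each one directly after its tool_use entry.

```go
package main

import "fmt"

// message is a hypothetical, simplified context window entry.
type message struct {
	Role       string // "assistant" for tool_use, "tool" for tool_result
	ToolCallID string
	Content    string
}

// toolResult is a hypothetical completed tool call result.
type toolResult struct {
	ToolCallID string
	Content    string
}

// interleave walks the tool call IDs in assistant order and emits each
// tool_use immediately followed by its tool_result.
// Assumption for this sketch: calls without a result are skipped.
func interleave(callIDs []string, results []toolResult) []message {
	byID := make(map[string]toolResult, len(results))
	for _, r := range results {
		byID[r.ToolCallID] = r
	}
	out := make([]message, 0, 2*len(callIDs))
	for _, id := range callIDs {
		r, ok := byID[id]
		if !ok {
			continue
		}
		out = append(out,
			message{Role: "assistant", ToolCallID: id},
			message{Role: "tool", ToolCallID: id, Content: r.Content},
		)
	}
	return out
}

func main() {
	// Results arrive out of order relative to the assistant message.
	results := []toolResult{
		{ToolCallID: "call_b", Content: "result b"},
		{ToolCallID: "call_a", Content: "result a"},
	}
	for _, m := range interleave([]string{"call_a", "call_b"}, results) {
		fmt.Printf("%s %s %q\n", m.Role, m.ToolCallID, m.Content)
	}
}
```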

ENG-1177: MCP Server Test Failures (Fixed)

  • Problem: Resource leaks and timeouts in MCP server connections
  • Solution: Improved resource management and timeouts
  • Progress:
    • ✅ Added timeouts for MCP server connections and calls
    • ✅ Improved cleanup of resources in error cases
    • ✅ Enhanced connection tracking and error handling

Next Steps

  1. Complete HTTP error handling fix (the only remaining item)
  2. Run full test suite to ensure all issues are resolved
  3. Document changes for the team

Challenges

  • Existing tests assume 4xx errors are terminal failures
  • HTTP error handling requires changes to core controller logic

@allisoneer
Contributor Author

@claude What do you think about this?

@allisoneer
Contributor Author

@claude What is going on here? Is this good stuff?

github-actions bot commented May 20, 2025

Claude encountered an error.

I'll analyze this and get back to you.

@balanceiskey
Contributor

@claude What's going on here?

github-actions bot commented May 20, 2025

Claude encountered an error.

I'll analyze this and get back to you.

@balanceiskey
Contributor

@claude Come up with 6 reasons why this code is DOPE.

github-actions bot commented May 20, 2025

Claude encountered an error.

I'll analyze this and get back to you.

@allisoneer
Contributor Author

@claude How does this look?

@allisoneer
Contributor Author

@claude how does this look?

github-actions bot commented May 20, 2025

Claude encountered an error.

I'll analyze this and get back to you.

@allisoneer
Contributor Author

@claude check out this PR

claude bot commented May 20, 2025

Claude encountered an error.

I'll analyze this and get back to you.

// Check if all tool calls are completed
// Check if all tool calls are completed and have valid results
allCompleted := true
allHaveValidResults := true

Contributor

i don't understand how allHaveValidResults is different from !allCompleted

}

// Additionally check for empty or missing results
if tc.Status.Result == "" {

Contributor

@dexhorthy May 21, 2025

if we got here that sounds like a bug in the toolcall controller?? We can be defensive here but what's the underlying cause of this? How did we get a TC in succeeded or Error but with no result?

