Skip to content

feat: observe agents talk #8092

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed

feat: observe agents talk #8092

wants to merge 13 commits into from

Conversation

mikeldking
Copy link
Contributor

@mikeldking mikeldking commented Jun 13, 2025

Summary by Sourcery

Introduce a comprehensive "Observe Agents" demo under examples/observe_agents showcasing a multi-agent personal assistant driven by OpenAI Agents and fully instrumented with Arize Phoenix for observability and evaluation.

New Features:

  • Add a Jupyter notebook tutorial demonstrating a four-phase scientific agent evaluation workflow with Arize Phoenix
  • Implement Calendar Agent with Google Calendar tools and Phoenix tracing
  • Implement Mail Agent with Gmail tools, event extraction, and Phoenix tracing
  • Create Coordinator Agent to orchestrate workflows between calendar and mail agents
  • Provide a Streamlit frontend for interactive chat and user feedback logging to Phoenix
  • Include utility scripts for generating OAuth tokens, setting up Phoenix prompts, and evaluating experiments
  • Add example configuration, requirements, and .env files for easy setup

Documentation:

  • Add README with setup instructions and project overview
  • Include .gitignore and .env.example for environment configuration

2. **Set env vars**
```bash
export GOOGLE_SERVICE_ACCOUNT_JSON=/path/to/creds.json
export GMAIL_TOKEN_JSON=/path/to/gmail_token.json
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I was able to figure out what these credentials were @Jgilhuly ...


3. **Run agents (each in its own terminal)**
```bash
pnpm --filter calendar_agent dev # Mastra calendar agent with OTEL tracing
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's no pnpm workspace at the root I don't think this works.

export GMAIL_TOKEN_JSON=/path/to/gmail_token.json
export OPENAI_API_KEY=sk-...
```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need instructions on how to start phoenix

// ----------------------------------------------------------------------------

const server = new MCPServer({
name: "Calendar Agent",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a bit silly to call an agent to be honest. I sorta prefer to be honest about this just being tools

tracer_provider = register(
auto_instrument=True,
endpoint="http://localhost:4317",
verbose=False,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turned off logging.

Comment on lines 75 to 79
instructions=(
"You are a scheduling assistant. Use the calendar tools to find availability "
"and create events. When an email needs to be sent, *handoff* to the Mail "
"Agent. Respond with a confirmation once all steps are complete."
),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this set of instructions unfortunately cannot use our prompts. How do we plan on supporting this?

@Jgilhuly Jgilhuly linked an issue Jun 20, 2025 that may be closed by this pull request
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@Jgilhuly Jgilhuly marked this pull request as ready for review June 27, 2025 15:48
@Jgilhuly Jgilhuly requested review from a team as code owners June 27, 2025 15:48
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 27, 2025
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mikeldking - I've reviewed your changes - here's some feedback:

Blocking issues:

  • Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
  • Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)

General comments:

  • The PR description mentions a TypeScript Calendar Agent, but the diff only includes a Python calendar_agent.py—please add the TS implementation or update the docs to match.
  • There’s a typo in workflow.ipynb (!uv pip install -r requirements.txt); it should be something like !pip install -r requirements.txt.
  • You’re repeating Phoenix client and prompt‐loading logic in each agent—consider centralizing that into a shared utility to reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The PR description mentions a TypeScript Calendar Agent, but the diff only includes a Python calendar_agent.py—please add the TS implementation or update the docs to match.
- There’s a typo in workflow.ipynb (`!uv pip install -r requirements.txt`); it should be something like `!pip install -r requirements.txt`.
- You’re repeating Phoenix client and prompt‐loading logic in each agent—consider centralizing that into a shared utility to reduce duplication.

## Individual Comments

### Comment 1
<location> `examples/observe_agents/front_end.py:79` </location>
<code_context>
+    return await Runner.run(agent, prompt)
+
+
+def run_agent_sync(agent, prompt):
+    """Run the agent synchronously by creating a new event loop."""
+    try:
</code_context>

<issue_to_address>
Potential for event loop conflicts in run_agent_sync.

Using nest_asyncio with Streamlit's event loop may cause deadlocks or UI issues. Consider running the agent in a separate thread or process to prevent conflicts.
</issue_to_address>

### Comment 2
<location> `examples/observe_agents/front_end.py:385` </location>
<code_context>
+# CLI entry point
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    print(
+        "📅 Calendar Agent (openai-agents) ready. Type your calendar-related request ('exit' to quit)."
</code_context>

<issue_to_address>
Launching Streamlit from within the script can cause recursion.

Launching Streamlit from within the script may cause recursive execution. Recommend removing this block and instructing users to use 'streamlit run' instead.
</issue_to_address>

### Comment 3
<location> `examples/observe_agents/utils/generate_google_token.py:58` </location>
<code_context>
+    creds = flow.run_local_server(port=0)
+
+    # Save the token
+    token_path = Path("../keys/gmail_token.json")
+
+    # Convert credentials to dictionary format
</code_context>

<issue_to_address>
Token path is hardcoded and may not exist.

Consider making the output path configurable or ensuring the directory exists and is writable before saving the token.
</issue_to_address>

### Comment 4
<location> `examples/observe_agents/workflow.ipynb:281` </location>
<code_context>
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "expamples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "expamples_df = expamples_df[:3]"
+   ]
+  },
</code_context>

<issue_to_address>
Typo in variable name 'expamples_df'.

Please correct the variable name to 'examples_df' to avoid confusion or potential reference errors.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "expamples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "expamples_df = expamples_df[:3]"
+   ]
+  },
=======
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "examples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "examples_df = examples_df[:3]"
+   ]
+  },
>>>>>>> REPLACE

</suggested_fix>

## Security Issues

### Issue 1
<location> `examples/observe_agents/front_end.py:398` </location>

<issue_to_address>
**security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

### Issue 2
<location> `examples/observe_agents/front_end.py:401` </location>

<issue_to_address>
**security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

return await Runner.run(agent, prompt)


def run_agent_sync(agent, prompt):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Potential for event loop conflicts in run_agent_sync.

Using nest_asyncio with Streamlit's event loop may cause deadlocks or UI issues. Consider running the agent in a separate thread or process to prevent conflicts.

else:
st.info("⚠️ No span ID captured for feedback on this message")

if __name__ == "__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Launching Streamlit from within the script can cause recursion.

Launching Streamlit from within the script may cause recursive execution. Recommend removing this block and instructing users to use 'streamlit run' instead.

creds = flow.run_local_server(port=0)

# Save the token
token_path = Path("../keys/gmail_token.json")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Token path is hardcoded and may not exist.

Consider making the output path configurable or ensuring the directory exists and is writable before saving the token.

Comment on lines 278 to 286
"metadata": {},
"outputs": [],
"source": [
"expamples_df = combined_df[\n",
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
"].head()\n",
"expamples_df = expamples_df[:3]"
]
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (typo): Typo in variable name 'expamples_df'.

Please correct the variable name to 'examples_df' to avoid confusion or potential reference errors.

Suggested change
"metadata": {},
"outputs": [],
"source": [
"expamples_df = combined_df[\n",
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
"].head()\n",
"expamples_df = expamples_df[:3]"
]
},
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "examples_df = combined_df[\n",
+ " [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+ "].head()\n",
+ "examples_df = examples_df[:3]"
+ ]
+ },

pass
else:
# Not running under Streamlit, launch it
subprocess.run([sys.executable, "-m", "streamlit", "run", __file__])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep

if st.button(
"👍 Good", key=f"good_{message_index}", help="Response was helpful and accurate"
):
success = log_feedback_to_phoenix(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Use named expression to simplify assignment and conditional [×4] (use-named-expression)

st.chat_message("assistant").markdown(msg["content"])

# Add feedback UI for assistant messages
span_id = msg.get("span_id")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): We've found these issues:

Comment on lines +170 to +172
"body": body[:200] + "..."
if len(body) > 200
else body, # Truncate long messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation)

Suggested change
"body": body[:200] + "..."
if len(body) > 200
else body, # Truncate long messages
"body": f"{body[:200]}..." if len(body) > 200 else body,

Comment on lines +36 to +38
differences = []
differences.append("❌ Output does not match expected:")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): Merge append into list declaration (merge-list-append)

Suggested change
differences = []
differences.append("❌ Output does not match expected:")
differences = ["❌ Output does not match expected:"]

Comment on lines +72 to +76
similarity = 1.0 # Both strings are empty, perfect match
else:
distance = levenshtein(output_str, expected_str)
similarity = 1.0 - (distance / max_len)
return similarity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): We've found these issues:

Suggested change
similarity = 1.0 # Both strings are empty, perfect match
else:
distance = levenshtein(output_str, expected_str)
similarity = 1.0 - (distance / max_len)
return similarity
return 1.0
distance = levenshtein(output_str, expected_str)
return 1.0 - (distance / max_len)

@github-project-automation github-project-automation bot moved this from 📘 Todo to 🔍. Needs Review in phoenix Jun 27, 2025
Comment on lines 281 to 284
"expamples_df = combined_df[\n",
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
"].head()\n",
"expamples_df = expamples_df[:3]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name expamples_df appears to be misspelled. It should be examples_df for consistency and clarity throughout the codebase. This would make the code more maintainable and easier to understand for other developers.

Suggested change
"expamples_df = combined_df[\n",
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
"].head()\n",
"expamples_df = expamples_df[:3]"
"examples_df = combined_df[\n",
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
"].head()\n",
"examples_df = examples_df[:3]"

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.



if __name__ == "__main__":
run_evaluation_for_experiment(["RXhwZXJpbWVudDo1NA=="])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded experiment ID RXhwZXJpbWVudDo1NA== appears to be instance-specific and won't work across different installations. This will cause the script to fail when users run it in their own environments. Consider replacing this with a placeholder or adding instructions for users to insert their own experiment IDs generated during the workflow.

Suggested change
run_evaluation_for_experiment(["RXhwZXJpbWVudDo1NA=="])
# Replace the experiment ID below with your own experiment ID generated during the workflow
# Example: run_evaluation_for_experiment(["YOUR_EXPERIMENT_ID_HERE"])
run_evaluation_for_experiment([])

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.

creds = flow.run_local_server(port=0)

# Save the token
token_path = Path("../keys/gmail_token.json")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code assumes the ../keys directory exists, but it might not be present when the script runs. Consider adding directory creation before writing the token file:

token_path.parent.mkdir(parents=True, exist_ok=True)

This ensures the parent directory exists before attempting to save the token, preventing potential file write errors.

Suggested change
token_path = Path("../keys/gmail_token.json")
token_path = Path("../keys/gmail_token.json")
token_path.parent.mkdir(parents=True, exist_ok=True)

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.

JohnGilhuly and others added 3 commits June 27, 2025 08:52
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
@mikeldking mikeldking closed this Jul 16, 2025
@github-project-automation github-project-automation bot moved this from 🔍. Needs Review to ✅ Done in phoenix Jul 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Observe talk - self-improving oss repo
2 participants