feat: observe agents talk #8092

mikeldking · 2025-06-13T22:39:18Z

Summary by Sourcery

Introduce a comprehensive "Observe Agents" demo under examples/observe_agents showcasing a multi-agent personal assistant driven by OpenAI Agents and fully instrumented with Arize Phoenix for observability and evaluation.

New Features:

Add a Jupyter notebook tutorial demonstrating a four-phase scientific agent evaluation workflow with Arize Phoenix
Implement Calendar Agent with Google Calendar tools and Phoenix tracing
Implement Mail Agent with Gmail tools, event extraction, and Phoenix tracing
Create Coordinator Agent to orchestrate workflows between calendar and mail agents
Provide a Streamlit frontend for interactive chat and user feedback logging to Phoenix
Include utility scripts for generating OAuth tokens, setting up Phoenix prompts, and evaluating experiments
Add example configuration, requirements, and .env files for easy setup

Documentation:

Add README with setup instructions and project overview
Include .gitignore and .env.example for environment configuration

mikeldking · 2025-06-19T23:27:50Z

examples/observe_agents/README.md

+2. **Set env vars**
+   ```bash
+   export GOOGLE_SERVICE_ACCOUNT_JSON=/path/to/creds.json
+   export GMAIL_TOKEN_JSON=/path/to/gmail_token.json


I'm not sure I was able to figure out what these credentials were @Jgilhuly ...

examples/observe_agents/.gitignore

mikeldking · 2025-06-19T23:28:29Z

examples/observe_agents/README.md

+
+3. **Run agents (each in its own terminal)**
+   ```bash
+   pnpm --filter calendar_agent dev          # Mastra calendar agent with OTEL tracing


if there's no pnpm workspace at the root I don't think this works.

mikeldking · 2025-06-19T23:29:30Z

examples/observe_agents/README.md

+   export GMAIL_TOKEN_JSON=/path/to/gmail_token.json
+   export OPENAI_API_KEY=sk-...
+   ```
+


Probably need instructions on how to start phoenix

mikeldking · 2025-06-19T23:30:43Z

examples/observe_agents/calendar_agent/src/index.ts

+// ----------------------------------------------------------------------------
+
+const server = new MCPServer({
+  name: "Calendar Agent",


this seems a bit silly to call an agent to be honest. I sorta prefer to be honest about this just being tools

mikeldking · 2025-06-19T23:31:09Z

examples/observe_agents/coordinator/coordinator.py

+tracer_provider = register(
+    auto_instrument=True,
+    endpoint="http://localhost:4317",
+    verbose=False,


turned off logging.

mikeldking · 2025-06-19T23:31:44Z

examples/observe_agents/coordinator/coordinator.py

+    instructions=(
+        "You are a scheduling assistant. Use the calendar tools to find availability "
+        "and create events. When an email needs to be sent, *handoff* to the Mail "
+        "Agent.  Respond with a confirmation once all steps are complete."
+    ),


this set of instructions unfortunately cannot use our prompts. How do we plan on supporting this?

examples/observe_agents/generate_google_token.py


review-notebook-app · 2025-06-27T15:47:38Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

sourcery-ai

Hey @mikeldking - I've reviewed your changes - here's some feedback:

Blocking issues:

Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)

General comments:

The PR description mentions a TypeScript Calendar Agent, but the diff only includes a Python calendar_agent.py—please add the TS implementation or update the docs to match.
There’s a typo in workflow.ipynb (!uv pip install -r requirements.txt); it should be something like !pip install -r requirements.txt.
You’re repeating Phoenix client and prompt‐loading logic in each agent—consider centralizing that into a shared utility to reduce duplication.
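A minimal sketch of what such a shared utility could look like, assuming each agent currently fetches its own prompt by name. `make_prompt_loader` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for whatever Phoenix client call the agents actually make, which is not shown in this thread.

```python
from functools import lru_cache
from typing import Callable, List


def make_prompt_loader(fetch: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a fetch function so each prompt name is fetched at most once per process."""

    @lru_cache(maxsize=None)
    def load(name: str) -> str:
        return fetch(name)

    return load


calls: List[str] = []


def fake_fetch(name: str) -> str:
    """Hypothetical stand-in for a call to the Phoenix client."""
    calls.append(name)
    return f"instructions for {name}"


# Every agent module would import this one loader instead of building its own client.
load_prompt = make_prompt_loader(fake_fetch)
calendar_prompt = load_prompt("calendar-agent")
mail_prompt = load_prompt("mail-agent")
calendar_again = load_prompt("calendar-agent")  # served from the cache
```

Injecting the fetch function keeps the helper testable without a running Phoenix instance.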

Prompt for AI Agents

Please address the comments from this code review:
## Overall Comments
- The PR description mentions a TypeScript Calendar Agent, but the diff only includes a Python calendar_agent.py—please add the TS implementation or update the docs to match.
- There’s a typo in workflow.ipynb (`!uv pip install -r requirements.txt`); it should be something like `!pip install -r requirements.txt`.
- You’re repeating Phoenix client and prompt‐loading logic in each agent—consider centralizing that into a shared utility to reduce duplication.

## Individual Comments

### Comment 1
<location> `examples/observe_agents/front_end.py:79` </location>
<code_context>
+    return await Runner.run(agent, prompt)
+
+
+def run_agent_sync(agent, prompt):
+    """Run the agent synchronously by creating a new event loop."""
+    try:
</code_context>

<issue_to_address>
Potential for event loop conflicts in run_agent_sync.

Using nest_asyncio with Streamlit's event loop may cause deadlocks or UI issues. Consider running the agent in a separate thread or process to prevent conflicts.
</issue_to_address>

### Comment 2
<location> `examples/observe_agents/front_end.py:385` </location>
<code_context>
+# CLI entry point
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    print(
+        "📅 Calendar Agent (openai-agents) ready. Type your calendar-related request ('exit' to quit)."
</code_context>

<issue_to_address>
Launching Streamlit from within the script can cause recursion.

Launching Streamlit from within the script may cause recursive execution. Recommend removing this block and instructing users to use 'streamlit run' instead.
</issue_to_address>

### Comment 3
<location> `examples/observe_agents/utils/generate_google_token.py:58` </location>
<code_context>
+    creds = flow.run_local_server(port=0)
+
+    # Save the token
+    token_path = Path("../keys/gmail_token.json")
+
+    # Convert credentials to dictionary format
</code_context>

<issue_to_address>
Token path is hardcoded and may not exist.

Consider making the output path configurable or ensuring the directory exists and is writable before saving the token.
</issue_to_address>

### Comment 4
<location> `examples/observe_agents/workflow.ipynb:281` </location>
<code_context>
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "expamples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "expamples_df = expamples_df[:3]"
+   ]
+  },
</code_context>

<issue_to_address>
Typo in variable name 'expamples_df'.

Please correct the variable name to 'examples_df' to avoid confusion or potential reference errors.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "expamples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "expamples_df = expamples_df[:3]"
+   ]
+  },
=======
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "examples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "examples_df = examples_df[:3]"
+   ]
+  },
>>>>>>> REPLACE

</suggested_fix>

## Security Issues

### Issue 1
<location> `examples/observe_agents/front_end.py:398` </location>

<issue_to_address>
**security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

### Issue 2
<location> `examples/observe_agents/front_end.py:401` </location>

<issue_to_address>
**security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

sourcery-ai · 2025-06-27T15:49:52Z

examples/observe_agents/front_end.py

+    return await Runner.run(agent, prompt)
+
+
+def run_agent_sync(agent, prompt):


issue (bug_risk): Potential for event loop conflicts in run_agent_sync.

Using nest_asyncio with Streamlit's event loop may cause deadlocks or UI issues. Consider running the agent in a separate thread or process to prevent conflicts.
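One way to act on this suggestion, sketched under assumptions: run the coroutine on a worker thread that owns a fresh event loop, so any loop already running in the calling thread is never re-entered. `run_coro_in_thread` and `fake_agent_run` are hypothetical names; `fake_agent_run` stands in for `Runner.run(agent, prompt)`.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coro_in_thread(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion on a worker thread with its own event loop."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # asyncio.run creates and closes a new loop inside the worker thread,
        # so it does not touch a loop that may be running in the caller.
        return pool.submit(asyncio.run, coro).result()


async def fake_agent_run(prompt: str) -> str:
    """Hypothetical stand-in for `await Runner.run(agent, prompt)`."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"


result = run_coro_in_thread(fake_agent_run("hello"))
```

The caller still blocks until the agent finishes, so this trades re-entrancy problems for a synchronous wait; it does not make the UI non-blocking.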

sourcery-ai · 2025-06-27T15:49:52Z

examples/observe_agents/front_end.py

+        else:
+            st.info("⚠️ No span ID captured for feedback on this message")
+
+if __name__ == "__main__":


suggestion (bug_risk): Launching Streamlit from within the script can cause recursion.

Launching Streamlit from within the script may cause recursive execution. Recommend removing this block and instructing users to use 'streamlit run' instead.

sourcery-ai · 2025-06-27T15:49:52Z

examples/observe_agents/utils/generate_google_token.py

+    creds = flow.run_local_server(port=0)
+
+    # Save the token
+    token_path = Path("../keys/gmail_token.json")


issue (bug_risk): Token path is hardcoded and may not exist.

Consider making the output path configurable or ensuring the directory exists and is writable before saving the token.
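A possible shape for a configurable path, as a sketch only. It assumes the script could reuse the `GMAIL_TOKEN_JSON` variable from the README; `resolve_token_path` and `DEFAULT_TOKEN_PATH` are hypothetical names.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

DEFAULT_TOKEN_PATH = Path("../keys/gmail_token.json")  # the path hardcoded in the PR


def resolve_token_path(explicit: Optional[str] = None) -> Path:
    """Pick the token path: explicit value, then GMAIL_TOKEN_JSON, then the default."""
    raw = explicit or os.environ.get("GMAIL_TOKEN_JSON") or str(DEFAULT_TOKEN_PATH)
    path = Path(raw).expanduser()
    # Create the parent directory up front so the later write cannot fail on it.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration in a throwaway directory so nothing is written next to the repo.
with tempfile.TemporaryDirectory() as tmp:
    target = resolve_token_path(os.path.join(tmp, "keys", "gmail_token.json"))
    target.write_text("{}")
    created = target.exists()
```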

sourcery-ai · 2025-06-27T15:49:52Z

examples/observe_agents/workflow.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "expamples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "expamples_df = expamples_df[:3]"
+   ]
+  },


issue (typo): Typo in variable name 'expamples_df'.

Please correct the variable name to 'examples_df' to avoid confusion or potential reference errors.

Suggested change

-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "expamples_df = combined_df[\n",
-    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
-    "].head()\n",
-    "expamples_df = expamples_df[:3]"
-   ]
-  },
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "examples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "examples_df = examples_df[:3]"
+   ]
+  },

sourcery-ai · 2025-06-27T15:49:52Z

examples/observe_agents/front_end.py

+            pass
+        else:
+            # Not running under Streamlit, launch it
+            subprocess.run([sys.executable, "-m", "streamlit", "run", __file__])


security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

Source: opengrep
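A short sketch of what the audit comes down to, assuming the concern is a shell interpreting untrusted text. The flagged call passes an argument list, so no shell is involved. The standard-library helper for quoting shell arguments is `shlex.quote`; the `shlex.escape()` named in the finding does not appear to be a stdlib function. The `untrusted` value below is made up for illustration.

```python
import shlex
import subprocess
import sys

untrusted = "hello; echo injected"

# List form: each element becomes one argv entry and no shell parses it,
# so the ';' stays literal text instead of starting a second command.
completed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    capture_output=True,
    text=True,
    check=True,
)
echoed = completed.stdout.strip()

# If a shell command string is unavoidable, quote each untrusted piece first.
quoted = shlex.quote(untrusted)
```

In the PR's call the only variable elements are `sys.executable` and `__file__`, neither of which comes from user input, which is the kind of check the finding asks for.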

sourcery-ai · 2025-06-27T15:49:53Z

examples/observe_agents/front_end.py

+        if st.button(
+            "👍 Good", key=f"good_{message_index}", help="Response was helpful and accurate"
+        ):
+            success = log_feedback_to_phoenix(


issue (code-quality): Use named expression to simplify assignment and conditional [×4] (use-named-expression)

sourcery-ai · 2025-06-27T15:49:53Z

examples/observe_agents/front_end.py

+        st.chat_message("assistant").markdown(msg["content"])
+
+        # Add feedback UI for assistant messages
+        span_id = msg.get("span_id")


issue (code-quality): We've found these issues:

Use named expression to simplify assignment and conditional (use-named-expression)

Swap if/else to remove empty if body (remove-pass-body)
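The two refactorings named here can be illustrated roughly as follows. This is a sketch on made-up functions (`feedback_status`, `launch_actions`), not the code in `front_end.py`; the named expression (`:=`) needs Python 3.8 or later.

```python
from typing import Dict, List


def feedback_status(msg: Dict[str, str]) -> str:
    # use-named-expression. Before:
    #     span_id = msg.get("span_id")
    #     if span_id:
    #         ...
    if span_id := msg.get("span_id"):
        return f"feedback enabled for span {span_id}"
    return "no span ID captured"


def launch_actions(running_under_streamlit: bool) -> List[str]:
    actions: List[str] = []
    # remove-pass-body. Before:
    #     if running_under_streamlit:
    #         pass
    #     else:
    #         actions.append("launch")
    if not running_under_streamlit:
        actions.append("launch")
    return actions
```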

sourcery-ai · 2025-06-27T15:49:53Z

examples/observe_agents/mail_agent.py

+                    "body": body[:200] + "..."
+                    if len(body) > 200
+                    else body,  # Truncate long messages


suggestion (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation)

Suggested change

-                    "body": body[:200] + "..."
-                    if len(body) > 200
-                    else body,  # Truncate long messages
+                    "body": f"{body[:200]}..." if len(body) > 200 else body,

sourcery-ai · 2025-06-27T15:49:54Z

examples/observe_agents/utils/evaluate_experiment.py

+    differences = []
+    differences.append("❌ Output does not match expected:")
+


suggestion (code-quality): Merge append into list declaration (merge-list-append)

Suggested change

-    differences = []
-    differences.append("❌ Output does not match expected:")
+    differences = ["❌ Output does not match expected:"]

sourcery-ai · 2025-06-27T15:49:54Z

examples/observe_agents/utils/evaluate_experiment.py

+        similarity = 1.0  # Both strings are empty, perfect match
+    else:
+        distance = levenshtein(output_str, expected_str)
+        similarity = 1.0 - (distance / max_len)
+    return similarity


suggestion (code-quality): We've found these issues:

Lift return into if (lift-return-into-if)

Remove unnecessary else after guard condition (remove-unnecessary-else)

Suggested change

-        similarity = 1.0  # Both strings are empty, perfect match
-    else:
-        distance = levenshtein(output_str, expected_str)
-        similarity = 1.0 - (distance / max_len)
-    return similarity
+        return 1.0
+    distance = levenshtein(output_str, expected_str)
+    return 1.0 - (distance / max_len)
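For context, the guard-clause form can be sketched as a complete function. Two assumptions are not visible in the diff: that the guard tests `max_len == 0`, and how `levenshtein` is implemented. The `levenshtein` below is a plain stand-in and the function name `similarity` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme (stand-in)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def similarity(output_str: str, expected_str: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means identical strings."""
    max_len = max(len(output_str), len(expected_str))
    if max_len == 0:  # assumed guard condition
        return 1.0  # Both strings are empty, perfect match
    distance = levenshtein(output_str, expected_str)
    return 1.0 - (distance / max_len)
```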

examples/observe_agents/utils/generate_google_token.py

graphite-app · 2025-06-27T15:50:06Z

examples/observe_agents/workflow.ipynb

+    "expamples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "expamples_df = expamples_df[:3]"


The variable name expamples_df appears to be misspelled. It should be examples_df for consistency and clarity throughout the codebase. This would make the code more maintainable and easier to understand for other developers.

Suggested change

-    "expamples_df = combined_df[\n",
-    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
-    "].head()\n",
-    "expamples_df = expamples_df[:3]"
+    "examples_df = combined_df[\n",
+    "    [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+    "].head()\n",
+    "examples_df = examples_df[:3]"

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.

graphite-app · 2025-06-27T15:50:07Z

examples/observe_agents/utils/evaluate_experiment.py

+
+
+if __name__ == "__main__":
+    run_evaluation_for_experiment(["RXhwZXJpbWVudDo1NA=="])


The hardcoded experiment ID RXhwZXJpbWVudDo1NA== appears to be instance-specific and won't work across different installations. This will cause the script to fail when users run it in their own environments. Consider replacing this with a placeholder or adding instructions for users to insert their own experiment IDs generated during the workflow.

Suggested change

-    run_evaluation_for_experiment(["RXhwZXJpbWVudDo1NA=="])
+    # Replace the experiment ID below with your own experiment ID generated during the workflow
+    # Example: run_evaluation_for_experiment(["YOUR_EXPERIMENT_ID_HERE"])
+    run_evaluation_for_experiment([])

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.
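An alternative to editing the file by hand would be to take the IDs from the command line. The following is a sketch only; `parse_experiment_ids` is a hypothetical helper, and the script is assumed to keep calling `run_evaluation_for_experiment` with a list of ID strings.

```python
import argparse
from typing import List, Optional, Sequence


def parse_experiment_ids(argv: Optional[Sequence[str]] = None) -> List[str]:
    """Parse one or more experiment IDs from the command line."""
    parser = argparse.ArgumentParser(description="Evaluate Phoenix experiments by ID.")
    parser.add_argument(
        "experiment_ids",
        nargs="+",  # at least one ID is required, so an empty run fails loudly
        help="experiment IDs generated by your own workflow run",
    )
    return parser.parse_args(argv).experiment_ids


ids = parse_experiment_ids(["YOUR_EXPERIMENT_ID_HERE"])
# In the script this would feed: run_evaluation_for_experiment(ids)
```

With `nargs="+"`, running the script without arguments exits with a usage message instead of silently evaluating nothing.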

graphite-app · 2025-06-27T15:52:17Z

examples/observe_agents/utils/generate_google_token.py

+    creds = flow.run_local_server(port=0)
+
+    # Save the token
+    token_path = Path("../keys/gmail_token.json")


The code assumes the ../keys directory exists, but it might not be present when the script runs. Consider adding directory creation before writing the token file:

token_path.parent.mkdir(parents=True, exist_ok=True)

This ensures the parent directory exists before attempting to save the token, preventing potential file write errors.

Suggested change

     token_path = Path("../keys/gmail_token.json")
+    token_path.parent.mkdir(parents=True, exist_ok=True)

Spotted by Diamond

Is this helpful? React 👍 or 👎 to let us know.


Initial commit

d58882c

github-project-automation bot added this to phoenix Jun 13, 2025

github-project-automation bot moved this to 📘 Todo in phoenix Jun 13, 2025

gititnore and remove console output for telemetry

893b678

mikeldking commented Jun 19, 2025

Simplified working agent

a04be72

github-advanced-security bot found potential problems Jun 20, 2025

examples/observe_agents/generate_google_token.py Fixed

examples/observe_agents/generate_google_token.py Fixed

Jgilhuly linked an issue Jun 20, 2025 that may be closed by this pull request

Observe talk - self-improving oss repo #7500

Closed

Jgilhuly and others added 5 commits June 20, 2025 16:26

Adding prompts, prepping for others to run

c054d29

Addressing comments

7ce3567

Merge branch 'main' into observe-agents

f4deab6

Updating post observe demo

c0c72c8

Merge branch 'observe-agents' of https://github.com/Arize-ai/phoenix …

a72978b

…into observe-agents

Jgilhuly marked this pull request as ready for review June 27, 2025 15:48

Jgilhuly requested review from a team as code owners June 27, 2025 15:48

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 27, 2025

Merge branch 'main' into observe-agents

3b3e442

sourcery-ai bot requested changes Jun 27, 2025

github-project-automation bot moved this from 📘 Todo to 🔍. Needs Review in phoenix Jun 27, 2025

graphite-app bot reviewed Jun 27, 2025

Style fix

96abb38

graphite-app bot reviewed Jun 27, 2025

JohnGilhuly and others added 3 commits June 27, 2025 08:52

Update examples/observe_agents/utils/generate_google_token.py

d1abad6

Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>

Fix typo

71d0ace

Merge branch 'observe-agents' of https://github.com/Arize-ai/phoenix …

be28ce7

…into observe-agents

mikeldking closed this Jul 16, 2025

github-project-automation bot moved this from 🔍. Needs Review to ✅ Done in phoenix Jul 16, 2025
