-
Notifications
You must be signed in to change notification settings - Fork 490
feat: observe agents talk #8092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
examples/observe_agents/README.md
Outdated
2. **Set env vars** | ||
```bash | ||
export GOOGLE_SERVICE_ACCOUNT_JSON=/path/to/creds.json | ||
export GMAIL_TOKEN_JSON=/path/to/gmail_token.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I was able to figure out what these credentials were @Jgilhuly ...
examples/observe_agents/README.md
Outdated
|
||
3. **Run agents (each in its own terminal)** | ||
```bash | ||
pnpm --filter calendar_agent dev # Mastra calendar agent with OTEL tracing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's no pnpm workspace at the root I don't think this works.
examples/observe_agents/README.md
Outdated
export GMAIL_TOKEN_JSON=/path/to/gmail_token.json | ||
export OPENAI_API_KEY=sk-... | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably need instructions on how to start phoenix
// ---------------------------------------------------------------------------- | ||
|
||
const server = new MCPServer({ | ||
name: "Calendar Agent", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a bit silly to call an agent to be honest. I sorta prefer to be honest about this just being tools
tracer_provider = register( | ||
auto_instrument=True, | ||
endpoint="http://localhost:4317", | ||
verbose=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turned off logging.
instructions=( | ||
"You are a scheduling assistant. Use the calendar tools to find availability " | ||
"and create events. When an email needs to be sent, *handoff* to the Mail " | ||
"Agent. Respond with a confirmation once all steps are complete." | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this set of instructions unfortunately cannot use our prompts. How do we plan on supporting this?
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @mikeldking - I've reviewed your changes - here's some feedback:
Blocking issues:
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
- Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)
General comments:
- The PR description mentions a TypeScript Calendar Agent, but the diff only includes a Python calendar_agent.py—please add the TS implementation or update the docs to match.
- There’s a typo in workflow.ipynb (
!uv pip install -r requirements.txt
); it should be something like!pip install -r requirements.txt
. - You’re repeating Phoenix client and prompt‐loading logic in each agent—consider centralizing that into a shared utility to reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The PR description mentions a TypeScript Calendar Agent, but the diff only includes a Python calendar_agent.py—please add the TS implementation or update the docs to match.
- There’s a typo in workflow.ipynb (`!uv pip install -r requirements.txt`); it should be something like `!pip install -r requirements.txt`.
- You’re repeating Phoenix client and prompt‐loading logic in each agent—consider centralizing that into a shared utility to reduce duplication.
## Individual Comments
### Comment 1
<location> `examples/observe_agents/front_end.py:79` </location>
<code_context>
+ return await Runner.run(agent, prompt)
+
+
+def run_agent_sync(agent, prompt):
+ """Run the agent synchronously by creating a new event loop."""
+ try:
</code_context>
<issue_to_address>
Potential for event loop conflicts in run_agent_sync.
Using nest_asyncio with Streamlit's event loop may cause deadlocks or UI issues. Consider running the agent in a separate thread or process to prevent conflicts.
</issue_to_address>
### Comment 2
<location> `examples/observe_agents/front_end.py:385` </location>
<code_context>
+# CLI entry point
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+ print(
+ "📅 Calendar Agent (openai-agents) ready. Type your calendar-related request ('exit' to quit)."
</code_context>
<issue_to_address>
Launching Streamlit from within the script can cause recursion.
Launching Streamlit from within the script may cause recursive execution. Recommend removing this block and instructing users to use 'streamlit run' instead.
</issue_to_address>
### Comment 3
<location> `examples/observe_agents/utils/generate_google_token.py:58` </location>
<code_context>
+ creds = flow.run_local_server(port=0)
+
+ # Save the token
+ token_path = Path("../keys/gmail_token.json")
+
+ # Convert credentials to dictionary format
</code_context>
<issue_to_address>
Token path is hardcoded and may not exist.
Consider making the output path configurable or ensuring the directory exists and is writable before saving the token.
</issue_to_address>
### Comment 4
<location> `examples/observe_agents/workflow.ipynb:281` </location>
<code_context>
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "expamples_df = combined_df[\n",
+ " [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+ "].head()\n",
+ "expamples_df = expamples_df[:3]"
+ ]
+ },
</code_context>
<issue_to_address>
Typo in variable name 'expamples_df'.
Please correct the variable name to 'examples_df' to avoid confusion or potential reference errors.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "expamples_df = combined_df[\n",
+ " [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+ "].head()\n",
+ "expamples_df = expamples_df[:3]"
+ ]
+ },
=======
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "examples_df = combined_df[\n",
+ " [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n",
+ "].head()\n",
+ "examples_df = examples_df[:3]"
+ ]
+ },
>>>>>>> REPLACE
</suggested_fix>
## Security Issues
### Issue 1
<location> `examples/observe_agents/front_end.py:398` </location>
<issue_to_address>
**security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
### Issue 2
<location> `examples/observe_agents/front_end.py:401` </location>
<issue_to_address>
**security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
*Source: opengrep*
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
return await Runner.run(agent, prompt) | ||
|
||
|
||
def run_agent_sync(agent, prompt): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Potential for event loop conflicts in run_agent_sync.
Using nest_asyncio with Streamlit's event loop may cause deadlocks or UI issues. Consider running the agent in a separate thread or process to prevent conflicts.
else: | ||
st.info("⚠️ No span ID captured for feedback on this message") | ||
|
||
if __name__ == "__main__": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Launching Streamlit from within the script can cause recursion.
Launching Streamlit from within the script may cause recursive execution. Recommend removing this block and instructing users to use 'streamlit run' instead.
creds = flow.run_local_server(port=0) | ||
|
||
# Save the token | ||
token_path = Path("../keys/gmail_token.json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Token path is hardcoded and may not exist.
Consider making the output path configurable or ensuring the directory exists and is writable before saving the token.
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"expamples_df = combined_df[\n", | ||
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n", | ||
"].head()\n", | ||
"expamples_df = expamples_df[:3]" | ||
] | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (typo): Typo in variable name 'expamples_df'.
Please correct the variable name to 'examples_df' to avoid confusion or potential reference errors.
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"expamples_df = combined_df[\n", | |
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n", | |
"].head()\n", | |
"expamples_df = expamples_df[:3]" | |
] | |
}, | |
+ "metadata": {}, | |
+ "outputs": [], | |
+ "source": [ | |
+ "examples_df = combined_df[\n", | |
+ " [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n", | |
+ "].head()\n", | |
+ "examples_df = examples_df[:3]" | |
+ ] | |
+ }, |
pass | ||
else: | ||
# Not running under Streamlit, launch it | ||
subprocess.run([sys.executable, "-m", "streamlit", "run", __file__]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
security (opengrep-rules.python.lang.security.audit.dangerous-subprocess-use-audit): Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.
Source: opengrep
if st.button( | ||
"👍 Good", key=f"good_{message_index}", help="Response was helpful and accurate" | ||
): | ||
success = log_feedback_to_phoenix( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Use named expression to simplify assignment and conditional [×4] (use-named-expression
)
st.chat_message("assistant").markdown(msg["content"]) | ||
|
||
# Add feedback UI for assistant messages | ||
span_id = msg.get("span_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Use named expression to simplify assignment and conditional (
use-named-expression
) - Swap if/else to remove empty if body (
remove-pass-body
)
"body": body[:200] + "..." | ||
if len(body) > 200 | ||
else body, # Truncate long messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Use f-string instead of string concatenation (use-fstring-for-concatenation
)
"body": body[:200] + "..." | |
if len(body) > 200 | |
else body, # Truncate long messages | |
"body": f"{body[:200]}..." if len(body) > 200 else body, |
differences = [] | ||
differences.append("❌ Output does not match expected:") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Merge append into list declaration (merge-list-append
)
differences = [] | |
differences.append("❌ Output does not match expected:") | |
differences = ["❌ Output does not match expected:"] |
similarity = 1.0 # Both strings are empty, perfect match | ||
else: | ||
distance = levenshtein(output_str, expected_str) | ||
similarity = 1.0 - (distance / max_len) | ||
return similarity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): We've found these issues:
- Lift return into if (
lift-return-into-if
) - Remove unnecessary else after guard condition (
remove-unnecessary-else
)
similarity = 1.0 # Both strings are empty, perfect match | |
else: | |
distance = levenshtein(output_str, expected_str) | |
similarity = 1.0 - (distance / max_len) | |
return similarity | |
return 1.0 | |
distance = levenshtein(output_str, expected_str) | |
return 1.0 - (distance / max_len) |
"expamples_df = combined_df[\n", | ||
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n", | ||
"].head()\n", | ||
"expamples_df = expamples_df[:3]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name expamples_df
appears to be misspelled. It should be examples_df
for consistency and clarity throughout the codebase. This would make the code more maintainable and easier to understand for other developers.
"expamples_df = combined_df[\n", | |
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n", | |
"].head()\n", | |
"expamples_df = expamples_df[:3]" | |
"examples_df = combined_df[\n", | |
" [\"annotation_name\", \"result.label\", \"attributes.input.value\", \"attributes.output.value\"]\n", | |
"].head()\n", | |
"examples_df = examples_df[:3]" |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
|
||
|
||
if __name__ == "__main__": | ||
run_evaluation_for_experiment(["RXhwZXJpbWVudDo1NA=="]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded experiment ID RXhwZXJpbWVudDo1NA==
appears to be instance-specific and won't work across different installations. This will cause the script to fail when users run it in their own environments. Consider replacing this with a placeholder or adding instructions for users to insert their own experiment IDs generated during the workflow.
run_evaluation_for_experiment(["RXhwZXJpbWVudDo1NA=="]) | |
# Replace the experiment ID below with your own experiment ID generated during the workflow | |
# Example: run_evaluation_for_experiment(["YOUR_EXPERIMENT_ID_HERE"]) | |
run_evaluation_for_experiment([]) |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
creds = flow.run_local_server(port=0) | ||
|
||
# Save the token | ||
token_path = Path("../keys/gmail_token.json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code assumes the ../keys
directory exists, but it might not be present when the script runs. Consider adding directory creation before writing the token file:
token_path.parent.mkdir(parents=True, exist_ok=True)
This ensures the parent directory exists before attempting to save the token, preventing potential file write errors.
token_path = Path("../keys/gmail_token.json") | |
token_path = Path("../keys/gmail_token.json") | |
token_path.parent.mkdir(parents=True, exist_ok=True) |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
…into observe-agents
Summary by Sourcery
Introduce a comprehensive "Observe Agents" demo under examples/observe_agents showcasing a multi-agent personal assistant driven by OpenAI Agents and fully instrumented with Arize Phoenix for observability and evaluation.
New Features:
Documentation: