Add a generative AI observability folder, with sample code for instrumenting the Gen AI SDK. #13291
Hello @michaelsafyan, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces a new folder, generative_ai_observability, with sample code for instrumenting the Gen AI SDK using Google Cloud Observability tools (Cloud Logging, Cloud Monitoring, and Cloud Trace). The sample demonstrates how to add observability to the Gen AI SDK, both within and outside of Google Cloud environments. It configures OpenTelemetry to collect and route telemetry data to Google Cloud Observability, including setting up tracing, metrics, and logging. The PR also includes a Makefile for managing dependencies and running the sample, as well as a .gitignore file to exclude the virtual environment.
Highlights
- New Sample Code: Adds a new sample demonstrating how to instrument the Gen AI SDK for observability.
- OpenTelemetry Configuration: Configures OpenTelemetry to route telemetry data to Google Cloud Observability services.
- Instrumentation: Instruments both the Gen AI SDK and the Python Requests library for comprehensive telemetry collection.
- Google Cloud Observability: Demonstrates the use of Cloud Logging, Cloud Monitoring, and Cloud Trace for generative AI workloads.
Changelog
- generative_ai_observability/README.md
- Added a README file explaining the purpose of the samples.
- generative_ai_observability/genaisdk/.gitignore
- Added a .gitignore file to exclude the virtual environment.
- generative_ai_observability/genaisdk/Makefile
- Added a Makefile for managing dependencies and running the sample application.
- Includes targets for creating a virtual environment, installing dependencies, and running the main script.
- generative_ai_observability/genaisdk/README.md
- Added a README file explaining the Gen AI SDK observability sample.
- generative_ai_observability/genaisdk/main.py
- Added the main application logic, including setup for telemetry and usage of the Gen AI SDK.
- Includes functions to initialize OpenTelemetry and use the Google Gen AI SDK to generate content.
- generative_ai_observability/genaisdk/otel_setup/__init__.py
- Created an __init__.py file to expose the setup functions for OpenTelemetry instrumentation and wiring.
- generative_ai_observability/genaisdk/otel_setup/setup_otel_instrumentation.py
- Added code to configure OpenTelemetry instrumentation for the Gen AI SDK and Requests library.
- Uses monkey-patching to inject telemetry collection into the Gen AI SDK and its dependencies.
- generative_ai_observability/genaisdk/otel_setup/setup_otel_to_gcp_wiring.py
- Added code to configure OpenTelemetry to route telemetry data to Google Cloud Observability services.
- Includes functions to create gRPC credentials, set up Cloud Trace, Cloud Monitoring, and Cloud Logging.
- generative_ai_observability/genaisdk/requirements.txt
- Added a requirements file listing the dependencies for the sample application.
- Includes libraries such as google-auth, google-genai, google-cloud-logging, opentelemetry-sdk, and others.
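The changelog mentions monkey-patching as the mechanism for injecting telemetry. As a rough illustration of that general technique only (not the sample's actual instrumentation code), the sketch below wraps a method on a class and records one entry per call. `FakeModels`, `instrument`, and the record fields are hypothetical names invented for this example.

```python
import functools
import time


class FakeModels:
    """Hypothetical stand-in for an SDK client class; not the real Gen AI SDK."""

    def generate_content(self, prompt: str) -> str:
        return f"echo: {prompt}"


def instrument(cls, method_name, records):
    """Replace cls.method_name with a wrapper that appends a record per call.

    Returns a function that restores the original method.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(self, *args, **kwargs)
        except Exception as exc:
            # Note the failure type, then let the exception propagate unchanged.
            error = type(exc).__name__
            raise
        finally:
            records.append(
                {
                    "name": f"{cls.__name__}.{method_name}",
                    "duration_s": time.perf_counter() - start,
                    "error": error,
                }
            )

    # The monkey-patch itself: swap the attribute on the class.
    setattr(cls, method_name, wrapper)

    def uninstrument():
        setattr(cls, method_name, original)

    return uninstrument
```

A real instrumentor would typically start a span and emit metrics where this sketch appends to a list; returning an undo function keeps the patch reversible, which is convenient in tests.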
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A trace starts its journey,
Through logs and metrics it flies,
Observability reigns.
Footnotes
[1] Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces sample code for instrumenting the Gen AI SDK with Google Cloud Observability. The code appears well-structured and includes clear documentation. However, there are a few areas that could be improved to enhance maintainability and clarity.
Summary of Findings
- Environment Variable Usage: The code relies on environment variables for configuration, but it might be beneficial to provide more explicit guidance on setting these variables, especially for users unfamiliar with the process.
- Testing Strategy: The pull request author has raised concerns about testing. It's crucial to establish a clear testing strategy for these samples to ensure their reliability and correctness.
- Default Service Name and Log Name: The default service name and log name are hardcoded. Consider making these configurable via environment variables or other configuration mechanisms.
Merge Readiness
The pull request is a good starting point for demonstrating Gen AI SDK observability. However, addressing the testing concerns and providing more flexibility in configuration would significantly improve its readiness for merging. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the testing concerns should be addressed before merging.
    _, project = google.auth.default()
    assert project is not None
    assert project
    return project
While the code attempts to infer the project ID, it relies on google.auth.default(). It might be more robust to explicitly require the project ID as an environment variable and provide a clear error message if it's not set. This would make the sample more self-contained and easier to use in environments where ADC is not configured.
Suggested change:
-    _, project = google.auth.default()
-    assert project is not None
-    assert project
-    return project
+def _get_project_id() -> str:
+    """Returns the project ID to which to write the telemetry data."""
+    project_id = os.getenv("GCP_PROJECT_ID")
+    if not project_id:
+        raise ValueError("GCP_PROJECT_ID environment variable must be set.")
+    return project_id
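The original code and the suggestion above are not mutually exclusive. A minimal sketch, assuming one wants an explicit environment variable to take precedence with an ADC lookup as a fallback, might look like the following. `resolve_project_id` and the `adc_lookup` parameter are hypothetical names for this illustration; `GCP_PROJECT_ID` is the variable name proposed in the review comment. The ADC call is injected as a callable so the sketch runs without google-auth installed.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_project_id(
    env: Mapping[str, str] = os.environ,
    adc_lookup: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Returns the project ID, preferring an explicit environment variable."""
    project_id = env.get("GCP_PROJECT_ID")
    if project_id:
        return project_id
    # Fall back to whatever the injected lookup reports (for example, ADC).
    if adc_lookup is not None:
        project_id = adc_lookup()
        if project_id:
            return project_id
    raise ValueError(
        "Could not determine the project ID: set GCP_PROJECT_ID or "
        "configure Application Default Credentials."
    )
```

In the sample, the fallback could plausibly be passed as `lambda: google.auth.default()[1]`, matching the tuple unpacking in the original code. Raising `ValueError` instead of using `assert` keeps the check active when Python runs with `-O`.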
# Replace this with a better default for your service.
_DEFAULT_SERVICE_NAMESPACE = "default"

# Replace this with a better default for your service.
_DEFAULT_SERVICE_NAME = "genaisdk-observability-sample"

# Replace this with a better default for your service.
_DEFAULT_LOG_NAME = "genaisdk-observability-sample"
These default values for service namespace, service name, and log name are hardcoded. Consider making these configurable via environment variables or a configuration file to allow users to customize them without modifying the code. This would increase the flexibility and reusability of the sample.
Suggested change:
-# Replace this with a better default for your service.
-_DEFAULT_SERVICE_NAMESPACE = "default"
-# Replace this with a better default for your service.
-_DEFAULT_SERVICE_NAME = "genaisdk-observability-sample"
-# Replace this with a better default for your service.
-_DEFAULT_LOG_NAME = "genaisdk-observability-sample"
+_DEFAULT_SERVICE_NAMESPACE = os.getenv("DEFAULT_SERVICE_NAMESPACE", "default")
+_DEFAULT_SERVICE_NAME = os.getenv("DEFAULT_SERVICE_NAME", "genaisdk-observability-sample")
+_DEFAULT_LOG_NAME = os.getenv("DEFAULT_LOG_NAME", "genaisdk-observability-sample")
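One possible variation on the suggestion above is to read the overrides when the configuration is built and not at import time, which makes the behavior easier to test. The sketch below is an assumption-laden illustration: `TelemetryNames` and `from_env` are hypothetical names, and the environment variable names are simply those proposed in the review comment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TelemetryNames:
    """Names attached to emitted telemetry; defaults mirror the sample's constants."""

    service_namespace: str = "default"
    service_name: str = "genaisdk-observability-sample"
    log_name: str = "genaisdk-observability-sample"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "TelemetryNames":
        defaults = cls()
        # An unset or empty variable falls back to the built-in default.
        return cls(
            service_namespace=env.get("DEFAULT_SERVICE_NAMESPACE")
            or defaults.service_namespace,
            service_name=env.get("DEFAULT_SERVICE_NAME") or defaults.service_name,
            log_name=env.get("DEFAULT_LOG_NAME") or defaults.log_name,
        )
```

Accepting the environment as a mapping lets a test pass a plain dictionary without modifying `os.environ`.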
    # Implementation note: you may want to report something more meaningful, like the instance ID
    # of the host VM, the PID, or something else. Here we just use something random for expediency.
    # We need to supply something to provide this mandatory resource attribute, but there are
    # different ways that you could reasonably populate it, some more valuable than others.
    return uuid.uuid4().hex
Consider using os.getpid() to get the process ID instead of a UUID. This would provide a more meaningful and stable identifier for the service instance, which can be helpful for debugging and monitoring.
Suggested change:
-    # Implementation note: you may want to report something more meaningful, like the instance ID
-    # of the host VM, the PID, or something else. Here we just use something random for expediency.
-    # We need to supply something to provide this mandatory resource attribute, but there are
-    # different ways that you could reasonably populate it, some more valuable than others.
-    return uuid.uuid4().hex
+import os
+def _compute_service_instance_id() -> str:
+    """Determines the value to use for the 'service.instance.id' resource attribute."""
+    # Use the process ID for a more meaningful identifier
+    return str(os.getpid())
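A PID on its own can repeat across hosts and is often 1 inside a container, so it may not distinguish instances. As a hedged middle ground between the random UUID and the bare PID, the sketch below combines hostname, PID, and a short random suffix, with an optional override. The function name and the `SERVICE_INSTANCE_ID` variable are assumptions made for this example, not part of the sample.

```python
import os
import socket
import uuid
from typing import Mapping


def compute_service_instance_id(env: Mapping[str, str] = os.environ) -> str:
    """Builds a value for the 'service.instance.id' resource attribute."""
    # Hypothetical override so a deployment can supply its own identifier.
    override = env.get("SERVICE_INSTANCE_ID")
    if override:
        return override
    # Hostname and PID aid debugging; the random suffix guards against collisions.
    return f"{socket.gethostname()}-{os.getpid()}-{uuid.uuid4().hex[:8]}"
```

The hostname and PID make the value recognizable when debugging, while the suffix keeps two processes from sharing an identifier if the other parts coincide.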
Description
Provides sample code to be referenced from documentation on the Cloud Observability pages concerning how to use Cloud Observability for monitoring generative AI workloads.
I had initially attempted to send this to the generative-ai repo:... however, I was directed here as the correct repo to use, given the intention of using doc tags and referencing this from the Google Cloud docs site.
Checklist
- [ X] I have followed Sample Guidelines from AUTHORING_GUIDE.MD
- README is updated to include all relevant information
- [ ?] Tests pass: nox -s py-3.9 (see Test Environment Setup)
- [ ? ] Lint pass: nox -s lint (see Test Environment Setup)
- These samples need a new API enabled in testing projects to pass (let us know which ones)
  - logging.googleapis.com
  - monitoring.googleapis.com
  - telemetry.googleapis.com
- These samples need a new/updated env vars in testing projects set to pass (let us know which ones)
- [ ?] This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample
- This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample
- Please merge this PR for me once it is approved