Add a generative AI observability folder, with sample code for instrumenting the Gen AI SDK. #13291

Open
wants to merge 3 commits into
base: main

Conversation

michaelsafyan

Description

Provides sample code to be referenced from documentation on the Cloud Observability pages concerning how to use Cloud Observability for monitoring generative AI workloads.

I had initially attempted to send this to the generative-ai repo:

... however, I was directed here as the correct repo to use, given the intention of using doc tags and referencing this from the Google Cloud docs site.

Checklist

  • [ X] I have followed Sample Guidelines from AUTHORING_GUIDE.MD

  • README is updated to include all relevant information

  • [ ?] Tests pass: nox -s py-3.9 (see Test Environment Setup)

    • I'm uncertain about the best way to test this. Can you provide some guidance?
  • [ ? ] Lint pass: nox -s lint (see Test Environment Setup)

    • I get an error that there is no "noxfile" when attempting to use nox -s lint
  • These samples need a new API enabled in testing projects to pass (let us know which ones)

    • Maybe. The sample code makes use of the following APIs:
      • Cloud Logging (logging.googleapis.com)
      • Cloud Monitoring (monitoring.googleapis.com)
      • Telemetry API (telemetry.googleapis.com)
  • These samples need a new/updated env vars in testing projects set to pass (let us know which ones)

    • Uncertain. Let's discuss testing expectations/practices here.
  • [ ?] This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample

    • This adds a new folder, but I'm unsure what I should update this to (I'm not sure if there is a good existing GitHub team that has the appropriate members).
  • This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample

    • No
  • Please merge this PR for me once it is approved

    • Yes please

@michaelsafyan requested review from a team as code owners on April 4, 2025, 20:21

snippet-bot bot commented Apr 4, 2025

Here is the summary of possible violations 😱

There are 8 possible violations for not having product prefix.

The end of the violation section. All the stuff below is FYI purposes only.


Here is the summary of changes.

You are about to add 8 region tags.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label bot added the "samples" label (Issues that are directly related to samples.) on Apr 4, 2025
Contributor

@gemini-code-assist bot left a comment

Hello @michaelsafyan, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a new folder, generative_ai_observability, with sample code for instrumenting the Gen AI SDK using Google Cloud Observability tools (Cloud Logging, Cloud Monitoring, and Cloud Trace). The sample demonstrates how to add observability to the Gen AI SDK, both within and outside of Google Cloud environments. It configures OpenTelemetry to collect telemetry data and route it to Google Cloud Observability, including setting up tracing, metrics, and logging. The PR also includes a Makefile for managing dependencies and running the sample, as well as a .gitignore file to exclude the virtual environment.

Highlights

  • New Sample Code: Adds a new sample demonstrating how to instrument the Gen AI SDK for observability.
  • OpenTelemetry Configuration: Configures OpenTelemetry to route telemetry data to Google Cloud Observability services.
  • Instrumentation: Instruments both the Gen AI SDK and the Python Requests library for comprehensive telemetry collection.
  • Google Cloud Observability: Demonstrates the use of Cloud Logging, Cloud Monitoring, and Cloud Trace for generative AI workloads.

Changelog

  • generative_ai_observability/README.md
    • Added a README file explaining the purpose of the samples.
  • generative_ai_observability/genaisdk/.gitignore
    • Added a .gitignore file to exclude the virtual environment.
  • generative_ai_observability/genaisdk/Makefile
    • Added a Makefile for managing dependencies and running the sample application.
    • Includes targets for creating a virtual environment, installing dependencies, and running the main script.
  • generative_ai_observability/genaisdk/README.md
    • Added a README file explaining the Gen AI SDK observability sample.
  • generative_ai_observability/genaisdk/main.py
    • Added the main application logic, including setup for telemetry and usage of the Gen AI SDK.
    • Includes functions to initialize OpenTelemetry and use the Google Gen AI SDK to generate content.
  • generative_ai_observability/genaisdk/otel_setup/__init__.py
    • Created an __init__.py file to expose the setup functions for OpenTelemetry instrumentation and wiring.
  • generative_ai_observability/genaisdk/otel_setup/setup_otel_instrumentation.py
    • Added code to configure OpenTelemetry instrumentation for the Gen AI SDK and Requests library.
    • Uses monkey-patching to inject telemetry collection into the Gen AI SDK and its dependencies.
  • generative_ai_observability/genaisdk/otel_setup/setup_otel_to_gcp_wiring.py
    • Added code to configure OpenTelemetry to route telemetry data to Google Cloud Observability services.
    • Includes functions to create gRPC credentials, set up Cloud Trace, Cloud Monitoring, and Cloud Logging.
  • generative_ai_observability/genaisdk/requirements.txt
    • Added a requirements file listing the dependencies for the sample application.
    • Includes libraries such as google-auth, google-genai, google-cloud-logging, opentelemetry-sdk, and others.
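
The changelog mentions that the instrumentation relies on monkey-patching. As a rough illustration of that general technique (not the code in this PR, and not the real instrumentation library), the sketch below wraps a method on a hypothetical `FakeClient` class so that each call records a timing entry. `FakeClient`, `instrument`, and `RECORDED_SPANS` are all made-up names for this example; a real setup would hand the data to an OpenTelemetry exporter instead of a list.

```python
import functools
import time
from typing import Any, Callable, Dict, List


class FakeClient:
    """Hypothetical stand-in for an SDK client; not the real google-genai API."""

    def generate_content(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Collected span-like records (assumption: a real setup would export these).
RECORDED_SPANS: List[Dict[str, Any]] = []


def instrument(cls: type, method_name: str) -> None:
    """Replaces cls.method_name with a wrapper that records a timing entry."""
    original: Callable[..., Any] = getattr(cls, method_name)
    if getattr(original, "_is_instrumented", False):
        return  # Already patched; avoid wrapping twice.

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(self, *args, **kwargs)
        except Exception as exc:
            error = type(exc).__name__
            raise
        finally:
            # Runs on both success and failure, so every call is recorded.
            RECORDED_SPANS.append({
                "name": f"{cls.__name__}.{method_name}",
                "duration_s": time.perf_counter() - start,
                "error": error,
            })

    wrapper._is_instrumented = True
    setattr(cls, method_name, wrapper)


instrument(FakeClient, "generate_content")
result = FakeClient().generate_content("hello")
```

Because the patch is applied to the class, existing call sites need no changes, which is the main appeal of this approach for instrumenting a third-party SDK.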
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A trace starts its journey,
Through logs and metrics it flies,
Observability reigns.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment

Code Review

This pull request introduces sample code for instrumenting the Gen AI SDK with Google Cloud Observability. The code appears well-structured and includes clear documentation. However, there are a few areas that could be improved to enhance maintainability and clarity.

Summary of Findings

  • Environment Variable Usage: The code relies on environment variables for configuration, but it might be beneficial to provide more explicit guidance on setting these variables, especially for users unfamiliar with the process.
  • Testing Strategy: The pull request author has raised concerns about testing. It's crucial to establish a clear testing strategy for these samples to ensure their reliability and correctness.
  • Default Service Name and Log Name: The default service name and log name are hardcoded. Consider making these configurable via environment variables or other configuration mechanisms.

Merge Readiness

The pull request is a good starting point for demonstrating Gen AI SDK observability. However, addressing the testing concerns and providing more flexibility in configuration would significantly improve its readiness for merging. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the testing concerns should be addressed before merging.

Comment on lines +129 to +132
_, project = google.auth.default()
assert project is not None
assert project
return project

high

While the code attempts to infer the project ID, it relies on google.auth.default(). It might be more robust to explicitly require the project ID as an environment variable and provide a clear error message if it's not set. This would make the sample more self-contained and easier to use in environments where ADC is not configured.

Suggested change
- _, project = google.auth.default()
- assert project is not None
- assert project
- return project
+ def _get_project_id() -> str:
+     """Returns the project ID to which to write the telemetry data."""
+     project_id = os.getenv("GCP_PROJECT_ID")
+     if not project_id:
+         raise ValueError("GCP_PROJECT_ID environment variable must be set.")
+     return project_id
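
One possible middle ground between the original code and the suggestion above is to prefer an explicit environment variable and fall back to credential-based inference only when it is unset. The sketch below is an illustration, not part of the PR: `resolve_project_id` and its `adc_lookup` parameter are hypothetical names, and `GCP_PROJECT_ID` is taken from the suggestion. Raising an exception rather than using `assert` also keeps the check active when Python runs with `-O`, which strips assertions.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_project_id(
    env: Mapping[str, str] = os.environ,
    adc_lookup: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Returns the project ID, preferring GCP_PROJECT_ID over inference.

    adc_lookup is a hypothetical hook: a zero-argument callable that returns
    an inferred project ID or None. It is injected so this sketch has no
    third-party dependencies.
    """
    explicit = env.get("GCP_PROJECT_ID", "").strip()
    if explicit:
        return explicit
    if adc_lookup is not None:
        inferred = adc_lookup()
        if inferred:
            return inferred
    raise ValueError(
        "Could not determine the project ID: set GCP_PROJECT_ID or "
        "configure Application Default Credentials."
    )
```

In the sample, the fallback could be supplied as `lambda: google.auth.default()[1]`, since `google.auth.default()` returns a `(credentials, project_id)` pair.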

Comment on lines +62 to +69
# Replace this with a better default for your service.
_DEFAULT_SERVICE_NAMESPACE = "default"

# Replace this with a better default for your service.
_DEFAULT_SERVICE_NAME = "genaisdk-observability-sample"

# Replace this with a better default for your service.
_DEFAULT_LOG_NAME = "genaisdk-observability-sample"

medium

These default values for service namespace, service name, and log name are hardcoded. Consider making these configurable via environment variables or a configuration file to allow users to customize them without modifying the code. This would increase the flexibility and reusability of the sample.

Suggested change
- # Replace this with a better default for your service.
- _DEFAULT_SERVICE_NAMESPACE = "default"
- # Replace this with a better default for your service.
- _DEFAULT_SERVICE_NAME = "genaisdk-observability-sample"
- # Replace this with a better default for your service.
- _DEFAULT_LOG_NAME = "genaisdk-observability-sample"
+ _DEFAULT_SERVICE_NAMESPACE = os.getenv("DEFAULT_SERVICE_NAMESPACE", "default")
+ _DEFAULT_SERVICE_NAME = os.getenv("DEFAULT_SERVICE_NAME", "genaisdk-observability-sample")
+ _DEFAULT_LOG_NAME = os.getenv("DEFAULT_LOG_NAME", "genaisdk-observability-sample")
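
An alternative to inventing new variable names is to reuse the environment variables that the OpenTelemetry specification already defines: `OTEL_SERVICE_NAME` and `OTEL_RESOURCE_ATTRIBUTES` (a comma-separated list of `key=value` pairs). The sketch below shows how the two defaults from the sample could be overridden that way. `parse_resource_attributes` and `resolve_service_identity` are hypothetical helpers written for this example, and the parsing is simplified (it does not handle percent-encoded values).

```python
import os
from typing import Dict, Mapping

# Defaults copied from the sample under review.
_DEFAULT_SERVICE_NAMESPACE = "default"
_DEFAULT_SERVICE_NAME = "genaisdk-observability-sample"


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Parses 'k1=v1,k2=v2' into a dict, skipping malformed entries."""
    attrs: Dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs


def resolve_service_identity(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolves service.namespace and service.name from the environment."""
    attrs = parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    return {
        "service.namespace": attrs.get(
            "service.namespace", _DEFAULT_SERVICE_NAMESPACE
        ),
        # OTEL_SERVICE_NAME takes precedence over service.name given in
        # OTEL_RESOURCE_ATTRIBUTES.
        "service.name": env.get("OTEL_SERVICE_NAME")
        or attrs.get("service.name", _DEFAULT_SERVICE_NAME),
    }
```

With this approach, a user could run the sample with `OTEL_SERVICE_NAME=my-service` set and leave the source untouched.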

Comment on lines +87 to +91
# Implementation note: you may want to report something more meaningful, like the instance ID
# of the host VM, the PID, or something else. Here we just use something random for expediency.
# We need to supply something to provide this mandatory resource attribute, but there are
# different ways that you could reasonably populate it, some more valuable than others.
return uuid.uuid4().hex

medium

Consider using os.getpid() to get the process ID instead of a UUID. This would provide a more meaningful and stable identifier for the service instance, which can be helpful for debugging and monitoring.

Suggested change
- # Implementation note: you may want to report something more meaningful, like the instance ID
- # of the host VM, the PID, or something else. Here we just use something random for expediency.
- # We need to supply something to provide this mandatory resource attribute, but there are
- # different ways that you could reasonably populate it, some more valuable than others.
- return uuid.uuid4().hex
+ import os
+ def _compute_service_instance_id() -> str:
+     """Determines the value to use for the 'service.instance.id' resource attribute."""
+     # Use the process ID for a more meaningful identifier
+     return str(os.getpid())
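
A caveat worth weighing with the PID-only approach: process IDs are unique only within one host, and containerized processes frequently share the same low PID, so two instances of the same service could report the same ID. The sketch below is one hedged alternative that combines the hostname with the PID and optionally appends a random suffix. The function name and the `include_random_suffix` parameter are made up for this example and are not part of the PR.

```python
import os
import socket
import uuid


def compute_service_instance_id(include_random_suffix: bool = False) -> str:
    """Builds an instance ID from the hostname and PID.

    The hostname distinguishes processes on different machines or containers;
    the optional random suffix guards against PID reuse after a restart.
    """
    base = f"{socket.gethostname()}-{os.getpid()}"
    if include_random_suffix:
        return f"{base}-{uuid.uuid4().hex[:8]}"
    return base
```

Without the suffix, the value is stable for the lifetime of the process and readable during debugging; with it, the value trades some readability for stronger uniqueness.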

@michaelsafyan requested a review from a team as a code owner on April 8, 2025, 18:35