Added support for google specific arguments for video analysis #2110

Open
wants to merge 6 commits into
base: main

Sumered

@Sumered Sumered commented Jul 1, 2025

Added support for Google-model-specific arguments when processing video. The arguments are:

  • media_resolution, a model setting. Setting it to LOW instead of the default HIGH reduces the number of input tokens consumed for video input by roughly 3x.
  • fps, a video-specific setting, set via vendor_metadata in FileUrl/BinaryContent. It controls frame sampling. The default is 1.0 for Google models; a lower value decreases the number of input video tokens, and a higher value improves analysis quality for highly dynamic videos.
  • start_offset, a video-specific setting, set via vendor_metadata in FileUrl/BinaryContent. It controls the start offset of the video and is useful for capping token consumption per video. According to the docs, the value must end with s, e.g. 300s.
  • end_offset, a video-specific setting, set via vendor_metadata in FileUrl/BinaryContent. It controls the end offset of the video and is useful for capping token consumption per video.
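
As a rough illustration of the three vendor_metadata keys listed above, the sketch below assembles such a dict and formats the offsets with the trailing s mentioned in the description. The helper name `build_video_vendor_metadata` and its parameters are hypothetical and not part of this PR; only the key names and the `300s` format come from the description.

```python
from __future__ import annotations

from typing import Any


def build_video_vendor_metadata(
    fps: float | None = None,
    start_seconds: int | None = None,
    end_seconds: int | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect the video-specific keys described in this PR.

    Only keys that were actually provided are included in the result.
    """
    metadata: dict[str, Any] = {}
    if fps is not None:
        metadata['fps'] = fps
    if start_seconds is not None:
        # Offsets are strings that end with "s", e.g. "300s".
        metadata['start_offset'] = f'{start_seconds}s'
    if end_seconds is not None:
        metadata['end_offset'] = f'{end_seconds}s'
    return metadata


# Sample every other second and analyse only the first five minutes.
vendor_metadata = build_video_vendor_metadata(fps=0.5, start_seconds=0, end_seconds=300)
```

The resulting dict is what would be passed as vendor_metadata on the video item, assuming the field lands as described in this PR.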

Official Google docs for those new arguments:
https://ai.google.dev/gemini-api/docs/video-understanding

@Sumered
Author

Sumered commented Jul 2, 2025

I'm still not exactly sure why those tests are failing, so I would be thankful if you could explain what I should do about them.

start_offset=item.vendor_metadata.get('start_offset', None),
end_offset=item.vendor_metadata.get('end_offset', None),
)
inline_data_dict['video_metadata'] = video_metadata # type: ignore
Contributor

Would this work?

Suggested change
- inline_data_dict['video_metadata'] = video_metadata # type: ignore
+ inline_data_dict['video_metadata'] = item.vendor_metadata # type: ignore
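
To make the difference between the two variants concrete, here is a minimal sketch that uses plain dicts as stand-ins for the real request objects. The function names are hypothetical, and the `fps` lookup in the explicit variant is an assumption, since only the `start_offset` and `end_offset` lines are visible in the quoted diff.

```python
from __future__ import annotations

from typing import Any


def explicit_video_metadata(vendor_metadata: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the original approach: pick known keys one by one."""
    return {
        'fps': vendor_metadata.get('fps'),  # assumed; not shown in the quoted diff
        'start_offset': vendor_metadata.get('start_offset'),
        'end_offset': vendor_metadata.get('end_offset'),
    }


def passthrough_video_metadata(vendor_metadata: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the suggested approach: forward the dict unchanged."""
    return dict(vendor_metadata)


item_metadata = {'fps': 0.5, 'start_offset': '300s'}
explicit = explicit_video_metadata(item_metadata)
passthrough = passthrough_video_metadata(item_metadata)
```

In this sketch the explicit variant always emits all three keys (missing ones become `None`) and silently drops anything else, while the pass-through variant forwards exactly what the caller supplied, including keys the library does not know about yet.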

Comment on lines +102 to +111
vendor_metadata: dict[str, Any] | None = None
"""The vendor specific metadata for the file.
Currently supports only those keys:

fps: float,
start_offset: str (ex. 1800s),
end_offset: str (ex. 1800s)

And works only for google models for video analysis.
"""
Contributor

I'd prefer something like this:

Suggested change
- vendor_metadata: dict[str, Any] | None = None
- """The vendor specific metadata for the file.
- Currently supports only those keys:
- fps: float,
- start_offset: str (ex. 1800s),
- end_offset: str (ex. 1800s)
- And works only for google models for video analysis.
- """
+ vendor_metadata: dict[str, Any] | None = None
+ """Vendor-specific metadata for the file.
+ Supported by:
+ - `GoogleModel`: `VideoUrl.vendor_metadata` is used as `video_metadata`: https://ai.google.dev/gemini-api/docs/video-understanding#customize-video-processing
+ """

@@ -1191,20 +1191,21 @@ wheels = [

[[package]]
name = "google-genai"
-version = "1.15.0"
+version = "1.23.0"
Contributor

The test failure suggests that with the genai update, the previously recorded cassettes don't match anymore because post != POST... Can you try replacing method: POST with method: post in all the tests/models/cassettes/test_google/*.yaml files?
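
One possible way to apply that replacement in bulk is sketched below. It assumes each cassette stores the method on its own line as `method: POST`; the function name `lowercase_post_methods` is made up for this example, and the result should be reviewed with `git diff` before committing.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches a whole line such as "    method: POST", keeping the indentation and key.
METHOD_LINE = re.compile(r'^([ \t]*method:[ \t]*)POST[ \t]*$', re.MULTILINE)


def lowercase_post_methods(cassette_dir: Path) -> int:
    """Rewrite `method: POST` to `method: post` in every *.yaml file.

    Returns the number of files that were modified.
    """
    changed = 0
    for path in sorted(cassette_dir.glob('*.yaml')):
        text = path.read_text(encoding='utf-8')
        new_text, count = METHOD_LINE.subn(r'\1post', text)
        if count:
            path.write_text(new_text, encoding='utf-8')
            changed += 1
    return changed
```

Calling `lowercase_post_methods(Path('tests/models/cassettes/test_google'))` from the repository root would cover the directory mentioned in the comment above.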

2 participants