core[minor],google-common[minor]: Add support for generic objects in prompts, gemini audio/video docs #5043
@@ -101,6 +136,11 @@ export function messageContentToParts(content: MessageContent): GeminiPart[] {
return messageContentImageUrl(content as MessageContentImageUrl);
}
break;
case "audio":
Is the plan to change this to a generic "media" or "object" or "blob"? (And have the other methods named similarly.)
This way we can support audio, video, and images at once.
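A minimal sketch of what a single generic converter could look like, assuming a hypothetical MessageContentMedia shape with a MIME type plus either inline base64 data or a file URI. The function name and error message follow the naming nits elsewhere in this review; none of this is the merged langchain-core or google-common code. The Gemini part shapes (inlineData with mimeType/data, fileData with mimeType/fileUri) match the fields used in the diff.

```typescript
// Hypothetical type names for illustration; not the actual langchain-core
// or @langchain/google-common definitions.
interface MessageContentMedia {
  type: "media";
  mimeType: string; // e.g. "audio/mp3", "video/mp4", "image/png"
  data?: string; // base64-encoded bytes, sent inline
  fileUri?: string; // a URI the Gemini API can fetch, sent by reference
}

// Subset of Gemini's Part shape: inline bytes or a file reference.
type GeminiPart =
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { mimeType: string; fileUri: string } };

// One converter for every media kind: the MIME type, not the content
// "type" tag, tells Gemini whether it is audio, video, or an image.
function messageContentMedia(content: MessageContentMedia): GeminiPart {
  if (content.data !== undefined) {
    return { inlineData: { mimeType: content.mimeType, data: content.data } };
  }
  if (content.fileUri !== undefined) {
    return {
      fileData: { mimeType: content.mimeType, fileUri: content.fileUri },
    };
  }
  throw new Error("Invalid media content");
}
```

With this shape, adding images or another media kind needs no new case, only a different MIME type.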
}; | ||
} | ||
|
||
throw new Error("Invalid audio content"); |
nit: "Invalid media content"
data: content.data,
},
};
} else if ("mimeType" in content && "fileUri" in content) {
Pondering out loud:
- Perhaps instead of "fileUri" we just make it "uri" or "url".
- Then we just use the extractMimeType method as above to see if it's a data: URL (and "mimeType" is optional or ignored) and use inlineData.
- If it isn't a data: URL, and we have the MIME type, then we pass the URL and MIME type given to us using fileData.
This would then turn the messageContentImageUrl function into a call to this function with the attributes of content set.
(And it also means that if we add more sophisticated file handling later, we only have to change it in one place.)
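The proposal above can be sketched roughly as follows. This assumes a hypothetical content shape with a single url field and an optional mimeType. parseDataUrl is a hypothetical stand-in for the existing extractMimeType helper, whose real signature isn't shown here, and mediaToPart is likewise a made-up name.

```typescript
// Hypothetical unified content shape from the suggestion: one "url" field,
// with "mimeType" optional when the URL is a data: URL.
interface MediaContent {
  url: string;
  mimeType?: string;
}

type GeminiPart =
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { mimeType: string; fileUri: string } };

// Hypothetical stand-in for extractMimeType: splits a base64 data: URL
// into its MIME type and payload, or returns null for any other URL.
function parseDataUrl(url: string): { mimeType: string; data: string } | null {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(url);
  if (match === null) return null;
  return { mimeType: match[1], data: match[2] };
}

function mediaToPart(content: MediaContent): GeminiPart {
  const parsed = parseDataUrl(content.url);
  if (parsed !== null) {
    // data: URL: the embedded MIME type is used and the bytes go inline.
    return { inlineData: parsed };
  }
  if (content.mimeType !== undefined) {
    // Any other URL is passed by reference with the caller's MIME type.
    return { fileData: { mimeType: content.mimeType, fileUri: content.url } };
  }
  throw new Error("Invalid media content");
}
```

Under this sketch, an image-URL converter would reduce to calling mediaToPart with the image's URL (and MIME type, when it isn't a data: URL), so all file handling lives in one function.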
Google's Gemini API supports audio and video input, along with function calling.
Together, these features can be paired to extract structured data from audio or video input.

In the following examples, we'll demonstrate how to read MP3 and MP4 files, send them to the Gemini API, and receive structured output as a response.
Love these examples! Perhaps mention that you don't need to use structured output with audio and video, but it always helps to understand what the results can be.
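The docs examples read MP3 and MP4 files before sending them to Gemini. A minimal sketch of that preparation step is below. It assumes a media content part of the form { type: "media", mimeType, data }, modeled on the "media" case in this PR's diff; the final field names may differ, and bytesToMediaContent and fileToMediaContent are hypothetical helper names. Nothing here depends on structured output.

```typescript
import { readFileSync } from "node:fs";

// Assumed content shape, modeled on the "media" case in this PR's diff;
// the final langchain-core field names may differ.
interface MediaContentPart {
  type: "media";
  mimeType: string;
  data: string; // base64-encoded file bytes
}

// Encodes raw bytes as an inline media content part.
function bytesToMediaContent(bytes: Uint8Array, mimeType: string): MediaContentPart {
  return { type: "media", mimeType, data: Buffer.from(bytes).toString("base64") };
}

// Reads a local audio or video file (e.g. an MP3 or MP4) and encodes it inline.
function fileToMediaContent(path: string, mimeType: string): MediaContentPart {
  return bytesToMediaContent(readFileSync(path), mimeType);
}
```

The resulting part would sit in a message's content array next to a text part carrying the instruction (for example, asking for a summary of a hypothetical recording.mp3). Whether the model's reply is free text or structured output is a separate choice.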
@@ -101,6 +133,8 @@ export function messageContentToParts(content: MessageContent): GeminiPart[] {
return messageContentImageUrl(content as MessageContentImageUrl);
}
break;
case "media":
I wish I had seen https://github.com/langchain-ai/langchainjs/blame/fc2f9de2910a6728cf9c24f9146b55ba48d3790f/langchain-core/src/messages/index.ts#L56C69-L56C69 when it went in!
To be honest, I'm a little anxious about defining MessageContent types with magic string values rather than real types, even if they're fundamentally Record types. It makes it a lot more difficult for other implementations to use consistent naming.
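To illustrate the alternative being argued for, here is a sketch of content variants declared as named interfaces in a discriminated union, instead of a catch-all Record keyed by an arbitrary type string. Every name here is hypothetical and is not the actual langchain-core definition. The benefit is that implementations can import the same named types, and the compiler flags any variant a converter forgets to handle.

```typescript
// Hypothetical named types for each content variant; not the actual
// langchain-core definitions.
interface TextContent {
  type: "text";
  text: string;
}
interface ImageUrlContent {
  type: "image_url";
  image_url: string | { url: string };
}
interface MediaContent {
  type: "media";
  mimeType: string;
  data?: string;
  fileUri?: string;
}

type ContentPart = TextContent | ImageUrlContent | MediaContent;

// The switch narrows each case to its interface; the `never` assignment
// makes an unhandled new variant a compile-time error.
function describeContent(part: ContentPart): string {
  switch (part.type) {
    case "text":
      return `text (${part.text.length} chars)`;
    case "image_url":
      return "image";
    case "media":
      return `media (${part.mimeType})`;
    default: {
      const unreachable: never = part;
      return unreachable;
    }
  }
}
```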
@@ -74,6 +83,29 @@ function messageContentImageUrl(
}
}

function messageContentToMedia(
Minor nit: Shouldn't this be messageContentMedia without the "to"? It is a "media" MessageContent type; we're not converting it to media.
Awesome job getting this done so fast! I don't see any fundamental issues: a couple of nits and a couple of other small suggestions. My only major suggestion would be the changes that make the uri/url a little more flexible and have
examples/lance_ls_eval_video.mp4
Can we remove this? It slows down git.
examples/Mozart_Requiem_D_minor.mp3
Remove