-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New Feat: Added Embedding Callbacks #1859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
…n callback manager
|
||
// TODO Implement handleAgentEnd, handleText | ||
|
||
// onAgentEnd?(run: ChainRun): void | Promise<void>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these comments, this has been done
langchain/src/retrievers/base.ts
Outdated
@@ -0,0 +1,12 @@ | |||
import { Callbacks } from "../callbacks/manager.js"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually would want this file to be moved to schema/retriever.ts
…eleted unused comments
Looks solid to me - I know we're waiting on some other factors but excited to have this merged! |
I see that there's a new commit that changes how callback manager works (e.g. adding metadata, use BaseCallbackConfig, etc). Should I merge those changes to this pull request or should I wait? |
@ppramesi Would it be possible to include not only the vector into the callback payload, but also the token usage? If in doubt, just the complete API response? |
@the-powerpointer hmmm I'll have to take a look at the code first. I'll get back to you as soon as I can. |
Any news on this? Tracking embeddings token usage is very important, and this PR seems like the viable way to it. |
Still delayed unfortunately - will reraise internally! |
I was just investigating the usage of embedding models, happy to found this PR. Hope you'll merge it soon, this is much needed 🙏🏼 |
Would be great if this can be merged. We are currently keep track of each embedding done by different models so if we can somehow get the usage from the vendor itself then it would be amazing. |
Waiting this feature so much. But why token usage is not included in the embedding result and just the vectors are returned? |
Added new callbacks:
handleEmbeddingStart: called before embedding is called. It passes serialized embedding class instance and array of texts to the callback function. It passes an array of texts because embedDocuments accepts multiple documents at once. As such, it also passes the single piece of text from embedQuery as an array with length 1.
handleEmbeddingEnd: called after embedding. It passes an array of vectors to the callback function.
handleEmbeddingError: called if there's an error during embed call.
Moreover, there are also some changes that needed to be made to implement these:
Changed Embeddings class such that now child classes implements _embedDocuments and _embedQuery functions since embedDocuments and embedQuery calls handle the callback calls.
Embedding classes are serializable, and added their secrets and aliases.
Moved BaseRetriever into its own file from schema to prevent circular dependency.
Added unit tests for the new callback.