.Net: [MEVD] Added detailed logging to vector stores #10968

dmytrostruk · 2025-03-13T22:29:11Z

Motivation and Context

This PR adds detailed logging for vector store operations with information including a collection name, records to upsert, record keys to get and more.

Example of flow with record upsert and vector search with LogLevel.Debug:

CollectionExistsAsync invoked. Collection name: skglossary.
CollectionExistsAsync completed. Collection name: skglossary. Collection exists: False
CreateCollectionAsync invoked. Collection name: skglossary.
CreateCollectionAsync completed. Collection name: skglossary.
UpsertAsync invoked. Collection name: skglossary.
UpsertAsync completed. Collection name: skglossary.
UpsertAsync invoked. Collection name: skglossary.
UpsertAsync completed. Collection name: skglossary.
UpsertAsync invoked. Collection name: skglossary.
UpsertAsync completed. Collection name: skglossary.
VectorizedSearchAsync invoked. Collection Name: skglossary.
VectorizedSearchAsync completed. Collection Name: skglossary.

Example of flow with Record upsert and vector search with LogLevel.Trace:

CollectionExistsAsync invoked. Collection name: skglossary.
CollectionExistsAsync completed. Collection name: skglossary. Collection exists: False
CreateCollectionAsync invoked. Collection name: skglossary.
CreateCollectionAsync completed. Collection name: skglossary.
UpsertAsync invoked. Collection name: skglossary.
UpsertAsync completed. Collection name: skglossary. Key: 1.
UpsertAsync invoked. Collection name: skglossary.
UpsertAsync completed. Collection name: skglossary. Key: 2.
UpsertAsync invoked. Collection name: skglossary.
UpsertAsync completed. Collection name: skglossary. Key: 3.
VectorizedSearchAsync invoked. Collection Name: skglossary.
VectorizedSearchAsync completed. Collection Name: skglossary.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

roji

Hey @dmytrostruk, here's a preliminary review; I didn't go too far so that we can first align on the general principles here etc.

In general, there seem to be two different logging approaches implemented here:

One on LogLevel.Trace only, using the source generator ([LoggerMessage]). Here the messages are customized (e.g. CreateCollection logs the name of the collection created).
One via a general VectorStoreLoggingExtensions utility, emitting a log message before, and another after (success/failure/cancellation). Here the messages are only "OperationName invoked/completed", no customization.

One problem here is that the default, non-trace logging gives no information - one expects to see which collection was created with the event (CreateCollection completed is, well, a bit incomplete). It seems that one has to activate LogLevel.Trace to get that, and at that point they get duplicate messages. In other words, I'd expect just a single system here that's informative (contains any information relevant to the operation).

On a related note, it's best practice to include Event IDs when logging, so that users can filter for specific events etc. Doing this - and doing it efficiently - probably means that you'd need to have the try/catch/finally that's currently in VectorStoreLoggingExtensions simply in the methods themselves, and call distinct, source-generated methods each time (CreateCollectionString(), CreateCollectionDone(), CreateCollectionFailed(), CreateCollectionCancelled(), each with its own message, event ID, etc.).

Also, see comments below about dumping user values as JSON into the log message - I think this is problematic.

Finally, I can't shake the feeling that the explosion of types (all the wrappers in VectorSearch and VectorStorage) indicates we're going in a problematic direction, for this particular abstraction. I can't really say anything more concrete than that, and I'm OK with merging something like this, but it feels a bit off.

dotnet/src/Connectors/VectorData/VectorStorage/LoggingVectorStoreRecordCollection.cs

roji · 2025-03-31T12:33:45Z

dotnet/src/Connectors/VectorData/VectorStorage/LoggingVectorStoreRecordCollection.cs

+    [LoggerMessage(LogLevel.Trace, "Getting records from '{CollectionName}' with keys: {Keys}")]
+    private partial void LogGetBatch(string collectionName, string keys);
+
+    [LoggerMessage(LogLevel.Trace, "Retrieved record from '{CollectionName}': {Record}")]


I'd measure the time (in ms) and log it with each operation, it's probably the first thing we'll get asked for.

This is already available in OpenTelemetry decorators in this PR: #11207.

By the way, the operation duration there is measured in seconds, not milliseconds, based on the convention from OpenTelemetry:
https://opentelemetry.io/docs/specs/semconv/database/database-metrics/#metric-dbclientoperationduration

Based on that, do we still need time measurement in Logging decorators as well?

Based on that, do we still need time measurement in Logging decorators as well?

I think so, yeah - logging and tracing indeed have some functional overlap, but are still two different features: some people only enable logging, and it seems reasonable for those people to expect us to log running times.

Yeah, OTel use seconds for duration as their standard, which I personally find a big odd; IIRC that can still be a fraction (e.g. 1.255s). I'm not sure what the appropriate unit of measure here would be (how much does embedding generation typically take?), but I'd err towards milliseconds rather than seconds...

roji · 2025-03-31T12:35:08Z

dotnet/src/Connectors/VectorData/VectorStorage/LoggingVectorStoreRecordCollection.cs

+
+    #region private
+
+    [LoggerMessage(LogLevel.Trace, "Collection '{CollectionName}' exists: {CollectionExists}")]


Consider adding EventIds here for each event, to allow consumers to e.g. filter on them.

Added Event ID for each log message and put all Event IDs in a separate internal constant class, see LoggingEventIds. Although I'm not sure if ID pattern is good, please let me know what you think about it:

I think that looks pretty good! As a total minor nit, maybe Starting is a bit more accurate/clear than Invoked, since nothing has actually been invoked yet when that log message is emitted.

I do see that MEAI have the same pattern - @SteveSandersonMS any thoughts about this?

dotnet/src/Connectors/VectorData/Extensions/VectorStoreLoggingExtensions.cs

roji · 2025-03-31T12:43:06Z

dotnet/src/Connectors/VectorData/Extensions/VectorStoreLoggingExtensions.cs

@@ -134,4 +213,6 @@ internal static async IAsyncEnumerable<TResult> RunWithLoggingAsync<TResult>(

    [LoggerMessage(LogLevel.Error, "{OperationName} failed.")]
    private static partial void LogInvocationFailed(this ILogger logger, string operationName, Exception exception);
+
+    #endregion


As the default logging experience, I'd expect to only see Collection 'Foo' created in 3ms. In other words, I think the "invoked" message (which should probably be named "invoking" as it's logged before the operation) should be in a lower level than the Completed message (so you only get the Completed message).

I tried to be consistent with MEAI logging naming, but I'm happy to adjust it based on our preferences:
https://github.com/dotnet/extensions/blob/56df0c4f4933909367536413dcf9c16520a37f82/src/Libraries/Microsoft.Extensions.AI/ChatCompletion/LoggingChatClient.cs#L173-L195

OK, thanks - I agree it makes sense to be consistent with MEAI here. @SteveSandersonMS what do you think, should the default experience include both before (invoked) and after (completed) messages, or just completed (and have invoked at trace)?

dmytrostruk · 2025-03-31T21:14:14Z

@roji Thanks for PR review!

One problem here is that the default, non-trace logging gives no information - one expects to see which collection was created with the event (CreateCollection completed is, well, a bit incomplete). It seems that one has to activate LogLevel.Trace to get that, and at that point they get duplicate messages. In other words, I'd expect just a single system here that's informative (contains any information relevant to the operation).

I removed message duplication and now there is only one message based on LogLevel value.

On a related note, it's best practice to include Event IDs when logging, so that users can filter for specific events etc. Doing this - and doing it efficiently - probably means that you'd need to have the try/catch/finally that's currently in VectorStoreLoggingExtensions simply in the methods themselves, and call distinct, source-generated methods each time (CreateCollectionString(), CreateCollectionDone(), CreateCollectionFailed(), CreateCollectionCancelled(), each with its own message, event ID, etc.).

I updated the code to have a dedicated log message for each operation and its possible outcome (i.e. completed/canceled/failed) and added a unique Event ID for each message.

Also, see comments below about dumping user values as JSON into the log message - I think this is problematic.

I removed JSON serialization for now.

roji

Finally, I can't shake the feeling that the explosion of types (all the wrappers in VectorSearch and VectorStorage) indicates we're going in a problematic direction, for this particular abstraction. I can't really say anything more concrete than that, and I'm OK with merging something like this, but it feels a bit off.

So on this... I had a chat with @stephentoub and thought about this some more.. Here are some ideas.

In general, I think we should think of the search interfaces (e.g. IVectorizedSearch, IKeywordHybridSearch) as views over a collection; somewhere, at some point in the user's application, there's going to be an actual concrete IVectorStoreRecordCollection, and the views just point to that. If we look at it that way, then it makes sense to apply all decorators on the collection, and then take interfaces which point to that. This would mean the following:

We shouldn't need any wrappers for IVectorizedSearch, since IVectorStoreRecordCollection extends it.
- It could be said that in theory, someone might have e.g. an implementation of IVectorizedSearch that doesn't point to an IVectorStoreRecordCollection, and that because of that a builder+wrapper specifically for IVectorizedSearch may be needed. I think that probably quite contrived, and would do this only if we have confirmed user cases which make this necessary; we can always add this in the future if necessary (no breaking change).
IVectorizedTextSearch will likely go the same way, since we're planning to have all collections support embedding generation as part of #10492. So IVectorizedSearch (or whatever we call it, maybe just IVectorSearch) would likely just have both SearchEmbeddingAsync (corresponding to today's VectorizedSearch) and SearchAsync (corresponding to today's VectorizableSearch).
That leaves hybrid search, which really is an optional interface (IVectorStoreRecordCollection doesn't extend it).
- We already have a problem of how a user actually gets LoggingKeywordHybridSearch registered in DI. Doing e.g. .AddWeaviateVectorStoreRecordCollection() registered IKeywordHybridSearch in DI (when the connector supports it), but does .AddWeaviateVectorStoreRecordCollection().UseLogging() register a LoggingKeywordHybridSearch for the user? I might have missed it, but I don't think so (only the collection decorator seems to get registered).
- We could have UseLogging() internally check if the IVectorStoreRecordCollection implements IKeywordHybridSearch, and if so, wrap that and register the wrapper in DI. This way, the user automatically gets everything registered without having to do extra manual steps.
- If that works, then we don't need a builder pipeline for IKeywordHybridSearch, nor do we need to expose the decorator publicly (in general, I'm not sure we need to expose any of them publicly).

If the above makes sense, if means we can pretty much get rid of the entire builders/wrappers over the search interfaces, and this becomes much more palatable, requiring the user to do much less and exposing less public APIs.

What do you think?

roji · 2025-04-01T10:52:11Z

dotnet/src/Connectors/VectorData/VectorSearch/LoggingVectorizableTextSearch.cs

+    #region private
+
+    [LoggerMessage(LoggingEventIds.VectorizableTextSearchInvoked, LogLevel.Debug, "{OperationName} invoked.")]
+    private partial void VectorizableTextSearchInvoked(string operationName);


Consider moving all these to a single, dedicated internal static class; this way the same method can be used from multiple wrappers (currently e.g. VectorizableTextSearchInvoked is duplicated here and on the wrapper for the collection).

roji · 2025-04-01T10:53:51Z

dotnet/src/Connectors/VectorData/VectorSearch/LoggingVectorizedSearch.cs


 namespace Microsoft.Extensions.VectorData;

 /// <summary>
 /// A vectorized search that logs operations to an <see cref="ILogger"/>
 /// </summary>
 [Experimental("SKEXP0020")]
-public class LoggingVectorizedSearch<TRecord> : IVectorizedSearch<TRecord>
+public partial class LoggingVectorizedSearch<TRecord> : IVectorizedSearch<TRecord>


Is there a reason that the wrappers actually have to be public? Should the user just be exposed to the interfaces instead? The less public API we expose the better.

roji · 2025-04-01T10:57:51Z

dotnet/src/Connectors/VectorData/VectorStorage/LoggingVectorStore.cs

-            this._innerStore.GetCollection<TKey, TRecord>(name, vectorStoreRecordDefinition),
-            this._logger);
+    {
+        return


nit: can be expression-bodied

dmytrostruk · 2025-04-01T17:13:47Z

This will be added later.

Added LogLevel.Trace logs to LoggingVectorStoreRecordCollection

0aac3d2

dmytrostruk self-assigned this Mar 13, 2025

markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel labels Mar 13, 2025

dmytrostruk temporarily deployed to integration March 13, 2025 22:29 — with GitHub Actions Inactive

More updates to the detailed logging

7a34a80

dmytrostruk had a problem deploying to integration March 17, 2025 21:51 — with GitHub Actions Error

dmytrostruk marked this pull request as ready for review March 17, 2025 21:51

dmytrostruk requested a review from a team as a code owner March 17, 2025 21:51

dmytrostruk changed the title ~~.Net: [Feature Branch][Draft] Added LogLevel.Trace logs to LoggingVectorStoreRecordCollection~~ .Net: [Feature Branch] Added LogLevel.Trace logs to LoggingVectorStoreRecordCollection Mar 17, 2025

dmytrostruk changed the title ~~.Net: [Feature Branch] Added LogLevel.Trace logs to LoggingVectorStoreRecordCollection~~ .Net: [Feature Branch] Added detailed logging to vector stores Mar 17, 2025

Fix warnings

fbb937b

dmytrostruk had a problem deploying to integration March 17, 2025 21:59 — with GitHub Actions Failure

Removed test

876def1

dmytrostruk had a problem deploying to integration March 17, 2025 22:32 — with GitHub Actions Failure

Fixed warnings

ee6c066

dmytrostruk had a problem deploying to integration March 17, 2025 22:36 — with GitHub Actions Error

Fixed warnings

002e46a

dmytrostruk temporarily deployed to integration March 17, 2025 22:40 — with GitHub Actions Inactive

Base automatically changed from feature-vector-data-telemetry to feature-vector-data-preb2 March 20, 2025 17:16

Merge branch 'feature-vector-data-preb2' into vector-data-trace-logging

fe4ece3

markwallace-microsoft added kernel.core documentation labels Mar 20, 2025

Merge branch 'feature-vector-data-preb2' into vector-data-trace-logging

8dd8410

markwallace-microsoft removed kernel.core documentation labels Mar 20, 2025

dmytrostruk temporarily deployed to integration March 20, 2025 17:27 — with GitHub Actions Inactive

markwallace-microsoft approved these changes Mar 25, 2025

View reviewed changes

dmytrostruk changed the title ~~.Net: [Feature Branch] Added detailed logging to vector stores~~ .Net: [MEVD] Added detailed logging to vector stores Mar 26, 2025

roji requested changes Mar 31, 2025

View reviewed changes

dmytrostruk added 2 commits March 31, 2025 13:13

Updated logging logic and tests based on PR feedback

d7e2ed5

Added event ID for each logging message

1b873cb

dmytrostruk had a problem deploying to integration March 31, 2025 20:36 — with GitHub Actions Failure

Updated logging examples

e073400

dmytrostruk temporarily deployed to integration March 31, 2025 20:59 — with GitHub Actions Inactive

Small fix

9926e5c

dmytrostruk temporarily deployed to integration March 31, 2025 21:12 — with GitHub Actions Inactive

roji reviewed Apr 1, 2025

View reviewed changes

dmytrostruk closed this Apr 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.Net: [MEVD] Added detailed logging to vector stores #10968

.Net: [MEVD] Added detailed logging to vector stores #10968

dmytrostruk commented Mar 13, 2025 •

edited

Loading

roji left a comment

roji Mar 31, 2025

dmytrostruk Mar 31, 2025

roji Apr 1, 2025

roji Mar 31, 2025

dmytrostruk Mar 31, 2025

roji Apr 1, 2025 •

edited

Loading

roji Mar 31, 2025

dmytrostruk Mar 31, 2025

roji Apr 1, 2025

dmytrostruk commented Mar 31, 2025

roji left a comment

roji Apr 1, 2025

roji Apr 1, 2025

roji Apr 1, 2025

dmytrostruk commented Apr 1, 2025


		#region private

		[LoggerMessage(LogLevel.Trace, "Collection '{CollectionName}' exists: {CollectionExists}")]

.Net: [MEVD] Added detailed logging to vector stores #10968

.Net: [MEVD] Added detailed logging to vector stores #10968

Conversation

dmytrostruk commented Mar 13, 2025 • edited Loading

Motivation and Context

Contribution Checklist

roji left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

roji Apr 1, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmytrostruk commented Mar 31, 2025

roji left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmytrostruk commented Apr 1, 2025

dmytrostruk commented Mar 13, 2025 •

edited

Loading

roji Apr 1, 2025 •

edited

Loading