Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: restructed data folder and multiple improvements to vector stores #11302

Merged
merged 3 commits into from
Apr 1, 2025

Conversation

eavanvalkenburg
Copy link
Member

@eavanvalkenburg eavanvalkenburg commented Apr 1, 2025

Motivation and Context

This PR:

Description

Most of the code has been moved out of many files and folders, into a few files which make purpose and collaboration between the classes easier to understand.

This is a breaking change:
If you have implemented your own VectorStoreRecordCollection, make sure to:

  • Add VectorStoreRecordCollection[TKey, TModel] as superclass to the implementation
  • Remove VectorSearchBase as superclass from implementation
  • Set the SearchMixins to [TKey, TModel] instead of just [TModel]

Contribution Checklist

@eavanvalkenburg eavanvalkenburg added the PR: breaking change Pull requests that introduce breaking changes label Apr 1, 2025
@eavanvalkenburg eavanvalkenburg requested a review from a team as a code owner April 1, 2025 08:50
@markwallace-microsoft markwallace-microsoft added python Pull requests for the Python Semantic Kernel memory labels Apr 1, 2025
@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Apr 1, 2025

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
semantic_kernel/connectors/memory
   faiss.py117794%88, 96, 100, 128–129, 156, 214
semantic_kernel/connectors/memory/azure_ai_search
   azure_ai_search_collection.py1323673%171, 173, 246–287, 297–307, 311, 315, 320–323
   azure_ai_search_store.py42295%129–130
   utils.py72790%125, 127, 129, 150–151, 154–155
semantic_kernel/connectors/memory/azure_cosmos_db
   azure_cosmos_db_mongodb_collection.py936332%77–110, 138–139, 142–185, 194–218, 229–249
   azure_cosmos_db_mongodb_store.py361656%56–85, 107–116
   azure_cosmos_db_no_sql_collection.py1636759%97–98, 139–146, 157–171, 177–185, 191–195, 218–242, 246, 250, 274, 302–303, 307–312
   azure_cosmos_db_no_sql_store.py32681%90–95
   utils.py54787%43, 66, 91, 186, 200–203
semantic_kernel/connectors/memory/chroma
   chroma.py1984378%67–71, 84–85, 93–94, 110, 118, 126, 133–136, 143, 206–208, 226–229, 241, 257, 260–281, 304, 353–354, 356
semantic_kernel/connectors/memory/in_memory
   in_memory_collection.py1301588%66, 118, 120, 140, 179, 194, 206, 224–227, 231, 233, 235–236
semantic_kernel/connectors/memory/mongodb_atlas
   mongodb_atlas_collection.py1304268%143, 150, 183–185, 191–193, 200, 204, 216–217, 225–232, 242, 253–255, 263–288, 295–301, 305, 309, 319–320
   mongodb_atlas_store.py52787%89–90, 92, 139–140, 144–145
   utils.py301067%38–40, 62, 86–88, 106–113
semantic_kernel/connectors/memory/pinecone
   _pinecone.py2804385%116–119, 124, 129, 154, 160, 167, 200, 204–205, 213, 218–219, 238, 244–248, 274, 302–305, 322, 364, 366, 371, 378, 387, 389, 392, 401, 403, 407, 422, 424, 429, 435, 440, 451, 499, 518, 583
semantic_kernel/connectors/memory/postgres
   postgres_collection.py2274481%130, 138–140, 145–149, 157, 162, 182, 235, 252, 264, 314, 323, 373, 393, 413, 422, 430, 465, 471–474, 493, 520, 522–524, 547, 605–625, 629, 633
   utils.py581279%72, 116, 119–126, 158, 173–174
semantic_kernel/connectors/memory/redis
   redis_collection.py2374780%179, 184–185, 196–206, 214–234, 237–244, 259–264, 268, 344, 381–388, 466, 488–495
   redis_store.py42295%108–109
   utils.py693254%152–153, 171, 173, 180–195, 202–232
semantic_kernel/connectors/memory/weaviate
   utils.py782272%91–96, 193, 242, 261, 268–271, 280–291
   weaviate_collection.py1705965%171–174, 178–184, 188–189, 200–217, 224–231, 240–260, 269–285, 289, 292–296, 340, 344–351, 366, 385, 401, 410–411, 417, 422
   weaviate_store.py581771%109–117, 121–126, 131–136, 141–142
semantic_kernel/connectors/search/bing
   bing_search.py1091289%113, 119, 129, 152–153, 193, 199–200, 204–205, 211–212
semantic_kernel/connectors/search/google
   google_search.py981189%114, 124, 150–151, 166–167, 191–195
semantic_kernel/data
   text_search.py149497%188–189, 468–470
   vector_search.py1151785%204–205, 244–247, 269–271, 311–314, 334–336, 369–372, 392–394
   vector_storage.py252797%205–206, 325–326, 535, 593, 652
semantic_kernel/functions
   kernel_plugin.py180697%393, 396, 409, 434, 455, 480
TOTAL21002247588% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
3336 5 💤 0 ❌ 0 🔥 1m 37s ⏱️

@moonbox3 moonbox3 added this pull request to the merge queue Apr 1, 2025
Merged via the queue into microsoft:main with commit a52d47e Apr 1, 2025
28 checks passed
eavanvalkenburg added a commit to eavanvalkenburg/semantic-kernel that referenced this pull request Apr 2, 2025
…ores (microsoft#11302)

### Motivation and Context

<!-- Thank you for your contribution to the semantic-kernel repo!
Please help reviewers and future users, providing the following
information:
  1. Why is this change required?
  2. What problem does it solve?
  3. What scenario does it contribute to?
  4. If it fixes an open issue, please link to the issue here.
-->
This PR:
- restructures the data folder, flattening the hierarchy
- it adds overloads to the Get, Upsert, Delete methods, to allow batch
as well, marks the batch versions as deprecated, closes microsoft#11301
- restructured the inheritance of VectorStoreRecordCollection, which now
has a base that does record handling, which is shared with the
VectorSearchBase for the serialization, and the search method mixins now
subclass the VectorSearchBase so that there is one less parent direct
parent, a full Collection will now inherit from
VectorStoreRecordCollection and zero or more VectorSearch mixins, this
is done to clean things up and in preparation of adding HybridSearch.
- Also adds a convenience method to each search mixin to create a
VectorStoreTextSearch directly, allowing folks to import one less
concept, closes microsoft#11111

### Description

<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->
Most of the code has been moved out of many files and folders, into a
few files which make purpose and collaboration between the classes
easier to understand.

This is a breaking change:
If you have implemented your own VectorStoreRecordCollection, make sure
to:

- Add `VectorStoreRecordCollection[TKey, TModel]` as superclass to the
implementation
- Remove `VectorSearchBase` as superclass from implementation
- Set the SearchMixins to `[TKey, TModel]` instead of just `[TModel]`

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄

---------

Co-authored-by: Evan Mattson <[email protected]>
glorious-beard pushed a commit to glorious-beard/semantic-kernel that referenced this pull request Apr 6, 2025
…ores (microsoft#11302)

### Motivation and Context

<!-- Thank you for your contribution to the semantic-kernel repo!
Please help reviewers and future users, providing the following
information:
  1. Why is this change required?
  2. What problem does it solve?
  3. What scenario does it contribute to?
  4. If it fixes an open issue, please link to the issue here.
-->
This PR:
- restructures the data folder, flattening the hierarchy
- it adds overloads to the Get, Upsert, Delete methods, to allow batch
as well, marks the batch versions as deprecated, closes microsoft#11301
- restructured the inheritance of VectorStoreRecordCollection, which now
has a base that does record handling, which is shared with the
VectorSearchBase for the serialization, and the search method mixins now
subclass the VectorSearchBase so that there is one less parent direct
parent, a full Collection will now inherit from
VectorStoreRecordCollection and zero or more VectorSearch mixins, this
is done to clean things up and in preparation of adding HybridSearch.
- Also adds a convenience method to each search mixin to create a
VectorStoreTextSearch directly, allowing folks to import one less
concept, closes microsoft#11111

### Description

<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->
Most of the code has been moved out of many files and folders, into a
few files which make purpose and collaboration between the classes
easier to understand.

This is a breaking change:
If you have implemented your own VectorStoreRecordCollection, make sure
to:

- Add `VectorStoreRecordCollection[TKey, TModel]` as superclass to the
implementation
- Remove `VectorSearchBase` as superclass from implementation
- Set the SearchMixins to `[TKey, TModel]` instead of just `[TModel]`

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄

---------

Co-authored-by: Evan Mattson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
memory PR: breaking change Pull requests that introduce breaking changes python Pull requests for the Python Semantic Kernel
Projects
None yet
4 participants