Slack Federated Search v0 #4962

Closed
wants to merge 14 commits into from

@Orbital-Web Orbital-Web commented Jun 28, 2025

Description

Current approach:

  1. Call slack_retrieval in parallel with doc_index_retrieval to get Slack documents.
  2. Process the Slack response into InferenceChunks.
  3. Always keep the first NUM_FEDERATED_SECTIONS sections returned from federated search. Sections are sorted by score, and Slack documents have a score of 0, so they always appear at the end. If section_relevance_list is provided, sections marked not relevant appear even lower.
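The three steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Section` class, the stub bodies of `doc_index_retrieval` and `slack_retrieval`, and the value of `NUM_FEDERATED_SECTIONS` are all assumptions, and step 3 is read here as "take the first N Slack sections and merge them in".

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical value; the PR does not state the real setting.
NUM_FEDERATED_SECTIONS = 2


@dataclass
class Section:
    # Simplified stand-in for a search section / InferenceChunk.
    source: str
    text: str
    score: float = 0.0  # Slack results are unscored, so they default to 0


def doc_index_retrieval(query: str) -> list[Section]:
    # Stub for the regular document index search.
    return [
        Section("web", "Design doc", 0.9),
        Section("confluence", "Runbook", 0.7),
    ]


def slack_retrieval(query: str) -> list[Section]:
    # Stub for the Slack search call (steps 1 and 2 combined).
    return [Section("slack", f"message {i}") for i in range(3)]


def federated_search(query: str) -> list[Section]:
    # Step 1: run both retrievals in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        doc_future = pool.submit(doc_index_retrieval, query)
        slack_future = pool.submit(slack_retrieval, query)
        doc_sections = doc_future.result()
        slack_sections = slack_future.result()

    # Step 3: keep the first NUM_FEDERATED_SECTIONS Slack sections.
    kept = slack_sections[:NUM_FEDERATED_SECTIONS]

    # sorted() is stable, so the zero-scored Slack sections land at the end
    # in their original order.
    return sorted(doc_sections + kept, key=lambda s: s.score, reverse=True)
```

With these stubs, `federated_search("anything")` returns the two indexed sections first, followed by the first two Slack messages.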

Notes:

  • If Slack is the only selected source, the normal search is skipped
  • If Slack is not part of the selected source filter, the Slack search is skipped
  • From filters are applied to Slack searches
  • Slack search is keyword-based, so we probably need more aggressive query keyword extraction to get results (complex queries typically return 0 results)
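The first two notes amount to a small decision on the source filter, which might look something like this. The `Source` enum and `plan_searches` are hypothetical names for illustration, and the behaviour when no filter is selected (run both searches) is an assumption the PR does not state.

```python
from enum import Enum
from typing import Optional


class Source(str, Enum):
    # Hypothetical subset of source types.
    SLACK = "slack"
    WEB = "web"
    CONFLUENCE = "confluence"


def plan_searches(selected: Optional[set[Source]]) -> tuple[bool, bool]:
    """Return (run_normal_search, run_slack_search) for a source filter."""
    if not selected:
        # Assumption: no source filter means every source is searched.
        return True, True
    # Slack search runs only when Slack is among the selected sources.
    run_slack = Source.SLACK in selected
    # Normal search runs only when some non-Slack source is selected.
    run_normal = bool(selected - {Source.SLACK})
    return run_normal, run_slack
```

For example, `plan_searches({Source.SLACK})` skips the normal search, while `plan_searches({Source.WEB})` skips the Slack search.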

How Has This Been Tested?

Tested locally; test cases could be added.

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

vercel bot commented Jun 28, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name | Status | Preview | Comments | Updated (UTC)
internal-search | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 4, 2025 11:36pm

@greptile-apps greptile-apps bot left a comment

PR Summary

Introduces federated search capabilities by integrating Slack messages alongside document index search, running keyword-based Slack retrieval in parallel with main document search.

  • Added new federated search module in /onyx/context/search/federated/ implementing Slack message search with parallel execution and source filtering
  • Added Pydantic models in models.py for structured handling of Slack messages and elements
  • Modified search_runner.py to coordinate parallel execution of document and federated searches with source-based filtering
  • Slack search uses basic keyword matching which may limit effectiveness on complex queries
  • Zero scoring of Slack results means they'll always appear after regular search results unless explicitly marked relevant
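Since keyword matching tends to fail on long natural-language queries, one possible mitigation is to reduce the query to a handful of content words before sending it to Slack. The sketch below is only an illustration of that idea: `extract_keywords`, the stopword list, and the `max_keywords` limit are assumptions, not part of this PR.

```python
import re

# Small illustrative stopword list; a real one would be much larger.
STOPWORDS = {
    "a", "an", "and", "are", "about", "did", "do", "does", "for", "how",
    "in", "is", "of", "on", "or", "the", "to", "was", "we", "what",
    "when", "where", "who", "why", "with",
}


def extract_keywords(query: str, max_keywords: int = 5) -> list[str]:
    # Lowercase and split into alphanumeric tokens (underscores and hyphens kept).
    tokens = re.findall(r"[a-z0-9_-]+", query.lower())
    seen: set[str] = set()
    keywords: list[str] = []
    for token in tokens:
        # Drop stopwords, single characters, and repeats.
        if token in STOPWORDS or len(token) < 2 or token in seen:
            continue
        seen.add(token)
        keywords.append(token)
    # Keep the query short so keyword search has a chance of matching.
    return keywords[:max_keywords]
```

A query such as "What did the team decide about the Q3 pricing rollout?" is reduced to `team`, `decide`, `q3`, `pricing`, `rollout`.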

6 files reviewed, 3 comments

@@ -115,34 +114,6 @@ def combine_retrieval_results(
    return sorted_chunks


def get_query_embedding(query: str, db_session: Session) -> Embedding:

@Orbital-Web Orbital-Web Jun 29, 2025

Moved it into utils, since I might experiment with embedding the query and chunks in slack_search.py to score the chunks. Leaving it here would lead to circular imports.
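As a rough idea of what scoring Slack chunks by embedding could look like, here is a minimal sketch using cosine similarity over plain lists of floats. The function names are hypothetical, cosine similarity is an assumed choice of metric, and the embeddings would in practice come from something like get_query_embedding rather than being passed in by hand.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat zero-length vectors as having no similarity.
    return dot / norm if norm else 0.0


def score_chunks(
    query_embedding: list[float], chunk_embeddings: list[list[float]]
) -> list[float]:
    # One score per chunk, in the same order as the input.
    return [cosine_similarity(query_embedding, c) for c in chunk_embeddings]
```

Scores produced this way could replace the default 0 score, so that Slack sections are ranked alongside indexed documents instead of always trailing them.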

@Orbital-Web

Merged as part of #4969 (comment)

@Orbital-Web Orbital-Web closed this Jul 9, 2025
@Orbital-Web Orbital-Web deleted the slack-federated-search branch July 9, 2025 08:50