Skip to content

chore(core) Document: add test checking pageContents empty string behavior #2836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

skarard
Copy link
Contributor

@skarard skarard commented Oct 8, 2023

This PR is related to a discussion in #2807, the current behaviour of the Document constructor field.pageContent when given an empty string is to set pageContent as undefined. This broke HNSWLib.addDocuments() as it attempts a string.replace() on undefined rather than the typed string expected.

I suggested allowing pageContents to be set as an empty string. @jacoblee93 suggested that a Document with an empty pageContents should not be created at all.

This implementation requires pageContents to be a string that is not empty, otherwise it throws Error("pageContents must not be empty").

@vercel
Copy link

vercel bot commented Oct 8, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchainjs-docs ✅ Ready (Inspect) Visit Preview Apr 28, 2025 11:32am
1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchainjs-api-refs ⬜️ Ignored (Inspect) Apr 28, 2025 11:32am

@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Oct 8, 2023
@skarard skarard changed the title Explicitly check that pageContents is not an empty string. Document: Explicitly check that pageContents is not an empty string. Oct 8, 2023
@jacoblee93
Copy link
Collaborator

Ideally we shouldn't need this with TypeScript, but an empty string should be permissible, just not a hard undefined, null, or non-string value - can you change?

@skarard
Copy link
Contributor Author

skarard commented Oct 9, 2023 via email

@skarard
Copy link
Contributor Author

skarard commented Oct 26, 2023

@jacoblee93 I added the proposed change and threw in a couple of unit tests for good measure.

Copy link
Member

@bracesproul bracesproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @skarard sorry for taking so long to review, these changes look good! We've moved document.ts into langchain-core/src/documents/document.ts, could you push up a change moving your changes into that updated file then re-request my review?

Thanks again!

…cument-constructor-throw-on-empty-pageContent
@benjamincburns benjamincburns changed the title Document: Explicitly check that pageContents is not an empty string. chore(core) Document: add test checking pageContents empty string behavior Apr 28, 2025
@benjamincburns
Copy link
Collaborator

Thanks for the contribution!

@dosubot dosubot bot added the lgtm PRs that are ready to be merged as-is label Apr 28, 2025
@benjamincburns benjamincburns enabled auto-merge (squash) April 28, 2025 11:20
@benjamincburns benjamincburns disabled auto-merge April 28, 2025 11:38
@benjamincburns benjamincburns merged commit deab3fe into langchain-ai:main Apr 28, 2025
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature lgtm PRs that are ready to be merged as-is size:S This PR changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants