
Document.pageContent should not allow undefined #5884

Closed
@glorat

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { describe, it, expect } from 'vitest'
import { Document } from '@langchain/core/documents'

describe('langchain Type Compatibility Test', () => {
  it('Document should handle empty pageContent', () => {
    const doc = new Document({ pageContent: '' })
    expect(doc.pageContent).toStrictEqual('')
  })
})

Error Message and Stack Trace (if applicable)

AssertionError: expected undefined to strictly equal ''
Expected: ''
Actual: undefined

Description

pageContent should be set to '', as required by its declared type string (not string | undefined).

The bug is visible in the source:
https://github.com/langchain-ai/langchainjs/blob/main/langchain-core/src/documents/document.ts

export class Document<
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  Metadata extends Record<string, any> = Record<string, any>
> implements DocumentInput, DocumentInterface
{
  pageContent: string;

  metadata: Metadata;

  constructor(fields: DocumentInput<Metadata>) {
    this.pageContent = fields.pageContent
      ? fields.pageContent.toString()
      : this.pageContent;
    this.metadata = fields.metadata ?? ({} as Metadata);
  }
}
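To isolate the behaviour of the ternary above, here is a minimal standalone sketch. The function names truthyCheck and undefinedCheck are hypothetical, and the not-yet-assigned class field is modelled as a plain undefined value rather than by reproducing the class itself.

```typescript
// Hypothetical helper mirroring the quoted constructor: a truthiness check
// whose fallback is the not-yet-assigned field (modelled here as undefined).
function truthyCheck(pageContent: string | undefined): string | undefined {
  const uninitialisedField: string | undefined = undefined;
  return pageContent ? pageContent.toString() : uninitialisedField;
}

// Hypothetical helper: only a missing input falls back, and the fallback
// is '' so the result is always a string.
function undefinedCheck(pageContent: string | undefined): string {
  return pageContent !== undefined ? pageContent.toString() : "";
}

// '' is falsy, so the truthiness check discards it.
console.log(truthyCheck(""));                    // undefined
console.log(JSON.stringify(undefinedCheck(""))); // ""
```

Any non-empty string passes through both helpers unchanged; only the empty string exposes the difference.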

Line 32 should check whether fields.pageContent is undefined, not merely whether it is truthy. Because '' is falsy, new Document({ pageContent: '' }) falls through to this.pageContent, which has not been assigned yet and is therefore undefined, breaking the contract that pageContent must be a string. In any event, pageContent needs to be initialised to some string (for example '') to fulfil the type contract.
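A minimal sketch of the suggested fix, assuming a simplified stand-in for DocumentInput with only the two fields the quoted constructor reads. PatchedDocument and SketchDocumentInput are hypothetical names used for illustration; this is not necessarily how the upstream fix was written.

```typescript
// Hypothetical, simplified stand-in for DocumentInput: only the fields
// used by the quoted constructor.
interface SketchDocumentInput<Metadata extends Record<string, unknown>> {
  pageContent: string;
  metadata?: Metadata;
}

// Hypothetical class illustrating the proposed constructor logic.
class PatchedDocument<
  Metadata extends Record<string, unknown> = Record<string, unknown>
> {
  pageContent: string;

  metadata: Metadata;

  constructor(fields: SketchDocumentInput<Metadata>) {
    // Test for a missing value (undefined, or null from untyped JS callers)
    // instead of truthiness, so '' is kept; default to '' so the field is
    // always a string.
    this.pageContent =
      fields.pageContent != null ? fields.pageContent.toString() : "";
    this.metadata = fields.metadata ?? ({} as Metadata);
  }
}

const doc = new PatchedDocument({ pageContent: "" });
console.log(JSON.stringify(doc.pageContent)); // ""
```

Using != null rather than !== undefined also guards against null passed by JavaScript callers that bypass the type checker. Either way, the '' default keeps pageContent consistent with its declared string type.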

System Info

This happens in any environment.

Metadata

Assignees: No one assigned
Labels: auto:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
