
Multimodal support for pdf documents #1014

Open
@starkdg

Description


The multimodal support for images appears to work fine. However, adding PDF files results in an error: "Invalid argument provided to Gemini: 400 The document has no pages." (The PDF files themselves are fine; I tried multiple PDFs and they all produce the same error.)

The langchain-genai docs only explain how to pass multimodal data for images, but the how-to in the LangChain docs describes the following approach for PDFs: https://python.langchain.com/docs/how_to/multimodal_inputs/#documents-pdf. I get the error both with LangChain's how-to and with the code below, which uses the langchain-google module.
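
For reference, this is roughly the pattern I tried from that how-to, which base64-encodes the raw bytes of the PDF file rather than any extracted text (variable names are mine):

import base64
from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="models/gemini-2.0-flash")

# Base64-encode the raw bytes of the PDF file itself
with open("test.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this document."},
        {"type": "file", "source_type": "base64", "mime_type": "application/pdf", "data": pdf_data},
    ]
)
response = llm.invoke([message])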

My question is: is this a feature that just isn't implemented yet, or is there some design limitation that makes it prohibitive? Does anyone know of a workaround, or am I just not doing it right?

The Google Gemini site at ai.google.dev appears to use the google.genai.Part type in the contents parameter, but I don't know how that relates to the langchain-genai module.
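
For comparison, the example on ai.google.dev (if I'm reading it correctly) passes the raw PDF bytes as a Part via the google-genai SDK, roughly like this:

import pathlib
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment
pdf_bytes = pathlib.Path("test.pdf").read_bytes()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
        "Describe this document.",
    ],
)
print(response.text)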

Gemini models are equipped to handle very large PDFs, so I would think this would be a valuable feature.

Thanks in advance for any pointers.

import base64
from PyPDF2 import PdfReader
from dotenv import load_dotenv
from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

load_dotenv()

pdf_file = "test.pdf"
llm = ChatGoogleGenerativeAI(model="models/gemini-2.0-flash")

# Extract the text of the first page and base64-encode that text
# (not the raw bytes of the PDF file)
reader = PdfReader(pdf_file)
page = reader.pages[0]
pdf_text = page.extract_text()
pdf_data = base64.b64encode(pdf_text.encode("utf-8")).decode("utf-8")

# Send the encoded data as an application/pdf file block
message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this document."},
        {"type": "file", "source_type": "base64", "mime_type": "application/pdf", "data": pdf_data},
    ]
)

response = llm.invoke([message])
print(response.content)

Traceback (most recent call last):
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_google_genai/chat_models.py", line 192, in _chat_with_retry
    return generation_method(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/google/ai/generativelanguage_v1beta/services/generative_service/client.py", line 868, in generate_content
    response = rpc(
               ^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 294, in retry_wrapped_func
    return retry_target(
           ^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 156, in retry_target
    next_sleep = _retry_error_helper(
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/google/api_core/retry/retry_base.py", line 214, in _retry_error_helper
    raise final_exc from source_exc
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 147, in retry_target
    result = target()
             ^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/google/api_core/timeout.py", line 130, in func_with_timeout
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 The document has no pages.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/Projects/ai_agent_project/test_multimodal.py", line 25, in <module>
    response = llm.invoke([message])
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_google_genai/chat_models.py", line 1255, in invoke
    return super().invoke(input, config, stop=stop, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 372, in invoke
    self.generate_prompt(
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 957, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 776, in generate
    self._generate_with_cache(
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 1022, in _generate_with_cache
    result = self._generate(
             ^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_google_genai/chat_models.py", line 1342, in _generate
    response: GenerateContentResponse = _chat_with_retry(
                                        ^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_google_genai/chat_models.py", line 210, in _chat_with_retry
    return _chat_with_retry(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/david/.local/share/venvs/venv-langchain/lib/python3.11/site-packages/langchain_google_genai/chat_models.py", line 204, in _chat_with_retry
    raise ChatGoogleGenerativeAIError(
langchain_google_genai.chat_models.ChatGoogleGenerativeAIError: Invalid argument provided to Gemini: 400 The document has no pages.
