get_num_tokens_from_messages broken for Multi-modal messages #879

Open
Jflick58 opened this issue Apr 22, 2025 · 1 comment
Labels: enhancement (New feature or request)

Jflick58 commented Apr 22, 2025

Similar to #491

When trying to use get_num_tokens_from_messages with a ChatVertexAI model and multi-modal inputs, the token count from the Langchain method is wildly inflated (1369082) vs the GenAI SDK and Vertex AI Console token number (3358).

I believe this is because ChatVertexAI falls back on the default get_num_tokens_from_messages implementation inherited from BaseChatModel. That method uses get_buffer_string to flatten every part of a message into a single string: https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/messages/utils.py#L82

Unfortunately, that means the entire base64 string gets tokenized as text, whereas the GenAI SDK (and presumably the Vertex AI console) counts the underlying image bytes.

This makes it very difficult to track token usage and debug token limit exceeded errors.
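To see the scale of the inflation, note that base64 expands raw bytes by a factor of 4/3, and every one of those characters is then tokenized as text. A stdlib-only sketch (the ~4-characters-per-token figure is a rough assumption, not a property of the Gemini tokenizer):

```python
import base64

raw = b"\x00" * 100_000  # stand-in for ~100 KB of image bytes
encoded = base64.b64encode(raw).decode()

print(len(raw))      # 100000 bytes on disk
print(len(encoded))  # 133336 characters once base64-encoded
# Counted as text at a rough ~4 characters per token, the payload alone
# contributes tens of thousands of tokens instead of a small fixed image cost.
print(len(encoded) // 4)  # 33334
```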

Here is an example, with a sample image to use.

import os
import sys
import base64
import logging
from google import genai
from google.genai.types import HttpOptions, Part
from langchain_google_vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage, SystemMessage

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def get_file_size_info(file_path):
    """Get file size info for an image file."""
    file_size = os.path.getsize(file_path)
    logging.info(f"Original file size: {file_size / 1024:.2f} KB")
    return file_size

def encode_image_base64(file_path):
    """Encode an image as base64."""
    with open(file_path, "rb") as f:
        image_bytes = f.read()
        encoded = base64.b64encode(image_bytes).decode()
        logging.info(f"Base64 encoded size: {len(encoded) / 1024:.2f} KB")
        logging.info(f"Base64 encoded length: {len(encoded)} characters")
        return encoded

def test_langchain_method(file_path):
    """Test how LangChain handles images and count tokens."""
    logging.info("\n===== Testing LangChain Method =====")
    
    # Initialize LangChain model
    llm = ChatVertexAI(model="gemini-2.0-flash-001")
    
    # Encode image
    encoded_image = encode_image_base64(file_path)
    
    # Create messages
    messages = [
        SystemMessage(content="You are a helpful assistant."),
        HumanMessage(
            content=[
                {
                    "type": "text",
                    "text": "Please analyze this image:"
                },
                {
                    "type": "image",
                    "source_type": "base64",
                    "data": encoded_image,
                    "mime_type": "image/png",
                }
            ]
        )
    ]
    
    # Count tokens (token_count stays None if counting fails)
    token_count = None
    try:
        token_count = llm.get_num_tokens_from_messages(messages)
        logging.info(f"LangChain token count: {token_count}")
    except Exception as e:
        logging.error(f"Error counting tokens: {e}")

    return token_count

def test_direct_genai_method(file_path):
    """Test how direct Google Generative AI handles images using the Gemini API."""
    logging.info("\n===== Testing Direct GenAI Method =====")
    
    client = genai.Client(http_options=HttpOptions(api_version="v1"))

    # Part.from_bytes expects raw bytes, so pass the file contents directly
    # rather than the base64-encoded string
    with open(file_path, "rb") as f:
        image_bytes = f.read()

    contents = [
        Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Please analyze this image",
    ]

    response = client.models.count_tokens(
        model="gemini-2.0-flash-001",
        contents=contents,
    )
    return response.total_tokens

def main():
    if len(sys.argv) < 2:
        print("Usage: python img_token_test.py <image_path>")
        sys.exit(1)
    
    file_path = sys.argv[1]
    if not os.path.exists(file_path):
        print(f"File not found: {file_path}")
        sys.exit(1)
    
    # Get file size info
    get_file_size_info(file_path)
    
    # Test both methods
    langchain_tokens = test_langchain_method(file_path)
    direct_tokens = test_direct_genai_method(file_path)
    
    # Print summary
    logging.info("\n===== Summary =====")
    logging.info(f"Image size: {os.path.getsize(file_path) / 1024:.2f} KB")
    logging.info(f"LangChain token count: {langchain_tokens}")
    
    if direct_tokens:
        logging.info(f"Direct GenAI token count: {direct_tokens}")
        logging.info(f"Difference: {abs(langchain_tokens - direct_tokens)} tokens")
        logging.info(f"Ratio between methods: {langchain_tokens / direct_tokens:.2f}")
    
    logging.info(f"Token ratio: {langchain_tokens / (os.path.getsize(file_path) / 1024):.2f} tokens per KB")

if __name__ == "__main__":
    main()

You'll need the following .env to run the example:

GOOGLE_CLOUD_PROJECT="project"
GOOGLE_CLOUD_LOCATION="us-central1"
GOOGLE_GENAI_USE_VERTEXAI=True

To run: uv run --env-file=.env img_token_test.py buddy-photo-pd61clsCVnY-unsplash.jpg


Noting this issue here. If I have time I'll open a PR to fix it, but I figured I'd file it in case someone else picks it up sooner. It seems we need to implement an overridden version of get_num_tokens_from_messages specific to ChatVertexAI.
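As a starting point for that override, the image payloads could be split out of the message content before any text-based counting happens. A stdlib-only sketch of the preprocessing step (the helper name split_parts and the plain-dict message shape are assumptions mirroring the content blocks in the example above; a real override would hand the decoded bytes to the Vertex AI count_tokens endpoint rather than just measuring them):

```python
import base64

def split_parts(content):
    """Separate text blocks from base64 image blocks, decoding the images
    back to raw bytes so the base64 text is never tokenized."""
    texts, images = [], []
    for block in content:
        if block.get("type") == "text":
            texts.append(block["text"])
        elif block.get("type") == "image" and block.get("source_type") == "base64":
            images.append(base64.b64decode(block["data"]))
    return texts, images

content = [
    {"type": "text", "text": "Please analyze this image:"},
    {
        "type": "image",
        "source_type": "base64",
        "data": base64.b64encode(b"\x89PNG" + bytes(96)).decode(),
        "mime_type": "image/png",
    },
]
texts, images = split_parts(content)
print(texts)           # ['Please analyze this image:']
print(len(images[0]))  # 100 raw bytes, not 136 base64 characters
```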

lkuligin (Collaborator) added the enhancement (New feature or request) label May 7, 2025