Description
I have already built a RAG pipeline following the LangChain tutorial below, and it works well with Gemini 2.0 Flash:
https://python.langchain.com/docs/tutorials/rag/
Now that 2.5 Flash has launched, I changed the model name to gemini-2.5-flash, but it returns an empty result like the following (note finish_reason is MAX_TOKENS and output_tokens is 0):
content='' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'MAX_TOKENS', 'model_name': 'gemini-2.5-flash', 'safety_ratings': []} id='xxxxxxxxxxxxxxxxxxx' usage_metadata={'input_tokens': 39236, 'output_tokens': 0, 'total_tokens': 42307, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 3071}}
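Reading the usage_metadata above, the reasoning token count (3071) appears to consume essentially the entire max_output_tokens=3072 budget, leaving nothing for the visible answer. A quick sanity check over those numbers (plain dict copied from the output above, no API call; this is my reading of the metadata, not a documented guarantee):

```python
# Values copied verbatim from the usage_metadata printed above.
usage = {
    "input_tokens": 39236,
    "output_tokens": 0,
    "total_tokens": 42307,
    "output_token_details": {"reasoning": 3071},
}
max_output_tokens = 3072  # the cap passed to ChatGoogleGenerativeAI below

reasoning = usage["output_token_details"]["reasoning"]
visible = usage["output_tokens"]

print(f"reasoning tokens: {reasoning} of a {max_output_tokens}-token budget")
print(f"visible answer tokens: {visible}")
print(f"budget left for the answer: {max_output_tokens - reasoning}")
```

With only 1 token of headroom after reasoning, an empty `content` with finish_reason MAX_TOKENS would be the expected outcome.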
2.5 Flash works fine when I ask a question directly, without RAG:
# Imports for the snippet (HarmCategory/HarmBlockThreshold are re-exported by langchain_google_genai):
from langchain_google_genai import ChatGoogleGenerativeAI, HarmBlockThreshold, HarmCategory

llm = ChatGoogleGenerativeAI(
    model='gemini-2.5-flash',
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    },
    api_key=google_api_key,
    temperature=0.3,
    top_p=0.7,
    max_output_tokens=3072,
    timeout=40,
    max_retries=2,
)
import time

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"input": state["question"], "context": docs_content})
    try:
        start_time = time.time()
        response = llm.invoke(messages)
        print(messages)
        print(response)  # empty content when run through the RAG chain
        elapsed_time = time.time() - start_time
        print(elapsed_time)
        print(llm.invoke('explain yourself in 500 words'))  # returns text as expected
    except Exception as e:
        print(f"Error: {e}")
        raise  # without this, the return below would hit an undefined `response`
    return {"answer": response.content}
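For what it's worth, the failure pattern above can be detected mechanically, so generate() could at least distinguish "empty/blocked answer" from "thinking consumed the whole budget". A minimal sketch (hit_reasoning_cap is my own helper name, not a library API; the fields mirror the response metadata printed above):

```python
def hit_reasoning_cap(content, response_metadata, usage_metadata):
    """Heuristic: empty answer + MAX_TOKENS finish, with all output spent on reasoning."""
    reasoning = usage_metadata.get("output_token_details", {}).get("reasoning", 0)
    return (
        not content
        and response_metadata.get("finish_reason") == "MAX_TOKENS"
        and usage_metadata.get("output_tokens", 0) == 0
        and reasoning > 0
    )

# The failing response from this report:
capped = hit_reasoning_cap(
    content="",
    response_metadata={"finish_reason": "MAX_TOKENS"},
    usage_metadata={"output_tokens": 0, "output_token_details": {"reasoning": 3071}},
)
print(capped)  # True
```

Inside generate() this could be called with response.content, response.response_metadata, and response.usage_metadata before returning, e.g. to retry with a larger max_output_tokens.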
Other info:
python 3.10
langchain 0.3.26
langchain-community 0.3.26
langchain-core 0.3.68
langchain-google-genai 2.1.6
google-ai-generativelanguage 0.6.18
langgraph 0.5.0