Description
Hi team,
First, thank you for your work on this integration — it's been very helpful.
I'm encountering a UnicodeDecodeError when using the GmailSearch tool to retrieve and parse messages. The error occurs in the _parse_messages method: the message body is decoded as UTF-8, but the message's actual encoding is different (e.g., Latin-1 or Windows-1252). Here's the traceback:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 13503: invalid start byte
The issue seems to happen at this line (from gmail/search.py):
message_body = email_msg.get_payload(decode=True).decode("utf-8")
While multipart messages have a fallback to Latin-1 decoding, non-multipart messages are always decoded as UTF-8, which can cause the tool to crash when the content is in a different encoding.
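To reproduce this without a Gmail account, here's a minimal standalone sketch that uses only the standard library email package. The message bytes are made up for illustration: a single-part message with a Latin-1 body starting with byte 0xc1, like the one in the error. The sketch then applies the same hard-coded UTF-8 decode as the line above.

```python
import base64
import email

# Hypothetical single-part message: a Latin-1 body, base64-encoded,
# with its charset declared in the Content-Type header.
body = "Água fria".encode("latin-1")  # b"\xc1gua fria"
raw = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(body) + b"\r\n"
)

email_msg = email.message_from_bytes(raw)
payload = email_msg.get_payload(decode=True)  # decoded bytes, still Latin-1

error_message = None
try:
    # Same pattern as the non-multipart branch: UTF-8 is hard-coded.
    message_body = payload.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)  # 'utf-8' codec can't decode byte 0xc1 in position 0: invalid start byte
```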
To improve robustness, I recommend wrapping this decoding step in a fallback. A more general solution could try several encodings in turn, or decode with errors="replace" or errors="ignore" to avoid hard crashes on malformed characters.
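Here's a rough sketch of the kind of fallback I have in mind. It assumes email_msg is a standard email.message.Message, as the get_payload(decode=True) call suggests. decode_payload is a hypothetical helper name, and the fallback order (UTF-8, then Windows-1252, then Latin-1) is my own assumption. The helper also tries the charset declared in the Content-Type header first, which the current line does not consult.

```python
import email
from email.message import Message

# decode_payload is a hypothetical helper, not part of the existing tool.
# The fallback order below is an assumption; adjust it to the mail you expect.
DEFAULT_FALLBACKS = ("utf-8", "windows-1252", "latin-1")


def decode_payload(msg: Message, fallbacks=DEFAULT_FALLBACKS) -> str:
    """Decode a single-part message body without raising on unexpected encodings."""
    payload = msg.get_payload(decode=True)
    if payload is None:
        # get_payload(decode=True) returns None for multipart containers.
        return ""

    # Try the charset declared in the Content-Type header first, then the fallbacks.
    candidates = []
    declared = msg.get_content_charset()  # lowercased charset name, or None
    if declared:
        candidates.append(declared)
    for charset in fallbacks:
        if charset not in candidates:
            candidates.append(charset)

    for charset in candidates:
        try:
            return payload.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # LookupError: the header named a charset Python does not know.
            continue

    # Safety net for custom fallback lists: Latin-1 maps every byte, so with
    # the defaults the loop above always returns before reaching this line.
    return payload.decode("utf-8", errors="replace")


# Example: no charset in the header, and the body bytes are not valid UTF-8.
raw = (
    b"Content-Type: text/plain\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"\xc1gua fria\r\n"
)
email_msg = email.message_from_bytes(raw)
message_body = decode_payload(email_msg).strip()  # "Água fria"
```

Windows-1252 comes before Latin-1 because the two agree on bytes 0xA0 to 0xFF. In the 0x80 to 0x9F range, Windows-1252 maps most bytes to printable characters such as curly quotes, while Latin-1 maps them to control characters. If a smaller change is preferred, decoding once with errors="replace" also avoids the crash. The trade-off is that undecodable bytes become U+FFFD instead of the original letters.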
Let me know if you'd like me to open a PR with a patch.
Thanks again!