UnicodeDecodeError when decoding message body with non-UTF-8 encoding

Hi team,

First, thank you for your work on this integration; it has been very helpful.

I'm encountering a `UnicodeDecodeError` when using the `GmailSearch` tool to retrieve and parse messages. The error occurs in the `_parse_messages` method when a message body is decoded as UTF-8 but is actually in a different encoding (e.g., Latin-1 or Windows-1252). Here's the error from the traceback:

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 13503: invalid start byte
```
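For reference, the underlying failure can be reproduced without Gmail at all: text encoded as Latin-1 (or Windows-1252) is generally not valid UTF-8. For example, `Á` is the single byte `0xC1` in Latin-1, and `0xC1` can never start a UTF-8 sequence. A minimal sketch (the sample string is made up for illustration):

```python
# 'Á' encodes to the single byte 0xC1 in Latin-1 (and in Windows-1252).
latin1_bytes = "Ácido".encode("latin-1")

try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    # Same failure mode as the traceback: 0xC1 is never a valid UTF-8 lead byte.
    print(exc)  # 'utf-8' codec can't decode byte 0xc1 in position 0: invalid start byte
```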

The issue seems to happen at this line (from `gmail/search.py`):
```python
message_body = email_msg.get_payload(decode=True).decode("utf-8")
```
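Here is a small sketch that reproduces the crash on a single-part message, assuming `email_msg` is a standard-library `email.message.Message` (as the `get_payload(decode=True)` call suggests). The raw message is hand-built for illustration, not an actual Gmail payload. It also shows that the declared charset is available via `get_content_charset()`, even though the line above always decodes as UTF-8:

```python
from email import message_from_bytes

# Hypothetical single-part message: the body is declared as ISO-8859-1 and
# sent as quoted-printable, where "=C1" is the Latin-1 byte for 'Á'.
raw = (
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"=C1cido"
)

email_msg = message_from_bytes(raw)
payload = email_msg.get_payload(decode=True)  # transfer encoding undone -> bytes

print(email_msg.is_multipart())         # False (single-part message)
print(email_msg.get_content_charset())  # iso-8859-1
print(payload)                          # b'\xc1cido'

try:
    payload.decode("utf-8")  # what the line in gmail/search.py does
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)  # decode failed: invalid start byte
```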
While multipart messages have a fallback to Latin-1 decoding, non-multipart messages are always decoded as UTF-8, which can cause the tool to crash when the content is in a different encoding.

To improve robustness, I recommend wrapping the decoding step with a fallback, as is already done for multipart messages. A more general solution could try multiple encodings, or use `errors="replace"` or `errors="ignore"` to avoid hard crashes on malformed characters.
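Here is a rough sketch of the kind of fallback I have in mind. `decode_part` and `extract_text_body` are hypothetical helper names, not existing functions in `gmail/search.py`, and the fallback order (declared charset, then UTF-8, then UTF-8 with `errors="replace"`) is my own assumption. Iterating with `walk()` means single-part and multipart messages go through the same decoding path:

```python
from email import message_from_bytes
from email.message import Message


def decode_part(part: Message) -> str:
    """Decode one non-multipart part to text without raising UnicodeDecodeError.

    Hypothetical helper; the fallback order below is an assumption:
    declared charset -> UTF-8 -> UTF-8 with errors="replace".
    """
    payload = part.get_payload(decode=True)  # bytes, or None for multipart containers
    if payload is None:
        return ""
    for charset in (part.get_content_charset(), "utf-8"):
        if not charset:
            continue  # no charset declared in Content-Type
        try:
            return payload.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # LookupError: the header names a charset Python does not know.
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD.
    return payload.decode("utf-8", errors="replace")


def extract_text_body(email_msg: Message) -> str:
    """Hypothetical helper: join the decoded text/plain parts of a message."""
    texts = []
    # walk() yields the message itself first, so a single-part message
    # and the parts of a multipart message share one decoding path.
    for part in email_msg.walk():
        if part.get_content_type() == "text/plain":
            texts.append(decode_part(part))
    return "\n".join(texts)


# Usage: a Latin-1 body that the current code fails to decode.
raw = (
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"=C1cido"
)
print(extract_text_body(message_from_bytes(raw)))  # Ácido
```

Using `errors="replace"` leaves a visible U+FFFD marker where bytes could not be decoded, which is usually easier to debug than `errors="ignore"` silently dropping them. A Latin-1 fallback (like the one the multipart branch uses) also never raises, since it maps every byte, but it can quietly turn text in other encodings into mojibake.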

Let me know if you'd like me to open a PR with a patch.

Thanks again!

UnicodeDecodeError when decoding message body with non-UTF-8 encoding #1030