feat: add serialization to State / move State to utils #9345

Open
wants to merge 11 commits into
base: main

Amnah199
Contributor

@Amnah199 Amnah199 commented May 5, 2025

Related Issues

Proposed Changes:

  • Introduce serialize_value and deserialize_value utility functions in the utils module. They encapsulate logic that is also used by the breakpoints de/serialization logic and by tracing.utils.coerce_tag_value (except for the JSON load). Once the breakpoints feature is merged into Haystack, it can reuse these centralized utilities.
  • Add serialization and deserialization logic to the State class.
  • Move the State class to the utils module, since it is not actually a dataclass. The existing State class in the dataclasses module now emits a deprecation warning that points users to the new location.
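
As a rough illustration of the first bullet, here is a minimal sketch of what a recursive serialize_value / deserialize_value pair could look like. It is not the code from this PR: the Note class and the _DESERIALIZERS registry are hypothetical stand-ins, and the only detail taken from the PR is the "_type" tag that appears in the serialized test data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Note:
    """Hypothetical stand-in for a Haystack dataclass such as ChatMessage."""

    text: str

    def to_dict(self) -> Dict[str, Any]:
        return {"text": self.text}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Note":
        return cls(text=data["text"])


# Hypothetical registry that maps a "_type" tag to a deserializer.
_DESERIALIZERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {"Note": Note.from_dict}


def serialize_value(value: Any) -> Any:
    """Recursively convert a value into JSON-friendly data."""
    if hasattr(value, "to_dict") and callable(value.to_dict):
        serialized = value.to_dict()
        # Tag the payload so it can be mapped back to a class later.
        serialized["_type"] = type(value).__name__
        return serialized
    if isinstance(value, dict):
        return {key: serialize_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [serialize_value(item) for item in value]
    return value


def deserialize_value(value: Any) -> Any:
    """Recursively rebuild objects from data produced by serialize_value."""
    if isinstance(value, dict):
        if "_type" in value:
            payload = {k: v for k, v in value.items() if k != "_type"}
            return _DESERIALIZERS[value["_type"]](payload)
        return {key: deserialize_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [deserialize_value(item) for item in value]
    return value
```

In this sketch, tuples and sets come back as lists after a round trip, and an unknown "_type" tag raises a KeyError. Whether the PR handles those cases the same way is not stated here.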

How did you test it?

Moved the existing tests to test_utils_state.py and added two new tests for serialization and deserialization.

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels May 5, 2025
@Amnah199 Amnah199 marked this pull request as ready for review May 6, 2025 11:41
@Amnah199 Amnah199 requested review from a team as code owners May 6, 2025 11:41
@Amnah199 Amnah199 requested review from dfokina and julian-risch and removed request for a team May 6, 2025 11:41
@coveralls
Collaborator

Pull Request Test Coverage Report for Build 14859791142

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 40 unchanged lines in 6 files lost coverage.
  • Overall coverage increased (+0.005%) to 90.415%

Files with Coverage Reduction        New Missed Lines    %
core/pipeline/utils.py               1                   98.11%
utils/base_serialization.py          3                   94.12%
utils/__init__.py                    4                   31.58%
components/tools/tool_invoker.py     7                   90.85%
tools/component_tool.py              8                   93.2%
core/pipeline/base.py                17                  93.24%

Totals
Change from base Build 14838765775:  0.005%
Covered Lines:                       10961
Relevant Lines:                      12123

💛 - Coveralls

@Amnah199
Contributor Author

Amnah199 commented May 6, 2025

I’ve kept the current haystack.dataclasses.State file for backward compatibility. However, the tests from test.dataclasses.test_state.py have been moved to utils, so we could consider removing this test file. Alternatively, we might keep it temporarily for safety until the old State class is fully removed.
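
For readers unfamiliar with this kind of backward-compatibility shim, the following is a minimal sketch of how an old import path can keep working while warning users. The class names and the warning text are hypothetical; the actual shim in haystack.dataclasses may be structured differently.

```python
import warnings
from typing import Any, Dict, Optional


class State:
    """Hypothetical stand-in for the relocated class in the utils module."""

    def __init__(self, schema: Dict[str, Any], data: Optional[Dict[str, Any]] = None):
        self.schema = schema
        self.data = data or {}


class DeprecatedState(State):
    """Hypothetical shim that would stay at the old import path."""

    def __init__(self, schema: Dict[str, Any], data: Optional[Dict[str, Any]] = None):
        warnings.warn(
            "Importing State from the old location is deprecated; "
            "import it from the utils module instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not the shim
        )
        super().__init__(schema, data)
```

Because the shim subclasses the relocated class, existing isinstance checks against the new State keep passing, which is one reason to keep both the old file and its tests until the old class is removed.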

Comment on lines +361 to +424
def test_state_to_dict():
    # we test dict, a python type and a haystack dataclass
    state_schema = {"numbers": {"type": int}, "messages": {"type": List[ChatMessage]}, "dict_of_lists": {"type": dict}}

    data = {
        "numbers": 1,
        "messages": [ChatMessage.from_user(text="Hello, world!")],
        "dict_of_lists": {"numbers": [1, 2, 3]},
    }
    state = State(state_schema, data)
    state_dict = state.to_dict()
    assert state_dict["schema"] == {
        "numbers": {"type": "int", "handler": "haystack.utils.state_utils.replace_values"},
        "messages": {
            "type": "typing.List[haystack.dataclasses.chat_message.ChatMessage]",
            "handler": "haystack.utils.state_utils.merge_lists",
        },
        "dict_of_lists": {"type": "dict", "handler": "haystack.utils.state_utils.replace_values"},
    }
    assert state_dict["data"] == {
        "numbers": 1,
        "messages": [
            {"role": "user", "meta": {}, "name": None, "content": [{"text": "Hello, world!"}], "_type": "ChatMessage"}
        ],
        "dict_of_lists": {"numbers": [1, 2, 3]},
    }


def test_state_from_dict():
    state_dict = {
        "schema": {
            "numbers": {"type": "int", "handler": "haystack.utils.state_utils.replace_values"},
            "messages": {
                "type": "typing.List[haystack.dataclasses.chat_message.ChatMessage]",
                "handler": "haystack.utils.state_utils.merge_lists",
            },
            "dict_of_lists": {"type": "dict", "handler": "haystack.utils.state_utils.replace_values"},
        },
        "data": {
            "numbers": 1,
            "messages": [
                {
                    "role": "user",
                    "meta": {},
                    "name": None,
                    "content": [{"text": "Hello, world!"}],
                    "_type": "ChatMessage",
                }
            ],
            "dict_of_lists": {"numbers": [1, 2, 3]},
        },
    }
    state = State.from_dict(state_dict)
    # Check types are correctly converted
    assert state.schema["numbers"]["type"] == int
    assert state.schema["dict_of_lists"]["type"] == dict
    # Check handlers are functions, not comparing exact functions as they might be different references
    assert callable(state.schema["numbers"]["handler"])
    assert callable(state.schema["messages"]["handler"])
    assert callable(state.schema["dict_of_lists"]["handler"])
    # Check data is correct
    assert state.data["numbers"] == 1
    assert state.data["messages"] == [ChatMessage.from_user(text="Hello, world!")]
    assert state.data["dict_of_lists"] == {"numbers": [1, 2, 3]}
Contributor Author

For ease of review, these are the two new tests for serialization. The rest are just moved from the old test file.
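
The tests above store each handler as a dotted path string such as "haystack.utils.state_utils.merge_lists". A minimal sketch of how a callable could be converted to such a path and back is shown below. The helper names callable_to_path and path_to_callable are hypothetical and are not claimed to be the helpers this PR uses; the sketch only works for module-level functions.

```python
import importlib
import json
from typing import Any, Callable


def callable_to_path(fn: Callable[..., Any]) -> str:
    """Return a dotted import path for a module-level function."""
    return f"{fn.__module__}.{fn.__qualname__}"


def path_to_callable(path: str) -> Callable[..., Any]:
    """Import the module part of a dotted path and return the named attribute."""
    module_name, _, attribute = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Round trip with a standard-library function.
path = callable_to_path(json.dumps)
restored = path_to_callable(path)
```

Nested functions, lambdas, and methods would need extra handling, since their qualified names do not map directly onto a single module attribute.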

Comment on lines +126 to +133
_type_deserializers = {
    "Answer": Answer.from_dict,
    "ChatMessage": ChatMessage.from_dict,
    "Document": Document.from_dict,
    "ExtractedAnswer": ExtractedAnswer.from_dict,
    "GeneratedAnswer": GeneratedAnswer.from_dict,
    "SparseEmbedding": SparseEmbedding.from_dict,
}
Contributor

Instead of using a predefined set of deserializers, couldn't we follow an approach similar to component_from_dict: import the class named in the _type field, check whether it has a from_dict attribute, and if so use that? That way we wouldn't need to maintain a hard-coded list of deserializers.
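
A minimal sketch of that suggestion follows. It assumes the _type field holds a fully qualified path such as "package.module.ClassName", whereas the serialized test data in this PR stores only the bare class name ("ChatMessage"), so the serialized format would also need to change. The function name deserialize_by_type is hypothetical, and this does not reproduce how component_from_dict is implemented.

```python
import importlib
from typing import Any, Dict


def deserialize_by_type(data: Dict[str, Any]) -> Any:
    """Rebuild an object by importing the class named in data["_type"]."""
    type_path = data["_type"]
    module_name, _, class_name = type_path.rpartition(".")
    if not module_name:
        # A bare class name cannot be imported without a registry.
        raise ValueError(f"'_type' must be a fully qualified path, got {type_path!r}")

    cls = getattr(importlib.import_module(module_name), class_name)
    if not hasattr(cls, "from_dict"):
        raise TypeError(f"{type_path} does not define from_dict")

    payload = {key: value for key, value in data.items() if key != "_type"}
    return cls.from_dict(payload)
```

The trade-off is that any importable class with a from_dict attribute becomes deserializable, which removes the hard-coded mapping but also means serialized data decides which modules get imported.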

Labels
topic:tests type:documentation Improvements on the docs
Development

Successfully merging this pull request may close these issues.

Add serialization/deserialization support to State
3 participants