Skip to content

Add serialization/deserialization support to State #9286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sjrl opened this issue Apr 22, 2025 · 3 comments · May be fixed by #9345
Open

Add serialization/deserialization support to State #9286

sjrl opened this issue Apr 22, 2025 · 3 comments · May be fixed by #9345
Assignees
Labels
P1 High priority, add to the next sprint

Comments

@sjrl
Copy link
Contributor

sjrl commented Apr 22, 2025

When State was originally added (code here) we did not add serialization/deserialization support for it since it was not immediately needed.

However, for Pipeline Checkpoints and better traces (see deepset-ai/haystack-core-integrations#1664) we should add support for serialization and deserialization.

I believe basing the to_dict off of something like our coerce_tag_value could work or taking inspiration from the Pipeline breakpoints PR deepset-ai/haystack-experimental#271 and how they handled serialization/deserialization of inputs and outputs of components could help as well.

@julian-risch julian-risch added the P1 High priority, add to the next sprint label Apr 24, 2025
@YassinNouh21
Copy link
Contributor

Hi @sjrl ,

I’ve put together a small “playground” demo that reproduces the missing serialization on State and then shows how to fix it:

  1. Reproduction

    • Define a toy schema (including messages, a string field, a list field, and a custom‐handler field).
    • Instantiate State, call .set() and .get(), then attempt state.to_dict() / State.from_dict() and observe the AttributeError.
  2. Proposed implementation

    • to_dict on State:
      • Serialize the schema (using serialize_type + serialize_callable).
      • Serialize the live data, converting any ChatMessage objects to their dict form.
      • Wrap it all in our standard default_to_dict shape.
    • from_dict on State:
      • Rebuild the schema (via deserialize_type + deserialize_callable).
      • Reconstruct the data (including deserializing ChatMessage.from_dict for message entries).
      • Call the State(schema, data) constructor to restore full parity.

Let me know if this captures the right direction—I’ll package it up into a formal PR once we agree on the approach.

@LastRemote
Copy link
Contributor

LastRemote commented Apr 25, 2025

Not sure if this is related, but it seems like State is not a dataclass although it resides in haystack.dataclasses. Is this the intended behavior?

@YassinNouh21
Copy link
Contributor

YassinNouh21 commented Apr 25, 2025

@LastRemote

  1. I've added two new methods on State in haystack/dataclasses/state.py:
    to_dict that calls _schema_to_dict, deep-copies _data, and serializes any nested .to_dict() objects (including lists).
    from_dict that reverses it, using _schema_from_dict to rebuild the schema and feeding the raw data back into State(schema, data).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 High priority, add to the next sprint
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants