-
Notifications
You must be signed in to change notification settings - Fork 16
Fix major bugs in serialization #266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# NOTE: if the list has only one item, return just that item unless we are dealing with an AnswerBuilder | ||
if len(transformed) == 1: | ||
# NOTE: This is a workaround only for the AnswerBuilder component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is whats causing the issue with deserializing ChatMessages. The _content
is supposed to be a list but this step removes the list around it. I am checking how to fix this without causing other components to crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good catch! I think it's enough:
if isinstance(component, AnswerBuilder) or isinstance(component, ChatMessage):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess more instances will keep coming up where removing a list will raise an error. This hack will also keep getting longer. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally the original type of value should be preserved, whether a str, int or list.
# we need to wrap the value in a list | ||
if isinstance(component, AnswerBuilder): | ||
return transformed | ||
if len(data) == 1 and isinstance(data[0], dict) and "value" in data[0] and "sender" in data[0]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line + the return can run before the line above
if isinstance(value, GeneratedAnswer): # noqa: SIM102 | ||
if value.meta and "usage" in value.meta: # noqa: SIM102 | ||
value.meta.pop("usage", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are not calling serilization on component inputs, this needed to be changed for AnswerJoiner tests to pass.
* initial import from work done on haystack branch * adding missing Exception and code examples * adding missing Exception * David/pipeline testing (#254) * adding ByteStream example * serialising posixpath objects * fixing tests * fixing namespace import * fixing namespace import * adding examples to ruff check excludes * typo * test cases with loops * Add the last component's input to debug state * Revert "Add the last component's input to debug state" This reverts commit fb98b3d. * Add the last component's input to debug state (#256) * fix: fixing deserialisation issues - working version with Hybrid Pipeline tests (#257) * adding ByteStream example * serialising posixpath objects * fixing tests * fixing namespace import * fixing namespace import * adding examples to ruff check excludes * typo * adding verbosity to examples * fixing deserialisation issues * linting * Fix deserilzation errors in resuming (#258) * Fix deserilzation erros * Adding tests * adding tests * nit cleaning * adding debug_path to pipeline and tests for breakpoint/resume * working for all components in a Hybrid RAG * linting * linting * fixing tests * adding extra dependencies * fixing pyproject.toml * replacing tests model by a smaller one --------- Co-authored-by: David S. Batista <[email protected]> * fix: starting from clean state (#260) * wip: debugging more pipelines * wip: debugging more pipelines * adding more tests * cleaning * cleaning * cleaning * cleaning * cleaning * cleaning * typing * fix: removing all sender keys before serialisation (#261) * removing all sender keys * cleaning * cleaning * removing unused function * fixing header file issue * fix: hybrid pipeline working with stop/resume at each breakpoing (#262) * wip: debugging hybrids * wip: debugging hybrids * cleaning * ruffing * removing unused imports * Add a test file for loops * Fix major bugs in serialization (#266) * Fix issues with chat messages * Update the transform structure method to preserve data type of value * Fixed the issue with GeneratedAnswer * fix: `Adaptor` will need `unsafe=True` for the break/resume to work properly (#267) * adaptor needs unsafe=True + another extra list in a Joiner component * adding missing dependencies * triggering end2end tests for answers joiner pipeline * adding pipeline with list joiner tests * test: adding one more test with another Joiner (#268) * cleaning * marking tests as integration tests * nit documentation * fixing pipeline.py * PR comments * Remove passing component param to resume functions * Include load state in the resume process * Improve based on comments * Update haystack_experimental/core/pipeline/pipeline.py Co-authored-by: Sebastian Husch Lee <[email protected]> * only breaking at the exact visit number * moving imports to the top * Improvements * Improve deserialization * Fix linting * Add warning for unreached breakpoints * Fix typing issues * updating errors.py * Fix integration tests * skipping some tests under certain conditions --------- Co-authored-by: Amna Mubashar <[email protected]> Co-authored-by: Sebastian Husch Lee <[email protected]>
* initial import * adding base for easier review * Adding dependency files * feat: adding `Pipeline` checkpoints (#269) * initial import from work done on haystack branch * adding missing Exception and code examples * adding missing Exception * David/pipeline testing (#254) * adding ByteStream example * serialising posixpath objects * fixing tests * fixing namespace import * fixing namespace import * adding examples to ruff check excludes * typo * test cases with loops * Add the last component's input to debug state * Revert "Add the last component's input to debug state" This reverts commit fb98b3d. * Add the last component's input to debug state (#256) * fix: fixing deserialisation issues - working version with Hybrid Pipeline tests (#257) * adding ByteStream example * serialising posixpath objects * fixing tests * fixing namespace import * fixing namespace import * adding examples to ruff check excludes * typo * adding verbosity to examples * fixing deserialisation issues * linting * Fix deserilzation errors in resuming (#258) * Fix deserilzation erros * Adding tests * adding tests * nit cleaning * adding debug_path to pipeline and tests for breakpoint/resume * working for all components in a Hybrid RAG * linting * linting * fixing tests * adding extra dependencies * fixing pyproject.toml * replacing tests model by a smaller one --------- Co-authored-by: David S. Batista <[email protected]> * fix: starting from clean state (#260) * wip: debugging more pipelines * wip: debugging more pipelines * adding more tests * cleaning * cleaning * cleaning * cleaning * cleaning * cleaning * typing * fix: removing all sender keys before serialisation (#261) * removing all sender keys * cleaning * cleaning * removing unused function * fixing header file issue * fix: hybrid pipeline working with stop/resume at each breakpoing (#262) * wip: debugging hybrids * wip: debugging hybrids * cleaning * ruffing * removing unused imports * Add a test file for loops * Fix major bugs in serialization (#266) * Fix issues with chat messages * Update the transform structure method to preserve data type of value * Fixed the issue with GeneratedAnswer * fix: `Adaptor` will need `unsafe=True` for the break/resume to work properly (#267) * adaptor needs unsafe=True + another extra list in a Joiner component * adding missing dependencies * triggering end2end tests for answers joiner pipeline * adding pipeline with list joiner tests * test: adding one more test with another Joiner (#268) * cleaning * marking tests as integration tests * nit documentation * fixing pipeline.py * PR comments * Remove passing component param to resume functions * Include load state in the resume process * Improve based on comments * Update haystack_experimental/core/pipeline/pipeline.py Co-authored-by: Sebastian Husch Lee <[email protected]> * only breaking at the exact visit number * moving imports to the top * Improvements * Improve deserialization * Fix linting * Add warning for unreached breakpoints * Fix typing issues * updating errors.py * Fix integration tests * skipping some tests under certain conditions --------- Co-authored-by: Amna Mubashar <[email protected]> Co-authored-by: Sebastian Husch Lee <[email protected]> * adding tests * fixing tests * bumping up test coverage * simplifying deserliaise function * activating skip for macOS * fix: fixing serialisation bug and missing tests due to filename renaming (#274) * ading missing code to deserialise and safeguard for resume state in RAG hybrid * tests passing for test_pipeline_breakpoints_answer_joiner * tests passing for test_pipeline_breakpoints_branch_joiner * tests passing for test_pipeline_breakpoints_list_joiner * tests passing for test_pipeline_breakpoints_loop * tests passing for test_pipeline_string_joiner * cleaning * skipping macOS tests for RAG hybrid * adding missed import * fix: windows tests fails due to not being able to read resume state files (#275) * initial import * fixing tests * temporary removing delay and rerun from integration tests * temporary removing delay and rerun from integration tests * fixing windows tests * cleaning * reverting delay and rerun from integration tests * fixing tests * fix: revert the passing of the resume state (#276) * cleaning printf leftovers * updating README.MD * fix: fixing some `pipeline.py` and `errors.py` inconsistencies and respective tests (#277) * initial import * cleaning init * fixing errors.py duplication * fixing linting * fixing pipeline run time error * fixing pipeline run time error * fixing pipeline run time tests * fix: adding mocks to tests (#278) * OpeanAI calls mocked * cleaning imports * adding a mocked TransformersSimilarityRanker * adding a mocked SentenceDocumentEmbedder * mocking rag hybrid pipeline components that make external calls * reverting skip on macOS due to pytorch/sentence-transformers lib * reverting skip on macOS due to pytorch/sentence-transformers lib * mocking all OpenAIChatGenerator calls in the pipeline breakpoint tests * reverting skip on macOS due to pytorch/sentence-transformers lib * relying only on Pathlib for f_name * fixing tests * fixing all mocks for Hybrid pipeline * cleaning imports * cleaning imports * reverting skip on macOS due to pytorch/sentence-transformers lib * linting * pinning transformers to same versions as in haystack-ai * pinning transformers to same versions as in haystack-ai * removing skipping macOS tests for RAG hybrid since it's all mocked now --------- Co-authored-by: Amna Mubashar <[email protected]> Co-authored-by: Sebastian Husch Lee <[email protected]>
Chat messages are not serialized properly.