_original_id should not be required in weaviate with Haystack 2.6 #8523

### Discussed in https://github.com/deepset-ai/haystack/discussions/8513

<div type='discussions-op-text'>

<sup>Originally posted by **bwbw723** November 1, 2024</sup>
I am using the WeaviateEmbeddingRetriever to query my data.
It works fine with the default class in Weaviate.
Once I switch to a class I created myself with a custom schema, I get the following error:
```log
  File "/root/TS_ph3/00_WeaviateEmbeddingRetriever.py", line 70, in <module>
    result = query_pipeline.run({"text_embedder": {"text": query}})
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack/core/pipeline/pipeline.py", line 229, in run
    res: Dict[str, Any] = self._run_component(name, components_inputs[name])
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack/core/pipeline/pipeline.py", line 67, in _run_component
    res: Dict[str, Any] = instance.run(**inputs)
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack_integrations/components/retrievers/weaviate/embedding_retriever.py", line 138, in run
    documents = self._document_store._embedding_retrieval(
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack_integrations/document_stores/weaviate/document_store.py", line 538, in _embedding_retrieval
    return [self._to_document(doc) for doc in result.objects]
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack_integrations/document_stores/weaviate/document_store.py", line 538, in <listcomp>
    return [self._to_document(doc) for doc in result.objects]
  File "/root/.cache/pypoetry/virtualenvs/search-infra-7HLB3Aeo-py3.10/lib/python3.10/site-packages/haystack_integrations/document_stores/weaviate/document_store.py", line 306, in _to_document
    document_data["id"] = document_data.pop("_original_id")
KeyError: '_original_id'
```

I checked the code and found that `_to_document` reads `_original_id` from the object's properties and uses it as the Document ID.
I updated `document_store.py` locally to set `document_data["id"]` to a generated UUID when the object has no `_original_id`.
With that change, I get the expected results.
I do not think data in Weaviate should be required to have an `_original_id` property.
But with the current code, retrieval raises an error if `_original_id` is missing.
I would prefer an `if` statement that handles both cases.
Please correct me if I have misunderstood anything.

The packages I am using are:
haystack-ai = "2.6.1"
fastembed-haystack = "1.3.0"
weaviate-client = "^4.9.0"
weaviate-haystack = "^4.0.0"

```py
    def _to_document(self, data: DataObject[Dict[str, Any], None]) -> Document:
        """
        Converts a data object read from Weaviate into a Document.
        """
        document_data = data.properties
        # The KeyError is raised on the next line. My local fix sets
        # document_data["id"] to a generated UUID when "_original_id" is missing.
        document_data["id"] = document_data.pop("_original_id")
        if isinstance(data.vector, List):
            document_data["embedding"] = data.vector
        elif isinstance(data.vector, Dict):
            document_data["embedding"] = data.vector.get("default")
        else:
            document_data["embedding"] = None

        if (blob_data := document_data.get("blob_data")) is not None:
            document_data["blob"] = {
                "data": base64.b64decode(blob_data),
                "mime_type": document_data.get("blob_mime_type"),
            }

        # We always delete these fields as they're not part of the Document dataclass
```</div>
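
The `if` statement proposed in the post could look roughly like the sketch below. It is a standalone illustration, not the code in `weaviate-haystack`: `resolve_document_id` is a hypothetical helper name, and falling back first to the Weaviate object's own UUID and then to a freshly generated one is an assumption about what a reasonable fix might do.

```python
import uuid
from typing import Any, Dict, Optional


def resolve_document_id(
    properties: Dict[str, Any],
    fallback_uuid: Optional[uuid.UUID] = None,
) -> str:
    """Pick a Document ID without requiring `_original_id` (hypothetical helper).

    Removes `_original_id` from `properties` when it is present, mirroring the
    `pop` in the quoted `_to_document`, and otherwise falls back to another ID.
    """
    # pop with a default avoids the KeyError shown in the traceback
    original_id = properties.pop("_original_id", None)
    if original_id is not None:
        return str(original_id)
    # Assumption: the caller can pass the UUID Weaviate assigned to the object
    if fallback_uuid is not None:
        return str(fallback_uuid)
    # Last resort: a random UUID, as in the local workaround described above
    return str(uuid.uuid4())
```

Inside `_to_document`, such a helper might be called as `document_data["id"] = resolve_document_id(document_data, data.uuid)`, assuming the data object exposes a `uuid` attribute. Note that a randomly generated ID changes on every retrieval, so reusing an ID already stored with the object is probably preferable where one is available.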


