import segments from CSV: can't adapt type 'dict' #12717


Closed
5 tasks done
Tracked by #12927
leoterry-ulrica opened this issue Jan 14, 2025 · 2 comments · Fixed by #12929

Labels
🐞 bug Something isn't working

Comments

@leoterry-ulrica
Contributor

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

In QA mode, an error occurred when importing segment information in bulk by selecting a CSV file:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/tasks/batch_create_segment_to_index_task.py", line 71, in batch_create_segment_to_index_task
    .scalar()
     ^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2805, in scalar
    ret = self.one()
          ^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2778, in one
    return self._iter().one()  # type: ignore
           ^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2827, in _iter
    result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
                                                  ^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2362, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2226, in _execute_internal
    ) = compile_state_cls.orm_pre_session_exec(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/context.py", line 561, in orm_pre_session_exec
    session._autoflush()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3061, in _autoflush
    raise e.with_traceback(sys.exc_info()[2])
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3050, in _autoflush
    self.flush()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4352, in flush
    self._flush(objects)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4487, in _flush
    with util.safe_reraise():
         ^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 4448, in _flush
    flush_context.execute()
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 466, in execute
    rec.execute(self)
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 642, in execute
    util.preloaded.orm_persistence.save_obj(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 93, in save_obj
    _emit_insert_statements(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 1233, in _emit_insert_statements
    result = connection.execute(
             ^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/app/api/.venv/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.ProgrammingError) can't adapt type 'dict'

SQL:

[SQL: INSERT INTO document_segments (tenant_id, dataset_id, document_id, position, content, answer, word_count, tokens, index_node_id, index_node_hash, hit_count, disabled_at, disabled_by, status, created_by, updated_by, indexing_at, completed_at, error, stopped_at) VALUES (%(tenant_id)s::UUID, %(dataset_id)s::UUID, %(document_id)s::UUID, %(position)s, %(content)s, %(answer)s, %(word_count)s, %(tokens)s, %(index_node_id)s, %(index_node_hash)s, %(hit_count)s, %(disabled_at)s, %(disabled_by)s::UUID, %(status)s, %(created_by)s::UUID, %(updated_by)s::UUID, %(indexing_at)s, %(completed_at)s, %(error)s, %(stopped_at)s) RETURNING document_segments.id, document_segments.enabled, document_segments.created_at, document_segments.updated_at]
[parameters: {'tenant_id': 'e7aa71ec-5c60-4914-99ad-01cb83ae3ac7', 'dataset_id': '959f494e-75e1-4987-9072-82f7e073864f', 'document_id': 'fb6dc544-ed9d-4f6d-a99d-0382e9ba9084', 'position': 1, 'content': [{'content': '问题 1', 'answer': '答案 1'}, {'content': '问题 2', 'answer': '答案 2'}], 'answer': '答案 1', 'word_count': 6, 'tokens': 6, 'index_node_id': 'd04e97ce-09aa-4f83-a581-58bfba94f1e2', 'index_node_hash': 'cddb52bf326a94bfe8608fb815af15ce8bc2bbd5759c69e24a3de0248554cbd1', 'hit_count': 0, 'disabled_at': None, 'disabled_by': None, 'status': 'completed', 'created_by': '1507e141-470b-4719-8c4a-babf6fb730a1', 'updated_by': None, 'indexing_at': datetime.datetime(2025, 1, 14, 4, 51, 57, 968109), 'completed_at': datetime.datetime(2025, 1, 14, 4, 51, 57, 968129), 'error': None, 'stopped_at': None}]
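The parameters above show the root cause: the 'content' value is a Python list of dicts (the whole parsed QA CSV) rather than a plain string, and psycopg2 has no adapter for dict, so the autoflushed INSERT fails. A minimal sketch of a pre-insert guard that would surface this earlier (a hypothetical helper, not part of Dify's code):

```python
def validate_segment_params(params: dict) -> None:
    """Reject values psycopg2 cannot adapt before they reach the INSERT.

    Hypothetical illustration of the failure mode: 'content' must be a
    scalar (str/int/datetime/None), never a dict or a list of dicts.
    """
    for key, value in params.items():
        if isinstance(value, (dict, list)):
            raise TypeError(
                f"column {key!r} got a {type(value).__name__}; "
                "expected a scalar value"
            )


# Shape of the failing call: the whole parsed CSV landed in 'content'.
bad = {
    "content": [{"content": "question 1", "answer": "answer 1"},
                {"content": "question 2", "answer": "answer 2"}],
    "answer": "answer 1",
}
# What the schema expects: one segment, plain-text fields.
good = {"content": "question 1", "answer": "answer 1"}
```

Running the guard on `bad` raises `TypeError` for the 'content' column, which is exactly the value psycopg2 rejects with "can't adapt type 'dict'".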

✔️ Expected Behavior

In QA mode, segment information can be imported in bulk via CSV.

❌ Actual Behavior

No response


dosubot bot commented Jan 14, 2025

The error "can't adapt type 'dict'" suggests that the CSV import process is attempting to insert a dictionary directly into a database field, which is not supported. This typically happens when the data structure being inserted is not properly flattened or serialized.

In the context of the batch_create_segment_to_index_task function, the 'content' field in the CSV file should be a string, not a dictionary. Each row in the CSV should represent a single segment, and if the document form is "qa_model", it should include both "content" and "answer" as separate fields.

To resolve this issue, ensure that the CSV file is structured correctly. Each row should have separate columns for "content" and "answer" (if applicable), and these should be plain text strings, not dictionaries. The CSV processing code should extract these fields and pass them as individual strings to the database insertion logic.

Here's a brief outline of how the CSV should be structured and processed:

  1. CSV Structure: Ensure each row in the CSV has separate columns for "content" and "answer" (if applicable).

  2. Data Processing: When reading the CSV, extract the "content" and "answer" fields as strings, not as dictionaries.

  3. Database Insertion: Ensure that the data being inserted into the database is in the correct format, with 'content' and 'answer' as text fields.

By following these steps, you should be able to avoid the "can't adapt type 'dict'" error during the CSV import process.
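The three steps above can be sketched with the standard-library csv module (the column names `content` and `answer` are assumptions based on this issue; Dify's actual CSV template may differ):

```python
import csv
import io

# Hypothetical QA-mode CSV: one segment per row, plain-text columns.
csv_text = "content,answer\nquestion 1,answer 1\nquestion 2,answer 2\n"

segments = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Each field is extracted as an individual string (never a dict or
    # a list of rows), so the downstream INSERT only sees adaptable
    # scalar values.
    segments.append({"content": row["content"], "answer": row["answer"]})
```

The key point is the per-row loop: each CSV row becomes one segment with string fields, rather than the entire parsed file being assigned to a single segment's 'content' column.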


@dosubot dosubot bot added the 🐞 bug Something isn't working label Jan 14, 2025
@leoterry-ulrica
Contributor Author

demo data file:
template-test.csv
