Rows with identical values get identical hash codes in the CSV driver #180

kolovos · 2025-05-01T11:46:34Z

This can be problematic in other parts of Epsilon, such as ETL's default transformation strategy, which assumes that model elements have unique hash codes. Options to explore:

- Add an ordinal number field to CSV model elements to avoid duplicate hash codes
- Change from hash codes to system identities in ETL's FastTransformationStrategy
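To illustrate the problem, here is a minimal sketch, assuming a CSV row behaves like a plain `LinkedHashMap` from column name to cell value. The class `DuplicateRowHashDemo`, its helper methods, and the sample column names are hypothetical and not part of Epsilon; the `HashMap` only stands in for any hash-based lookup keyed on model elements.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not part of Epsilon.
class DuplicateRowHashDemo {

    // Builds a stand-in for a CSV row: column name -> cell value.
    static Map<String, Object> row(String name, String city) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("city", city);
        return r;
    }

    // Uses both rows as keys in a hash-based table and reports how many
    // distinct keys survive.
    static int distinctKeys(Map<String, Object> a, Map<String, Object> b) {
        Map<Object, String> lookup = new HashMap<>();
        lookup.put(a, "target for a");
        lookup.put(b, "target for b");
        return lookup.size();
    }

    public static void main(String[] args) {
        Map<String, Object> first = row("Alice", "York");
        Map<String, Object> second = row("Alice", "York");

        System.out.println(first == second);                       // false: two separate objects
        System.out.println(first.hashCode() == second.hashCode()); // true: hash is computed from the entries
        System.out.println(distinctKeys(first, second));           // 1: the second put replaced the first
    }
}
```

Because `java.util.AbstractMap` derives both `hashCode()` and `equals()` from the entries, two rows with the same values are indistinguishable to any hash-based collection.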


arcanefoam · 2025-05-06T13:22:05Z

System identifiers can be problematic with Ecore elements (EPackage, EClass, etc.), as sometimes the same metamodel is loaded twice, so the same EClass has two different system identifiers (what Ed Willink nicely called metamodel schizophrenia).


agarciadom · 2025-05-14T18:04:19Z

I'd avoid making a broad change to ETL, as it may have unintended consequences. It may be better to change CSV rows so that each row has a distinct hash code.

When I tried adding a row number to a CSV row, I realized it'd be harder than I expected to keep it up to date as rows are removed or inserted in the middle of the file. We wouldn't really want to expose such a pseudo-row number to users, as its behaviour may not be very reliable.

Why not change the internal representation of a row to a LinkedHashMap subclass which reverts hashCode+equals to be based on object identity?
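A minimal sketch of that idea follows. The class names `IdentityRow` and `IdentityRowDemo` and the sample columns are hypothetical; the class actually used in the CSV driver may be named and structured differently.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Set;

// Hypothetical row type: keeps LinkedHashMap's ordered column -> value
// storage, but compares and hashes by object identity instead of contents.
class IdentityRow extends LinkedHashMap<String, Object> {
    private static final long serialVersionUID = 1L;

    @Override
    public int hashCode() {
        // Same value Object.hashCode() would return for this instance.
        return System.identityHashCode(this);
    }

    @Override
    public boolean equals(Object other) {
        // Two rows are equal only if they are the same object.
        return this == other;
    }
}

// Hypothetical demo class; not part of Epsilon.
class IdentityRowDemo {

    static IdentityRow row(String name, String city) {
        IdentityRow r = new IdentityRow();
        r.put("name", name);
        r.put("city", city);
        return r;
    }

    // Adds both rows to a hash-based set and reports how many are kept.
    static int distinctRows(IdentityRow a, IdentityRow b) {
        Set<Object> seen = new HashSet<>();
        seen.add(a);
        seen.add(b);
        return seen.size();
    }

    public static void main(String[] args) {
        IdentityRow first = row("Alice", "York");
        IdentityRow second = row("Alice", "York");

        System.out.println(first.equals(second));        // false: different objects
        System.out.println(distinctRows(first, second)); // 2: both rows are kept
        System.out.println(first.get("name"));           // Alice: map access is unchanged
    }
}
```

Identity hash codes are not strictly guaranteed to be unique, but because `equals` is also identity-based, hash-based collections still keep rows with identical values apart. No row number has to be maintained as rows are inserted or removed.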

agarciadom added a commit that referenced this issue May 14, 2025

CSV: Use system identity-based hashcodes (fixes #180)

b3c53ba

agarciadom added this to the 2.9.0 milestone May 14, 2025

agarciadom closed this as completed in fb850e7 May 15, 2025
