
Smart Data Enrichment Tool #48

Open
@davidgasquez

Description


Some ideas for a potential "smart data augmentation" tool that could be built on top of Open Datasets.

The idea is to pass your data through a set of "checks" or "matches" and get back extra columns that might be relevant, derived from the available open datasets.
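A minimal sketch of the final step, assuming the matching has already identified a user column (`country` here) that lines up with a key column in some open dataset. The function name `enrich`, the row-as-dict representation, and the tiny capitals table are illustrative assumptions, not part of any existing Open Datasets API.

```python
from typing import Any

Row = dict[str, Any]


def enrich(rows: list[Row], key: str, reference: dict[Any, Row]) -> list[Row]:
    """Left-join the extra columns of `reference` onto each row via `key`.

    Rows without a match get None for the extra columns, so every output row
    has the same shape. Columns the user already has are never overwritten.
    """
    extra_cols = sorted({col for rec in reference.values() for col in rec})
    enriched = []
    for row in rows:
        match = reference.get(row.get(key), {})
        new_row = dict(row)  # copy so the caller's rows are not mutated
        for col in extra_cols:
            new_row.setdefault(col, match.get(col))
        enriched.append(new_row)
    return enriched


# Illustrative data: a user table and a tiny stand-in "open dataset" keyed by country.
user_rows = [{"country": "France", "sales": 10}, {"country": "Atlantis", "sales": 3}]
capitals = {"France": {"capital": "Paris"}, "Spain": {"capital": "Madrid"}}
print(enrich(user_rows, "country", capitals))
# [{'country': 'France', 'sales': 10, 'capital': 'Paris'},
#  {'country': 'Atlantis', 'sales': 3, 'capital': None}]
```

A left join keeps every user row even when the open dataset has no match, which seems the safer default for an augmentation tool.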

The matching is done by an LLM. It receives each column name and a sample of its values, and tries to match it with known relevant columns¹.
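The LLM matching step could look roughly like the sketch below. It assumes a generic `llm(prompt) -> str` callable (swap in any real model client) and a hypothetical catalog of known column names; `match_column`, the prompt wording, `sample_size`, and the JSON reply format are all illustrative choices.

```python
import json
import random
from typing import Callable, Sequence


def build_match_prompt(name: str, sample: Sequence[object], known: Sequence[str]) -> str:
    """Describe one user column and ask the model to pick matching catalog columns."""
    return (
        "Match a user's column to columns from a catalog of open datasets.\n"
        f"Column name: {name}\n"
        f"Sample values: {json.dumps([str(v) for v in sample])}\n"
        f"Known columns: {json.dumps(list(known))}\n"
        "Reply with a JSON list of the known columns that describe the same concept, "
        "or [] if none do."
    )


def match_column(
    name: str,
    values: Sequence[object],
    known: Sequence[str],
    llm: Callable[[str], str],
    sample_size: int = 5,
    seed: int = 0,
) -> list[str]:
    """Ask the LLM which known columns match `name`, keeping only real catalog names."""
    pool = list(values)
    rng = random.Random(seed)  # fixed seed so the same sample is sent every run
    sample = rng.sample(pool, min(sample_size, len(pool)))
    reply = llm(build_match_prompt(name, sample, known))
    try:
        candidates = json.loads(reply)
    except json.JSONDecodeError:
        return []  # unparseable reply: treat as "no match"
    if not isinstance(candidates, list):
        return []
    # Drop anything the model invented that is not actually in the catalog.
    return [c for c in candidates if c in known]


# Usage with a stand-in model; a real client call would replace fake_llm.
def fake_llm(prompt: str) -> str:
    return '["iso3166_alpha2", "made_up_column"]'


catalog_columns = ["iso3166_alpha2", "country_name", "gdp_usd"]
print(match_column("cc", ["ES", "FR", "DE"], catalog_columns, fake_llm))
# ['iso3166_alpha2']
```

Filtering the reply against the catalog guards against the model inventing column names, and the fixed `seed` keeps the sampled values reproducible between runs.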

Additionally, the tool could suggest LLM-derived columns from existing ones (e.g., a Country column passed through a "Capital" prompt) or let the user set a custom prompt to "augment" one of the columns. This wouldn't use any real data but could still be useful (e.g., to classify the sentiment of a text column).
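LLM-derived columns could be sketched as a prompt template applied row by row. Again `llm` is a hypothetical stand-in for a real model call, and `derive_column` and the `{value}` template syntax are assumptions for illustration.

```python
from typing import Any, Callable

Row = dict[str, Any]


def derive_column(
    rows: list[Row], source: str, target: str, template: str, llm: Callable[[str], str]
) -> list[Row]:
    """Return copies of `rows` with `target` filled from an LLM prompt on `source`."""
    cache: dict[Any, str] = {}  # one model call per distinct source value
    out = []
    for row in rows:
        value = row[source]
        if value not in cache:
            cache[value] = llm(template.format(value=value)).strip()
        out.append({**row, target: cache[value]})
    return out


# Usage with a stand-in model that records how often it is called.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    answers = {"France": "Paris", "Japan": "Tokyo"}
    country = prompt.rsplit(" ", 1)[-1].rstrip("?")
    return answers.get(country, "unknown")


rows = [{"country": "France"}, {"country": "Japan"}, {"country": "France"}]
result = derive_column(rows, "country", "capital", "What is the capital of {value}?", fake_llm)
print([r["capital"] for r in result])  # ['Paris', 'Tokyo', 'Paris']
print(len(calls))  # 2
```

Caching by distinct value keeps cost proportional to the number of unique inputs rather than the number of rows. A custom user prompt, such as a sentiment classifier, would plug in as a different `template`.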

¹ To make it fast, each column could be processed in parallel. The samples could be embedded and used to retrieve similar columns in the Open Datasets space. The same could be done at the column level.
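The footnote's retrieval idea, sketched under heavy assumptions: a toy character-trigram `embed` function stands in for a real embedding model, a brute-force cosine search stands in for a vector index, and each user column is handled on its own worker thread. All names and the sample catalog are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def embed(values) -> Counter:
    """Toy embedding: character-trigram counts over a column's sample values."""
    grams: Counter = Counter()
    for v in values:
        text = f"  {str(v).lower()} "  # pad so word starts/ends form trigrams
        grams.update(text[i:i + 3] for i in range(len(text) - 2))
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[g] for g, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def nearest(sample, index: dict[str, Counter], k: int = 2) -> list[str]:
    """Brute-force top-k catalog columns by cosine similarity to the sample."""
    query = embed(sample)
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


def match_all(columns: dict[str, list], index, k: int = 2, workers: int = 4) -> dict[str, list[str]]:
    """Retrieve candidate catalog columns for every user column in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(nearest, vals, index, k) for name, vals in columns.items()}
        return {name: f.result() for name, f in futures.items()}


# Illustrative catalog: sample values per open-dataset column, embedded once up front.
catalog = {
    "country_name": ["France", "Spain", "Germany"],
    "iso3166_alpha2": ["FR", "ES", "DE"],
    "city": ["Paris", "Madrid", "Berlin"],
}
index = {name: embed(vals) for name, vals in catalog.items()}
user = {"pais": ["Spain", "France"], "code": ["ES", "FR"]}
print(match_all(user, index, k=1))
# {'pais': ['country_name'], 'code': ['iso3166_alpha2']}
```

Threads only pay off when `embed` calls an external embedding or LLM service (I/O-bound work); for the pure-Python toy above they add nothing. The top-k candidates could then be handed to the LLM matcher, so it compares each column against a short list instead of the whole catalog.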

Metadata


Assignees

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
