
Model Card: Allow for dicts in datasets and base_model and also update spec #2479

Open
@mofosyne

Is your feature request related to a problem? Please describe.

I was working on ggml-org/llama.cpp#8875, which changes how we map parent models and datasets into GGUF metadata, and was alerted that your code currently treats datasets as only List[str]. The changes we are proposing would support the following types in both datasets and base_model:

  • List[str] of Hugging Face IDs
  • List[str] of URLs to other repos
  • List[dict], where each dict has fields like name, author, version, organization, url, doi, uuid and repo_url
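
To make the three forms concrete, here is a rough sketch of how they might look as Python values. The `SourceEntry` and `SourceRef` names are hypothetical, the IDs and URLs are placeholders, and only the dict field names come from the list above; nothing here is an existing huggingface_hub or llama.cpp type.

```python
from typing import List, TypedDict, Union


class SourceEntry(TypedDict, total=False):
    # Hypothetical shape: every field is optional, names taken from the proposal.
    name: str
    author: str
    version: str
    organization: str
    url: str
    doi: str
    uuid: str
    repo_url: str


# An entry is either a plain string (hub ID or URL) or a metadata dict.
SourceRef = Union[str, SourceEntry]

# 1. Hugging Face IDs (placeholder ID).
datasets_by_id: List[SourceRef] = ["example-org/example-dataset"]

# 2. URLs to other repos (placeholder URL).
datasets_by_url: List[SourceRef] = ["https://example.com/example-org/example-dataset"]

# 3. Dicts carrying richer metadata (placeholder values).
datasets_by_dict: List[SourceRef] = [
    {
        "name": "Example Dataset",
        "organization": "example-org",
        "version": "1.0",
        "repo_url": "https://example.com/example-org/example-dataset",
    }
]
```

The same three shapes would apply to base_model.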

Describe the solution you'd like

Update the description to indicate support for URLs and dict metadata in both the datasets and base_model entries of the model card, and update the type checks to accept dict as an option.
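
As an illustration of what a relaxed type check could look like, here is a minimal sketch. The function name `check_source_list` and its behavior (wrapping a single value in a list, raising `TypeError` on anything else) are assumptions made for this example, not the library's current implementation.

```python
from typing import Any, Dict, List, Optional, Union

SourceRef = Union[str, Dict[str, Any]]


def check_source_list(value: Any, field_name: str) -> Optional[List[SourceRef]]:
    """Hypothetical validator: accept None, a str, a dict, or a list of str/dict."""
    if value is None:
        return None
    # Assumption: a single entry is allowed and is wrapped into a list.
    if isinstance(value, (str, dict)):
        value = [value]
    if not isinstance(value, list):
        raise TypeError(
            f"{field_name} must be a list of str or dict, got {type(value).__name__}"
        )
    for index, item in enumerate(value):
        if isinstance(item, str):
            continue
        # Dict entries must use string keys (name, author, url, ...).
        if isinstance(item, dict) and all(isinstance(key, str) for key in item):
            continue
        raise TypeError(
            f"{field_name}[{index}] must be a str or dict, got {type(item).__name__}"
        )
    return value
```

For example, `check_source_list([{"name": "Example", "doi": "10.0000/example"}], "datasets")` would pass, while `check_source_list([42], "datasets")` would raise a `TypeError`.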

Describe alternatives you've considered

We can already carry this extra metadata in the GGUF file format via metadata override files, but it would be nice to sync these features so we can more easily pull this information from the model creator's model card.

Additional context

The code area I'm looking at is:

datasets (`List[str]`, *optional*):
List of datasets that were used to train this model. Should be a dataset ID
found on https://hf.co/datasets. Defaults to None.
