Concern Regarding Misleading Dataset Reference in Documentation #15

Open
shizukanaskytree opened this issue Jan 9, 2025 · 1 comment

@shizukanaskytree
Dear Author,

I am writing to bring to your attention a concern regarding the dataset reference mentioned in your repository documentation.
In the section titled “Downloading Pre-Tokenized WikiText-103”, the documentation states:

Downloading Pre-Tokenized WikiText-103:
You can obtain the pre-tokenized WikiText-103 dataset binidx file from this Hugging Face dataset link.

However, upon decoding the dataset from the provided link, I found that its content does not match WikiText-103. This inconsistency is misleading and could confuse users who rely on the documentation to obtain the correct dataset.

I believe that accurately identifying the dataset and its source is critical, especially in the context of publication and reproducibility. Misleading references may also affect the credibility of the repository.
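
A mismatch of this kind can be checked mechanically. The following is a minimal sketch only, assuming the binidx file has already been decoded to plain-text lines with the matching tokenizer and that a reference copy of WikiText-103 is available as text. The helper names `normalize` and `overlap_fraction` are hypothetical and not part of this repository.

```python
def normalize(line: str) -> str:
    """Collapse whitespace and lowercase so spacing differences do not matter."""
    return " ".join(line.split()).lower()


def overlap_fraction(decoded_lines, reference_lines):
    """Return the fraction of non-empty decoded lines that also occur in the reference."""
    # Build a set of normalized reference lines for fast membership tests.
    reference = {normalize(l) for l in reference_lines if l.strip()}
    decoded = [normalize(l) for l in decoded_lines if l.strip()]
    if not decoded:
        return 0.0
    hits = sum(1 for l in decoded if l in reference)
    return hits / len(decoded)
```

A fraction near zero on a reasonably sized sample would suggest that the decoded data comes from a different corpus; a high fraction would suggest the file is what the documentation says it is. Exact line matching is a deliberately crude criterion and may undercount if the tokenizer round-trip alters punctuation or spacing.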

@ridgerchu
Owner

Thank you for bringing this important issue to our attention. We appreciate your careful review and commitment to maintaining accurate documentation.

We have investigated the issue you reported and have already taken corrective action. The correct version of the pre-tokenized WikiText-103 dataset has been uploaded to Hugging Face, and we have updated our README documentation accordingly.

You can now find the correct dataset at [link].