I am writing to bring to your attention a concern regarding the dataset reference mentioned in your repository documentation.
In the section titled “Downloading Pre-Tokenized WikiText-103”, the documentation states:
Downloading Pre-Tokenized WikiText-103:
You can obtain the pre-tokenized WikiText-103 dataset binidx file from this Hugging Face dataset link.
However, upon decoding the dataset from the provided link, I found that the dataset content does not match the WikiText-103 dataset. This inconsistency is quite misleading and could cause confusion for users who rely on the documentation to work with the correct dataset.
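For anyone who wants to reproduce this kind of check, here is a minimal sketch. It assumes the binidx file has already been decoded to plain text with the repository's tokenizer (that step is not shown) and that a reference copy of the WikiText-103 text is available locally. The function names and the choice of 5-word n-grams are illustrative assumptions, not part of the repository.

```python
def ngram_set(text, n=5):
    """Return the set of word-level n-grams in `text` (whitespace-tokenized)."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(decoded_sample, reference_text, n=5):
    """Fraction of the sample's n-grams that also occur in the reference.

    A value near 1.0 suggests the sample comes from the reference corpus;
    a value near 0.0 suggests it is a different dataset. Returns 0.0 when
    the sample is too short to contain any n-gram.
    """
    sample = ngram_set(decoded_sample, n)
    if not sample:
        return 0.0
    reference = ngram_set(reference_text, n)
    return len(sample & reference) / len(sample)
```

In practice one would pass a few thousand decoded words as `decoded_sample` and the WikiText-103 training text as `reference_text`. Differences in whitespace or tokenization conventions can lower the ratio, so a low value is a prompt for manual inspection and not proof on its own.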
I believe that accurately identifying the dataset and its source is critical, especially in the context of publication and reproducibility. Misleading references may also affect the credibility of the repository.
Thank you for bringing this important issue to our attention. We appreciate your careful review and commitment to maintaining accurate documentation.
We have investigated the issue you reported and have already taken corrective action. The correct version of the pre-tokenized WikiText-103 dataset has been uploaded to Hugging Face, and we have updated our README documentation accordingly.