
Add Wikidata table export to data dumps (#10383) #10531

Merged

Conversation

mohitpaddhariya
Contributor

Add wikidata dump to included dumps (#10383)

This adds support for generating Wikidata dumps alongside existing dumps like ratings and reading logs. The new dumps follow the ol_dump_wikidata_YYYY-MM-DD.txt.gz naming convention.

  • Creates new dump-wikidata.sql script for TSV export
  • Updates oldump.sh to include Wikidata dump generation
  • Follows existing patterns for file naming and processing
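
As a rough illustration of the naming convention described above, the file name could be built by a small shell helper. This is only a sketch: the function name dump_filename and its arguments are hypothetical and are not taken from oldump.sh. Only the ol_dump_wikidata_YYYY-MM-DD.txt.gz pattern comes from the description.

```shell
# Hypothetical helper (not from oldump.sh): build a dump file name that
# follows the ol_dump_<name>_YYYY-MM-DD.txt.gz convention.
dump_filename() {
    # $1 = dump name (e.g. wikidata), $2 = date, expected as YYYY-MM-DD
    case "$2" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "invalid date: $2" >&2; return 1 ;;
    esac
    printf 'ol_dump_%s_%s.txt.gz\n' "$1" "$2"
}

# Print the name a dump generated today would get.
dump_filename wikidata "$(date +%F)"
```

Rejecting malformed dates up front keeps a typo from silently producing a file that later steps would not recognize as an existing dump.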

Closes #10383

Technical

  • Added a new SQL script (dump-wikidata.sql) that exports the wikidata table as TSV
  • Modified oldump.sh to add a new step for generating the wikidata dump
  • Used the same conditional logic as other dumps to prevent regenerating existing dumps
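
The "skip if the dump already exists" behaviour mentioned in the last bullet might look roughly like the sketch below. The function generate_dump_once, the stand-in producer sample_rows, and the sample row are all hypothetical. In the actual script the producer would presumably be a psql call that runs dump-wikidata.sql, but that is an assumption and is not shown here so the sketch runs without a database.

```shell
# Hypothetical sketch (not the code in oldump.sh): run a dump command only
# when its output file does not exist yet.
generate_dump_once() {
    out="$1"      # target file, e.g. ol_dump_wikidata_2025-03-27.txt.gz
    shift         # the remaining arguments are the command producing TSV rows
    if [ -f "$out" ]; then
        echo "Skipping: $out already exists"
        return 0
    fi
    echo "Generating: $out"
    # Write to temporary files first so a failed run leaves no partial dump.
    if "$@" > "$out.raw" && gzip -c "$out.raw" > "$out.tmp"; then
        mv "$out.tmp" "$out"
        rm -f "$out.raw"
    else
        rm -f "$out.raw" "$out.tmp"
        return 1
    fi
}

# Stand-in producer with one made-up TSV row (assumption, for illustration).
sample_rows() { printf 'Q1\t{"id": "Q1"}\t2025-03-27\n'; }

workdir="$(mktemp -d)"
generate_dump_once "$workdir/ol_dump_wikidata_2025-03-27.txt.gz" sample_rows
# A second call finds the existing file and skips regeneration.
generate_dump_once "$workdir/ol_dump_wikidata_2025-03-27.txt.gz" sample_rows
```

Checking for the output file before doing any work is what makes a rerun of the whole dump script cheap after a partial failure.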

Testing

  • Run the oldump.sh script locally with a test database
  • Verify the wikidata dump is created with the correct naming convention
  • Confirm the dump contains the expected TSV data from the wikidata table
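
The last two checks could be scripted along the following lines. This is a sketch under assumptions, not part of the PR: check_dump is a hypothetical helper, the sample row is made up, and "expected TSV data" is approximated here as "every row has at least two tab-separated fields".

```shell
# Hypothetical check (not part of the PR): validate a dump's name and contents.
check_dump() {
    path="$1"
    base="$(basename "$path")"
    # The file name must match ol_dump_wikidata_YYYY-MM-DD.txt.gz.
    case "$base" in
        ol_dump_wikidata_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].txt.gz) ;;
        *) echo "unexpected file name: $base" >&2; return 1 ;;
    esac
    # Fail if the file is empty or any row has fewer than two TSV fields.
    gzip -dc "$path" | awk -F '\t' 'NF < 2 { bad = 1 } END { exit (bad || NR == 0) }'
}

# Build a tiny sample dump so the check can run without a database.
checkdir="$(mktemp -d)"
printf 'Q42\t{"id": "Q42"}\n' | gzip -c > "$checkdir/ol_dump_wikidata_2025-03-27.txt.gz"
check_dump "$checkdir/ol_dump_wikidata_2025-03-27.txt.gz" && echo "dump looks OK"
```

Against a real run, the same function would be pointed at the file that oldump.sh wrote instead of the sample created here.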

Screenshot

N/A - Backend feature

Stakeholders

@cdrini

@mohitpaddhariya
Contributor Author

@cdrini or @mekarpeles, could you please review this PR?

github-actions bot added the "Needs: Response" label (Issues which require feedback from lead) on Mar 3, 2025
mekarpeles assigned RayBB and mekarpeles and unassigned cdrini on Mar 3, 2025
github-project-automation bot moved this to "Waiting Review/Merge from Staff" in Ray's Project on Mar 3, 2025
mekarpeles added the "Needs: Submitter Input" label (Waiting on input from the creator of the issue/pr [managed]) and removed the "Needs: Response" label (Issues which require feedback from lead) on Mar 9, 2025
github-actions bot removed the "Needs: Submitter Input" label (Waiting on input from the creator of the issue/pr [managed]) on Mar 17, 2025
@mohitpaddhariya
Contributor Author

@mekarpeles, could you please review the changes?

github-actions bot added the "Needs: Response" label (Issues which require feedback from lead) on Mar 18, 2025
@mohitpaddhariya
Contributor Author

@mekarpeles, could you please review the PR?

mohitpaddhariya and others added 2 commits March 27, 2025 18:11
cdrini force-pushed the 10383/feature/wikidata-dumps branch from a8c74db to 6b00959 on March 27, 2025 22:20
Collaborator

cdrini left a comment

Took a quick look at this one since it might block some pieces for Stef/Ray in the next month, so I thought it would be good to get it out.

I tested running this locally, and the file generated successfully 🥳

cdrini assigned cdrini and unassigned mekarpeles on Mar 27, 2025
cdrini merged commit 424dadf into internetarchive:master on Mar 27, 2025
3 checks passed
github-project-automation bot moved this from "Waiting Review/Merge from Staff" to "Done" in Ray's Project on Mar 27, 2025
Labels
Needs: Response (Issues which require feedback from lead)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Add wikidata dump to included dumps
4 participants