Integrate Wikidata #710

Open
tfmorris opened this issue Dec 31, 2017 · 12 comments
Labels
Affects: Data Issues that affect book/author metadata or user/account data. [managed] Affects: Partners Lead: @RayBB Issues overseen by Ray (Onboarding & Documentation Lead) [manages] Module: Wikidata Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] Priority: 3 Issues that we can consider at our leisure. [managed] Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] Type: Epic A feature or refactor that is big enough to require subissues. [managed] Type: Feature Request Issue describes a feature or enhancement we'd like to implement. [managed]

Comments

@tfmorris
Contributor

tfmorris commented Dec 31, 2017

There's a ton of stuff that we could be leveraging Wikidata for in addition to just linking author records to Wikidata.

Etc, etc. Basically use Wikidata to bling it up without having to spend a lot of time/effort.

@nichtich

The first step would be to link the corresponding Wikidata IDs to works, editions, authors and subjects. Links can be added in Wikidata with two Wikidata properties:

To avoid synchronization headaches, it may make more sense to keep Wikidata as the master for these links and harvest them regularly (plus live lookups via SPARQL or the MediaWiki API). Nevertheless, OL should provide an editing interface for these links, one that edits Wikidata directly via OAuth.
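As a rough illustration of the regular harvest described above, the Python sketch below builds a Wikidata Query Service request for every item that carries the Open Library ID property (P648) and turns the SPARQL JSON results into a map from Open Library keys to Wikidata QIDs. It only covers P648, not any other linking property; the function names are hypothetical, and mapping the OLID suffix (A/W/M) onto `/authors/`, `/works/` and `/books/` keys is an assumption about how the harvested links would be stored. It does not perform the HTTP request itself; a real harvester would also send a descriptive User-Agent, as Wikimedia's policy asks.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Public Wikidata Query Service (WDQS) SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Every item that has an Open Library ID (P648); the wdt: prefix is predefined on WDQS.
HARVEST_QUERY = """
SELECT ?item ?olid WHERE {
  ?item wdt:P648 ?olid .
}
"""

# OLID suffix -> Open Library key prefix (assumed storage convention).
OL_KEY_PREFIX = {"A": "/authors/", "W": "/works/", "M": "/books/"}


def harvest_url(query: str = HARVEST_QUERY) -> str:
    """Build a GET URL that asks WDQS for SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def ol_key(olid: str) -> Optional[str]:
    """Map an OLID such as 'OL123A' to an Open Library key, or None if unrecognised."""
    prefix = OL_KEY_PREFIX.get(olid[-1:])
    if prefix is None or not olid.startswith("OL"):
        return None
    return prefix + olid


def parse_bindings(results: dict) -> Dict[str, str]:
    """Turn a SPARQL JSON result set into {open_library_key: wikidata_qid}."""
    links: Dict[str, str] = {}
    for row in results["results"]["bindings"]:
        # ?item comes back as a full entity URI, e.g. http://www.wikidata.org/entity/Q42
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        key = ol_key(row["olid"]["value"])
        if key is not None:
            links[key] = qid  # last binding wins if an OLID appears twice
    return links
```

If several items share one OLID, the last binding silently wins here; a production job would more likely flag such duplicates for review.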

@LeadSongDog

A second step could be to flesh out the identifier and classification lists in OL edition records using harvests from Wikidata. This opens the door to finding other (non-IA) copies that are accessible online.
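A minimal sketch of this second step, assuming entity data in the shape returned by the MediaWiki API's `wbgetentities` module. The property-to-field mapping (P212 ISBN-13, P957 ISBN-10, P243 OCLC control number, P1149 Library of Congress Classification and P1036 Dewey Decimal Classification onto the OL edition fields `isbn_13`, `isbn_10`, `oclc_numbers`, `lc_classifications` and `dewey_decimal_class`) is an illustrative assumption, as are the function names. Existing values are never removed: Wikidata values are only appended when missing, and deprecated-rank statements are skipped.

```python
from typing import Dict, List

# Hypothetical mapping from Wikidata property ID to Open Library edition field.
PROPERTY_TO_OL_FIELD: Dict[str, str] = {
    "P212": "isbn_13",
    "P957": "isbn_10",
    "P243": "oclc_numbers",
    "P1149": "lc_classifications",
    "P1036": "dewey_decimal_class",
}


def string_values(claims: dict, prop: str) -> List[str]:
    """Collect string values for one property from a wbgetentities 'claims' object."""
    values = []
    for claim in claims.get(prop, []):
        if claim.get("rank") == "deprecated":
            continue  # Wikidata marks known-bad values as deprecated
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":  # skip 'novalue' / 'somevalue'
            values.append(snak["datavalue"]["value"])
    return values


def enrich_edition(edition: dict, claims: dict) -> dict:
    """Return a copy of `edition` with Wikidata identifiers appended, never removing any."""
    # Copy list fields so the caller's record is not mutated.
    merged = {k: list(v) if isinstance(v, list) else v for k, v in edition.items()}
    for prop, field in PROPERTY_TO_OL_FIELD.items():
        for raw in string_values(claims, prop):
            # Wikidata ISBNs are usually hyphenated; compare them without hyphens.
            value = raw.replace("-", "") if field.startswith("isbn_") else raw
            existing = merged.setdefault(field, [])
            if value not in existing:
                existing.append(value)
    return merged
```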

@xayhewalo xayhewalo added Affects: Data Issues that affect book/author metadata or user/account data. [managed] Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] State: Backlogged Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] Type: Feature Request Issue describes a feature or enhancement we'd like to implement. [managed] labels Oct 31, 2019
@xayhewalo
Collaborator

I think these steps are a good start, but they should be broken down more granularly. @hornc Your insight would be valuable in this thread.

@tfmorris
Contributor Author

@nichtich I'm surprised Open Library subject got approved as a Wikidata property. I recommend we discourage its use, since OL subjects are a mess and are going to change when we get around to normalizing them, internationalizing them, or both. It has fewer than 600 uses now versus ~207,000 for the Open Library ID property.
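Usage numbers like these can be re-checked with a simple count query. The sketch below assumes the public Wikidata Query Service; P648 is the Open Library ID property, while the subject property's ID would need to be looked up and substituted. The function names are hypothetical. Note that `wdt:` matches only best-rank ("truthy") values, so the result can differ slightly from a count of all statements (which would go through `p:` instead).

```python
import re


def usage_count_query(prop: str) -> str:
    """SPARQL that counts best-rank ("truthy") values of a Wikidata property."""
    if not re.fullmatch(r"P\d+", prop):
        raise ValueError(f"not a Wikidata property ID: {prop!r}")
    # Doubled braces become literal { } in the f-string output.
    return f"SELECT (COUNT(*) AS ?uses) WHERE {{ ?item wdt:{prop} ?value . }}"


def parse_count(results: dict) -> int:
    """Read the single ?uses binding from a SPARQL JSON result set."""
    return int(results["results"]["bindings"][0]["uses"]["value"])
```

For example, `usage_count_query("P648")` yields a query that can be pasted into the WDQS web interface or sent to its endpoint.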

@guyjeangilles I won't object if you want to break this into 5+ tickets, but that task could also be left until someone's ready to work on it, in the spirit of Agile's just-in-time planning.

One thing I left off the original list was harvesting author birth and death dates, professions, AKAs, etc., to help with disambiguation, as well as photos for authors who don't have them.
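Harvesting these author details could look roughly like the sketch below. It reads one entity in the `wbgetentities` JSON shape and extracts date of birth (P569), date of death (P570), occupations (P106, returned as QIDs whose labels would need a second lookup), the image filename (P18) turned into a Wikimedia Commons `Special:FilePath` URL, and aliases in one language as AKAs. The output field names, the ISO-style date strings and the choice to drop dates coarser than a year are assumptions for illustration, not Open Library's actual author schema.

```python
from typing import List, Optional
from urllib.parse import quote

COMMONS_FILEPATH = "https://commons.wikimedia.org/wiki/Special:FilePath/"


def _values(entity: dict, prop: str) -> List:
    """Non-deprecated main-snak values for one property."""
    out = []
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if claim.get("rank") != "deprecated" and snak.get("snaktype") == "value":
            out.append(snak["datavalue"]["value"])
    return out


def format_time(value: dict) -> Optional[str]:
    """Render a Wikidata time value at its precision (9=year, 10=month, 11=day)."""
    raw = value["time"]  # e.g. "+1952-03-11T00:00:00Z"
    sign = "-" if raw.startswith("-") else ""
    year, month, day = raw.lstrip("+-").split("T")[0].split("-")
    year = sign + str(int(year))  # drop zero padding such as "0800"
    precision = value.get("precision", 11)
    if precision >= 11:
        return f"{year}-{month}-{day}"
    if precision == 10:
        return f"{year}-{month}"
    if precision == 9:
        return year
    return None  # decade/century precision: too vague to store as a date


def author_facts(entity: dict, lang: str = "en") -> dict:
    """Pick disambiguation facts and a photo URL out of one Wikidata entity."""
    births = [format_time(v) for v in _values(entity, "P569")]
    deaths = [format_time(v) for v in _values(entity, "P570")]
    images = _values(entity, "P18")
    return {
        "birth_date": next((d for d in births if d), None),
        "death_date": next((d for d in deaths if d), None),
        "occupation_qids": [v["id"] for v in _values(entity, "P106")],
        "alternate_names": [a["value"] for a in entity.get("aliases", {}).get(lang, [])],
        "photo_url": COMMONS_FILEPATH + quote(images[0].replace(" ", "_")) if images else None,
    }
```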

@xayhewalo xayhewalo added the Type: Epic A feature or refactor that is big enough to require subissues. [managed] label Nov 8, 2019
@xayhewalo
Collaborator

@tfmorris I'm not opposed to adopting Agile planning in the future, but considering our habit of leaving issues unattended for years, I'm assigning @hornc for the time being, per our Slack discussion.

@xayhewalo xayhewalo added the Priority: 3 Issues that we can consider at our leisure. [managed] label Nov 26, 2019
@xayhewalo xayhewalo added Lead: @hornc Issues overseen by Charles (Staff: Data Engineering Lead) [managed] and removed Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] labels Dec 24, 2019
@hornc hornc removed their assignment Jan 14, 2020
@mekarpeles mekarpeles changed the title Make better use of Wikidata 2021 H2 Integrate Wikidata Jan 23, 2021
@RayBB
Collaborator

RayBB commented Apr 11, 2021

Adding wikidata ids to works is blocked by #1797

@mekarpeles mekarpeles changed the title 2021 H2 Integrate Wikidata Integrate Wikidata Sep 7, 2023
@mekarpeles mekarpeles added Lead: @RayBB Issues overseen by Ray (Onboarding & Documentation Lead) [manages] and removed Lead: @hornc Issues overseen by Charles (Staff: Data Engineering Lead) [managed] labels Sep 16, 2023
@RayBB
Collaborator

RayBB commented Oct 6, 2023

The first steps are awaiting review in #8236
I also have this handy wiki page with some ideas, and I've added yours there.

@tfmorris
Contributor Author

tfmorris commented Nov 6, 2024

The first steps are awaiting review in #8236

So apparently that got closed and replaced by #9130 without linking it to this issue so that people could comment.

I also have this handy wiki page with some idea and I've added yours there.

Why use a wiki page instead of a series of sub-issues linked to this master issue (i.e., an epic) so that they can be commented on? There's also apparently another secret version hiding here - https://docs.google.com/document/d/1-xAija9Pfhtwc-wCAgERHBx6GvFv40Hb-nfHd9f6SPc/edit

@github-actions github-actions bot added the Needs: Response Issues which require feedback from lead label Nov 7, 2024
@mekarpeles
Member

@tfmorris just wanted to chime in that I don't think anyone is trying to make things secret or hidden, and I don't imagine the phrasing makes folks feel particularly good / valued -- I observe people being generous with their time and working hard to be on the same team, contribute meaningfully to the project, and move things forward. We're a small community, there's a lot to do, and it can be hard to do everything to everyone's satisfaction. As a result, I think constructive criticism is pretty essential.

I value your passion and commitment to the community and appreciate that you invest the time and energy to voice your opinion. And at the same time, as per our code of conduct, I humbly and kindly request we try to make an effort to keep our critiques constructive and our language welcoming. It's more important that we feel comfortable and good about contributing as a team than getting everything right and I think we're more likely to respond well to criticism when it's delivered out of genuine care. Thanks for caring :)

@tfmorris
Contributor Author

tfmorris commented Jan 8, 2025

@mekarpeles Just stumbled across this as I was revisiting what needs to be done with Wikidata. Thank you for the reminder -- and thank you also for recognizing how generous we all are with our time.

I would humbly and constructively suggest that GitHub issues remain the focus, as they historically have been, rather than diverting to wikis, Google Docs, or other less accessible channels, and that new issues / PRs link to the relevant issues that they complement / address. You didn't comment on the main thrust of my suggestion in the previous reply, so I'm unsure whether this is something that you support or disagree with.

Thank you for recognizing that I continue to care despite having contributed for decades with zero thanks -- and Happy New Year!

@tfmorris
Contributor Author

tfmorris commented Jan 8, 2025

Analytics review for Wikidata engagement in #10294

@mekarpeles
Member

mekarpeles commented Jan 13, 2025

@tfmorris, without sarcasm, I am grateful that you are so generous with your time. I'll reiterate that it's clear to me how much you care, that you have indeed invested a lot of time trying to help, and I take pride in both of these aspects of you. Furthermore, I feel bad if it hasn't felt like our thanks is reaching you. I want to at least dignify my own gratitude for you by pulling up a few cases where your efforts have been met with appreciation:

[Nine screenshots, taken Jan 13, 2025, of earlier cases where @tfmorris's efforts were met with appreciation]

I agree with you, keeping github issues focused is also important to me and my goal.

There is no but.

Sometimes it takes time to figure out the right processes. We've tried using wikis, Google Docs, and other channels to help us brainstorm, plan, and have conversations in ways that we ultimately hoped would help us keep GitHub more focused. We try hard to make all of our documents public and to link to them in issues when we use them. With 800 issues opened by a whole community of people, it can be difficult both to keep the quality of every issue high and to keep track of every effort, so we've had to make some painful compromises and have definitely made mistakes.

Sometimes good faith side-conversations are necessary to make sure everyone feels good about how things are going.

I appreciate that it may feel like a distraction for us to have this type of conversation via a GitHub issue, though I'd warmly add that this type of good faith conversation is also one I welcome on our Tuesday community calls, and that many contributors are having similar conversations on Slack each week. I understand Slack is a less accessible channel (yes, we could consider Discord or Matrix), yet these conversations often occur on Slack for the same reason you're describing: so that we ultimately can do more focused work on GitHub.

I too would love for us to be able to use GitHub in a focused way. I more strongly feel that we have been given the trust of the Open Library community to stay true to the values of the code of conduct the community advocated for, and I intend to. I believe it's central to how we continue to be able to focus and work together effectively and civilly on Open Library.

Very respectfully, I am a human being who matters, and I'd also love to not have to spend my time co-opting GitHub issues to address matters of code of conduct. It doesn't feel good to me when (a) I receive messages that other people may feel disrespected, (b) I distract people from their work, and (c) my own time needs to be spent this way.

Project management for Wikidata Integration

Regarding project management, my apologies re: #710 (comment) that #8236 was closed without linking to #9130. I own this failure for not doing a better job at ensuring the community has access to all the procedures staff typically tries to follow. @RayBB has similarly been working incredibly hard as a volunteer lead; I think he's been doing an exceptional job helping unblock the community with PR reviews, and I've found him to be very receptive to suggestions and feedback. Since PRs are ultimately triaged by me and staff, I will try to step up to make sure we do the right checks upfront to support both of you (@tfmorris and @RayBB) moving forward, so I thank you for bringing up these opportunities. Also, thank you @RayBB for trying to pull all our Wikidata notes into one place. It seems like there may still be an opportunity to take our wiki + Google Docs + GitHub issue notes and put them in one place... GitHub issues have the disadvantage of being sequential and comment-based, which makes it quite hard to collaboratively plan a project roadmap.

I think what really may have helped is a call between stakeholders to converge on which specific Wikidata integration efforts we want to try.

Back to the technical

  • I agree with #710 (comment) that our subjects as they are need a major rethink and aren't a great candidate for Wikidata (in my humble opinion). Some subjects could perhaps be defined based on Library of Congress Classification.

  • I'm supportive of us continuing to use author and work Wikidata IDs where possible.

  • I think IDs may make the most sense as a starting point, with data then pulled dynamically from Wikidata and cached (perhaps allowing librarians to choose from this data, as I believe @sbwhitt @davidscotson @cdrini and @RayBB have tried prototyping in the past).

  • One of the difficult aspects of importing Wikidata keys and values into Open Library is figuring out the schema changes (e.g. series @cdrini) and also how to handle conflicting data (which system wins @hornc).
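The "IDs first, then pull and cache" approach and the "which system wins" question could be prototyped along the lines below. This is a hypothetical sketch: the `TTLCache` class, the one-day default TTL, and a policy in which Open Library values win, Wikidata only fills empty fields, and disagreements are queued for librarian review are all assumptions made for illustration, not a decided design.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Cache fetched Wikidata entities for `ttl` seconds (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], dict], ttl: float = 86400,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # e.g. a wbgetentities call; injected so it can be faked
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get(self, qid: str) -> dict:
        now = self._clock()
        hit = self._store.get(qid)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no network call
        value = self._fetch(qid)
        self._store[qid] = (now, value)
        return value


def merge_author(ol: dict, wd: dict) -> Tuple[dict, List[dict]]:
    """Open Library wins; Wikidata only fills empty fields; disagreements are reported."""
    merged = dict(ol)
    conflicts: List[dict] = []
    for field, wd_value in wd.items():
        if wd_value in (None, "", []):
            continue  # nothing useful from Wikidata
        ol_value = ol.get(field)
        if ol_value in (None, "", []):
            merged[field] = wd_value
        elif ol_value != wd_value:
            conflicts.append({"field": field, "open_library": ol_value, "wikidata": wd_value})
    return merged, conflicts
```

Returning conflicts as explicit records, rather than silently overwriting either side, is one way to leave the final choice to librarians.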

Thank you

@mekarpeles mekarpeles removed the Needs: Response Issues which require feedback from lead label Jan 13, 2025
7 participants