Parse MeSH terms in PubMed MEDLINE records #12532
Comments
Edits:
Keywords could have the format |
Implemented a prototype to identify potential challenges or problems related to this issue. I encountered two main problems:
In my opinion, addressing these inconsistencies is essential to developing a sustainable and robust solution.
"Models, Molecular" looks like a hierarchical keyword. Is a comma always used to express hierarchy?
Although it might look like a hierarchy, it's not actually one. I experimented a bit with the MeSH database, and treating the comma in Here’s the actual hierarchy:
Looking at the hierarchy for Diabetes Mellitus reveals even more complex scenarios.
This refs #6856
I deduce:
No, unfortunately, commas do not reflect the hierarchy even though it often appears this way. Here is an example where the comma looks like part of the hierarchy and where it would be possible to abbreviate the keyword by removing the redundant portion. Now consider an example where it is clear that commas do not in fact represent the hierarchy. Notice that
This is a problem in the current situation as well. MEDLINE terms that have multiple subheadings are too long for the entry editor. Since keywords do not wrap, they get cut off and cannot be seen beyond the right boundary of the field.
Escaping is the only real solution, because the keyword separator comes from the source and not the user. Different sources use different separators, so the user preference alone cannot be relied upon. Substitution is another possibility, but this too would be unreliable since the substitution would have to use a character that does not occur in the source.
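To make the escaping idea concrete, here is a minimal sketch. It assumes a backslash as the escape character and a single-character separator; `KeywordEscaper` and its methods are hypothetical names for illustration and are not JabRef's actual `KeywordList` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JabRef: escapes the keyword separator
// with a backslash so that keywords containing it survive a round trip.
final class KeywordEscaper {
    private static final char ESCAPE = '\\';

    private KeywordEscaper() {
    }

    // Prefixes every separator (and every backslash) with a backslash.
    static String escape(String keyword, char separator) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            if (c == separator || c == ESCAPE) {
                out.append(ESCAPE);
            }
            out.append(c);
        }
        return out.toString();
    }

    // Joins keywords into one field value, escaping each keyword first.
    static String join(List<String> keywords, char separator) {
        List<String> escaped = new ArrayList<>();
        for (String keyword : keywords) {
            escaped.add(escape(keyword, separator));
        }
        return String.join(separator + " ", escaped);
    }

    // Splits a field value on unescaped separators and removes the escapes.
    static List<String> split(String field, char separator) {
        List<String> keywords = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == ESCAPE && i + 1 < field.length()) {
                current.append(field.charAt(++i)); // keep the escaped character literally
            } else if (c == separator) {
                addIfNotBlank(keywords, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(keywords, current);
        return keywords;
    }

    private static void addIfNotBlank(List<String> keywords, StringBuilder current) {
        String keyword = current.toString().trim();
        if (!keyword.isEmpty()) {
            keywords.add(keyword);
        }
    }
}
```

With a comma as separator, "Models, Molecular" is stored with an escaped comma and comes back as a single keyword, whereas the unescaped string splits into "Models" and "Molecular".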
Is your suggestion for improvement related to a problem? Please describe.
MEDLINE records are indexed with headings and subheadings (MeSH terms), having a one-to-many relationship between headings and subheadings. PubMed displays the MeSH terms individually (in pairs), like this.
Notice that the heading "Kidney Diseases" repeats for each associated subheading, with trailing asterisks denoting "Major topics". This is not how the MeSH terms appear in PubMed exports, and therefore, not how JabRef imports them.
This is how the terms come in PubMed text files.
JabRef imports this unchanged as one keyword:
This is how the terms appear in PubMed xml.
Again, JabRef imports this as one keyword, this time separating the subheadings with a comma:
Describe the solution you'd like
I would like JabRef to import MeSH terms as individual keywords using the same format as PubMed where each heading has a maximum of one subheading and the major topic is displayed as an asterisk at the end of the heading or subheading string. Keywords generated from plain text or xml files from PubMed should have the same format in JabRef.
The keywords should look like this:
Kidney Diseases*/diagnosis
Kidney Diseases*/epidemiology
Kidney Diseases*/physiopathology
Kidney Diseases*/therapy
The BibTeX source should look like this (assuming the user-defined keyword separator is a semicolon):
Parsing MeSH terms this way lets the keywords fit better in the GUI and makes it easier to search and filter by keyword.
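The requested splitting could be sketched roughly as follows. This assumes the exported term has the shape `Heading/subheading/subheading` with a leading asterisk on whichever element is a major topic (for example `*Kidney Diseases/diagnosis/epidemiology/physiopathology/therapy`); `MeshTermSplitter` is a hypothetical name, not existing JabRef code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JabRef: turns one MeSH term with several
// subheadings into one keyword per heading/subheading pair.
final class MeshTermSplitter {

    private MeshTermSplitter() {
    }

    // Moves a leading major-topic asterisk to the end: "*Foo" becomes "Foo*".
    static String moveAsterisk(String part) {
        String trimmed = part.trim();
        return trimmed.startsWith("*") ? trimmed.substring(1) + "*" : trimmed;
    }

    // Splits "Heading/sub1/sub2" into "Heading/sub1" and "Heading/sub2".
    // A heading without subheadings is returned as a single keyword.
    static List<String> split(String meshTerm) {
        List<String> keywords = new ArrayList<>();
        String[] parts = meshTerm.split("/");
        if (parts.length == 0 || parts[0].trim().isEmpty()) {
            return keywords; // nothing usable in the input
        }
        String heading = moveAsterisk(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            String subheading = moveAsterisk(parts[i]);
            if (!subheading.isEmpty()) {
                keywords.add(heading + "/" + subheading);
            }
        }
        if (keywords.isEmpty()) {
            keywords.add(heading);
        }
        return keywords;
    }
}
```

Under that assumption, the Kidney Diseases term yields the four keywords listed above, and a term such as `Kidney Diseases/*diagnosis` yields `Kidney Diseases/diagnosis*`.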
Additional context
Ideally, the MEDLINE importer (and other importers) would check whether the user-defined keyword separator occurs in the input, and warn or choose a substitution in case of conflict. List items appear one per line in PubMed text files, so the keyword separator should not be found in any lines that begin with MH -.
jabref/src/main/java/org/jabref/model/entry/KeywordList.java
Line 66 in 3a8cba4
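The separator check suggested above might look something like this sketch. It assumes MeSH lines start with the tag `MH`, optional padding, and a hyphen; `SeparatorConflictCheck` is a hypothetical name and nothing here reflects the current importer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JabRef: finds MeSH values that contain
// the user-defined keyword separator.
final class SeparatorConflictCheck {
    // Matches lines such as "MH  - Models, Molecular" and captures the value.
    private static final Pattern MESH_LINE = Pattern.compile("^MH\\s*-\\s*(.*)$");

    private SeparatorConflictCheck() {
    }

    // Returns the MeSH values that would be split wrongly by the separator.
    static List<String> conflictingTerms(String medlineText, String separator) {
        List<String> conflicts = new ArrayList<>();
        for (String line : medlineText.split("\\R")) {
            Matcher matcher = MESH_LINE.matcher(line);
            if (matcher.matches() && matcher.group(1).contains(separator)) {
                conflicts.add(matcher.group(1));
            }
        }
        return conflicts;
    }
}
```

An importer could warn, escape, or substitute whenever the returned list is not empty. Lines with other tags are ignored, so a comma in a title does not count as a conflict.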
Regex for moving asterisks to the end.
Replace with
Discussion on JabRef Discourse