Enable proper pasting of HTML code into fields (by converting to Markdown) #10558

Closed
koppor opened this issue Oct 23, 2023 · 19 comments · Fixed by #10896
Labels
good first issue An issue intended for project-newcomers. Varies in difficulty.

Comments

@koppor
Member

koppor commented Oct 23, 2023

  1. Go to https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/
  2. Select the text below "Book description"
  3. Copy
  4. Change to JabRef
  5. New entry
  6. Go to "Abstract"
  7. Paste

Actual result:

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

    Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
    Make informed decisions by identifying the strengths and weaknesses of different tools
    Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
    Understand the distributed systems research upon which modern databases are built
    Peek behind the scenes of major online services, and learn from their architectures

This is "OK", but the bullet list should be formatted with *:

Expected result:

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

* Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
* Make informed decisions by identifying the strengths and weaknesses of different tools
* Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
* Understand the distributed systems research upon which modern databases are built
* Peek behind the scenes of major online services, and learn from their architectures

Some background information:

Note that a paste into Microsoft Word keeps the bullet list:

screenshot of the text pasted into Microsoft Word

That means that the clipboard contains the "correctly" formatted content.

Implementation hint:

@koppor koppor added the good first issue An issue intended for project-newcomers. Varies in difficulty. label Oct 23, 2023
@github-project-automation github-project-automation bot moved this to Free to take in Good First Issues Oct 23, 2023
@DavidCoy77

Hello, I'm an undergraduate programming major that's eager to make my first GitHub contribution. Would I be able to claim this issue? Thanks!

@ThiloteE
Member

Some more illustrations: https://www.markdownguide.org/basic-syntax/#unordered-lists

@ThiloteE ThiloteE moved this from Free to take to Reserved in Candidates for University Projects Oct 25, 2023
@ThiloteE ThiloteE moved this from Free to take to Reserved in Good First Issues Oct 25, 2023
@ThiloteE ThiloteE added the FirstTimeCodeContribution Triggers GitHub Greeter Workflow label Oct 25, 2023
@github-actions
Contributor

As general advice for newcomers: check out Contributing for a start. The guidelines for setting up a local workspace are also worth a look.

Feel free to ask here on GitHub if you have any issue-related questions. If you have questions about how to set up your workspace, use JabRef's Gitter chat. Try to open a (draft) pull request early on, so that people can see that you are working on the issue and which direction the pull request is heading in. This way, you will likely receive valuable feedback.

@DavidCoy77

DavidCoy77 commented Oct 25, 2023

Thank you @ThiloteE and @koppor, I followed all the steps from guidelines for setting up a local workspace and successfully got everything set up and running in IntelliJ. I'll start working on the issue soon.

@DavidCoy77

I followed your instructions and I can clearly see the issue. My plan to fix this would be to add a method to JabRef that parses HTML and converts it to Markdown. Per your included Stack Overflow hint, the FlexmarkHtmlConverter class lets us do this easily in only one line of code. In order to parse the text being pasted into the "Abstract" GUI text box, I'd need to find where that text is stored in the source code (i.e., which variable stores the text). From there, I imagine that transforming that variable with the FlexmarkHtmlConverter class would be fairly simple. Would this adequately resolve the issue?

Please let me know if my plan is on the right track, and if so, even a little hint as to where I can find the code for the text boxes would be very much appreciated. Thanks!

screenshot of text pasted into the abstract field of a new entry in JabRef
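
For illustration, the conversion step in the plan above can be sketched with the standard library alone. This is a simplified stand-in, not Flexmark and not JabRef code: the class `HtmlListToMarkdown` and its `convert` method are hypothetical names, and the sketch only handles `<p>` and `<li>` elements and does not decode HTML entities. In a real implementation, a dedicated library such as flexmark-html2md-converter would presumably replace it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified stand-in for an HTML-to-Markdown converter (hypothetical helper, not JabRef code). */
final class HtmlListToMarkdown {

    // Matches <p>...</p> and <li>...</li> blocks; other elements are ignored in this sketch.
    private static final Pattern BLOCK = Pattern.compile(
            "<(p|li)\\b[^>]*>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String convert(String html) {
        StringBuilder markdown = new StringBuilder();
        Matcher matcher = BLOCK.matcher(html);
        boolean previousWasListItem = false;
        while (matcher.find()) {
            boolean isListItem = matcher.group(1).equalsIgnoreCase("li");
            // Drop nested inline tags and collapse runs of whitespace.
            String text = matcher.group(2)
                    .replaceAll("<[^>]+>", "")
                    .replaceAll("\\s+", " ")
                    .trim();
            if (text.isEmpty()) {
                continue;
            }
            if (markdown.length() > 0) {
                // Blank line between blocks; consecutive list items stay on adjacent lines.
                markdown.append(isListItem && previousWasListItem ? "\n" : "\n\n");
            }
            markdown.append(isListItem ? "* " : "").append(text);
            previousWasListItem = isListItem;
        }
        return markdown.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Intro paragraph.</p><ul><li>First point</li><li>Second point</li></ul>";
        System.out.println(convert(html));
    }
}
```

Running `main` prints the paragraph, a blank line, and then the two list items prefixed with `* `, which is the shape asked for under "Expected result".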

@koppor
Member Author

koppor commented Oct 25, 2023

@DavidCoy77 Development involves code reading and understanding. That takes time. You can search for an arbitrary string in the code using Ctrl+Shift+F. Maybe that helps to locate some code.

I searched around the code a bit and found org.jabref.gui.fieldeditors.FieldEditors, which brought me to org.jabref.gui.fieldeditors.SimpleEditor. However, there is no paste handling there. Using Ctrl+N, I searched for Clipboard and found ClipboardManager. No help. With Ctrl+Shift+F, I searched for paste and found org.jabref.gui.edit.EditAction#execute. I think one needs to hook in there.

Just try. We do not know the solution yet. You are the one searching for the solution.

@DavidCoy77

I'll keep plugging away!

@DavidCoy77

I've spent time brainstorming and taking notes, but I haven't changed any code yet, so I ran the gradlew check command to see if everything was working properly. I've included a .zip with screenshots of the failed tests. I ran gradlew check several times, and once each with the "Run tests using" parameter set to both IntelliJ IDEA and Gradle. All of my runs had the same failed tests.

Are these failed tests anything I need to worry about for the scope of this issue? Is there a mistake I made in the setup process that caused those tests to fail?

Thanks!

failed tests IntelliJ IDEA.zip

@koppor
Member Author

koppor commented Nov 10, 2023

Regarding the failed tests, we started to work on that at #9992, but we failed.

Note that you can execute all the tests in WSL2 - and they will be green. I don't have a good HowTo in mind, but https://youtrack.jetbrains.com/issue/IDEA-284123/Can-I-execute-JUnit-Test-using-IntelliJ-in-WSL2 should be of some guidance.

You are invited to switch topics and try to fix (part of) the failing tests.


There are almost no UI tests. You can start by understanding the existing entry editor UI tests and then think about how to add the feature tests there.


Please don't let yourself get distracted by some locally failing tests. Our CI runs through.

UI coding is hard. I recommend working with breakpoints to understand how JabRef / JavaFX works and then hooking into the appropriate places.

@DavidCoy77
Copy link

Ok sounds good, I just wanted to double check so thanks!

@shubham-dutta744

Hi @koppor, there is nothing to change in the code. You can copy the book description and paste it into Microsoft Word first, and only then copy it from Word and paste it into the Abstract field in JabRef.
Screenshot 2024-01-18 200316

@shubham-dutta744

Hi @koppor It appears that when extracting data from a browser and inputting it into JabRef's abstract field, bullet points are not preserved. To address this, you suggest an interim step of transferring the data to MS Word before copying it into JabRef. However, if formatting issues, such as the absence of bullet points, persist, consider adjusting the formatting in MS Word or exploring alternative formats, like Markdown, which JabRef supports more effectively. If you have specific issues or questions, feel free to provide more details for further assistance.

@koppor
Member Author

koppor commented Jan 22, 2024

> Hi @koppor It appears that when extracting data from a browser and inputting it into JabRef's abstract field, bullet points are not preserved.

Yes. See the steps at the issue description (first entry in this thread)

> To address this, you suggest an interim step of transferring the data to MS Word before copying it into JabRef.

No, I do not. JabRef should "just" work.

Please make JabRef handle HTML input properly. There may be a Java library that can convert HTML to Markdown. Please start with flexmark-html2md-converter.

@koppor koppor removed the FirstTimeCodeContribution Triggers GitHub Greeter Workflow label Jan 22, 2024
@koppor koppor moved this from Reserved to Free to take in Good First Issues Jan 22, 2024
@koppor koppor moved this from Reserved to Free to take in Candidates for University Projects Jan 22, 2024
@koppor koppor moved this from Free to take to Reserved in Candidates for University Projects Jan 28, 2024
@koppor koppor moved this from Free to take to Reserved in Good First Issues Jan 28, 2024
@brennanmcmicking

brennanmcmicking commented Feb 13, 2024

Hi @koppor, it looks like the flexmark-html2md library doesn't do what you originally thought. It's for converting snippets like this to Markdown:

<span>
  <div>
    <p>
      Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?
    </p>
    <p>
      In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
    </p>
    <ul>
      <li>
        Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
      </li>
      <li>
        Make informed decisions by identifying the strengths and weaknesses of different tools
      </li>
      <li>
        Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
      </li>
      <li>
        Understand the distributed systems research upon which modern databases are built
      </li>
      <li>
        Peek behind the scenes of major online services, and learn from their architectures
      </li>
    </ul>
  </div>
</span>

not this:

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

    Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
    Make informed decisions by identifying the strengths and weaknesses of different tools
    Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
    Understand the distributed systems research upon which modern databases are built
    Peek behind the scenes of major online services, and learn from their architectures

Do you have any suggestions as to where to go from here? We haven't been able to find an existing library for interpreting the second snippet as markdown. I suspect that programs like MS Word use their own custom parsers.

@koppor
Member Author

koppor commented Feb 13, 2024

@brennanmcmicking Please state the source of your claim. Maybe a minimal Java project.

I checked the source code of FlexmarkHtmlConverter. The method convert, which can be inspected at https://github.com/vsch/flexmark-java/blob/cc3a2f59ba6e532833f4805f8134b4dc966ff837/flexmark-html2md-converter/src/main/java/com/vladsch/flexmark/html2md/converter/FlexmarkHtmlConverter.java#L356, reads as if HTML is given as input and Markdown is produced as output.

@brennanmcmicking

@koppor Yes, I agree. FlexmarkHtmlConverter takes HTML as input, not what gets put on your clipboard when you copy from a webpage. That is the problem: we're not trying to convert HTML to Markdown; we want to interpret snippets like the following as Markdown:

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

    Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
    Make informed decisions by identifying the strengths and weaknesses of different tools
    Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
    Understand the distributed systems research upon which modern databases are built
    Peek behind the scenes of major online services, and learn from their architectures

My claim is that Flexmark's HTML2MD converter is of no use to us in this context.

@brennanmcmicking

Wait, never mind: hooking into javafx.scene.input.Clipboard directly and using its .getHtml() method was the missing key. Figured it out!
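
A minimal sketch of the idea behind this finding, assuming the clipboard offers both an HTML flavor and a plain-text flavor: prefer the HTML flavor, convert it to Markdown, and fall back to plain text otherwise. The class `PasteContentSelector` and the method `textToPaste` are hypothetical names for illustration, not JabRef code. In JavaFX, the two inputs would presumably come from `Clipboard.getSystemClipboard()` via `hasHtml()`/`getHtml()` and `hasString()`/`getString()`; they are passed in as `Optional` values here so the sketch runs without JavaFX.

```java
import java.util.Optional;
import java.util.function.Function;

/** Chooses what to paste into a field (hypothetical helper, not JabRef code). */
final class PasteContentSelector {

    /**
     * @param html           HTML flavor of the clipboard, empty if the clipboard has none
     * @param plainText      plain-text flavor of the clipboard, empty if the clipboard has none
     * @param htmlToMarkdown converter applied to the HTML flavor (e.g. backed by an HTML-to-Markdown library)
     * @return Markdown converted from the HTML flavor if present, else the plain text, else ""
     */
    static String textToPaste(Optional<String> html,
                              Optional<String> plainText,
                              Function<String, String> htmlToMarkdown) {
        return html.filter(content -> !content.isBlank())
                   .map(htmlToMarkdown)
                   .or(() -> plainText)
                   .orElse("");
    }

    public static void main(String[] args) {
        // Toy converter for the demo only: strips tags and prefixes a bullet marker.
        Function<String, String> toyConverter =
                html -> "* " + html.replaceAll("<[^>]+>", "").trim();

        // Both flavors present: the HTML flavor wins, so the list marker survives.
        System.out.println(textToPaste(Optional.of("<li>First point</li>"),
                                       Optional.of("    First point"),
                                       toyConverter));

        // Only plain text present: it is pasted unchanged.
        System.out.println(textToPaste(Optional.empty(),
                                       Optional.of("    First point"),
                                       toyConverter));
    }
}
```

Keeping the converter as a parameter leaves the choice of library open and makes the selection logic testable without a GUI toolkit.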

@Ashwin2397
Contributor

Hi @koppor, you may assign this issue to me :). I have a PR for this fix and am currently waiting for its checks to pass before making it ready for review.

@Ashwin2397
Contributor

Hi @koppor, my PR is ready for review, thank you!
