Fix for insufficient nested lists markdown indentation #289
Since I last updated markdownify, our nested lists have not been interpreted correctly in markdown.
The issue appears to be that instead of using tabs as the indentation for nested lists, the new update has been using 2 or 3 spaces, depending on whether the list was a UL or an OL.
So for this UL as HTML:
this is what we would get as the markdown string:
This looks basically right, but when we re-interpreted it using the Markdown library, only 4 spaces of indentation were recognized as a nested list, not the 2 spaces we have here.
So once we re-converted this to HTML, it would look like this:
That is, the first and second levels were both root level lists, and the 3rd and 4th levels were demoted to level 1.
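The flattening described above can be reproduced with a little arithmetic. The sketch below is a simplified model, not the Markdown library's actual parser: it assumes each full 4 spaces of leading indentation opens exactly one nesting level. The function names `recognized_level` and `recognized_levels` are hypothetical and exist only for this illustration.

```python
# Simplified model (an assumption, not the Markdown library's real code):
# every full `interval` spaces of leading indentation opens one nesting level.
def recognized_level(indent_spaces: int, interval: int = 4) -> int:
    return indent_spaces // interval


def recognized_levels(indent_width: int, depth: int = 4) -> list:
    """Levels assigned to items written at depths 0..depth-1 with the given indent width."""
    return [recognized_level(indent_width * d) for d in range(depth)]


print(recognized_levels(2))  # 2-space indents: [0, 0, 1, 1]
print(recognized_levels(4))  # 4-space indents: [0, 1, 2, 3]
```

With 2-space indents, the first two items both land at root level and the next two at level 1, which matches the result described above. With 4-space indents every item lands on its own level.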
Our ordered lists showed the same kind of problem.
as markdown:
Once re-interpreted by the Markdown library, we were getting a result like this:
Again, not what we want.
So this update forces all the indents to be 4 spaces, not 2 or 3 as it was previously.
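To illustrate the intended end result, here is a minimal sketch that rewrites list indentation in an already generated markdown string. It is not how this change is implemented (the change adjusts the indentation markdownify emits); `reindent_list` is a hypothetical helper, and it assumes list items start with `*`, `+`, `-`, or a number followed by a period, indented with spaces only.

```python
import re

# Leading spaces, then a bullet or ordered-list marker, then the item text.
_LIST_ITEM = re.compile(r"^( *)((?:[*+-]|\d+\.) .*)$")


def reindent_list(markdown_text: str, old_width: int, new_width: int = 4) -> str:
    """Rewrite list-item indentation from old_width to new_width spaces per level.

    Hypothetical helper for illustration only; not part of markdownify.
    """
    out = []
    for line in markdown_text.splitlines():
        match = _LIST_ITEM.match(line)
        if match:
            # Recover the nesting depth from the old indent, then re-emit it.
            depth = len(match.group(1)) // old_width
            line = " " * (new_width * depth) + match.group(2)
        out.append(line)
    return "\n".join(out)


two_space = "* a\n  * b\n    * c\n      * d"
print(reindent_list(two_space, old_width=2))
```

Each item keeps its depth but is indented by a multiple of 4 spaces, so a parser that only recognizes 4-space intervals sees four distinct levels.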
Here is the commit that changed this behaviour (which we are partially undoing here): matthewwithanm/python-markdownify@c13bdd5
Screenshots
This is what it looked like to import a Document with the following HTML before and after this change:
example as unstyled HTML
rendered in the app
Note that the first OL still renders 3 levels. With a 3-space indent, the deepest item sits at 3 * 3 = 9 spaces, and since nested lists are only recognized at intervals of 4 spaces, 9 > 8 is enough to register one more level of nesting.