Converting a paragraph containing an equation from org to latex results in 3 paragraphs #10836

Closed
METhOphetamine opened this issue May 11, 2025 · 8 comments
@METhOphetamine

METhOphetamine commented May 11, 2025

Problem description

Converting a paragraph containing an equation from Org to LaTeX results in 3 paragraphs, because pandoc inserts a blank line before and after the equation. For example, this Org file:

Some equation here
\begin{equation}
x = y
\end{equation}
where $x$ is something important

Gets converted to the following latex code:

Some equation here

\begin{equation}
x = y
\end{equation}

where \(x\) is something important

But if we use emacs' org-mode for the conversion, we get this latex code:

Some equation here
\begin{equation}
x = y
\end{equation}
where \(x\) is something important

So it should be considered a bug.
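
To make the difference between the two outputs above easy to check, here is a minimal sketch that counts blank-line-separated paragraphs in LaTeX source. The helper name `count_paragraphs` is hypothetical and not part of pandoc; the two inputs are the outputs quoted in this report.

```shell
# count_paragraphs: print the number of blank-line-separated paragraphs on stdin.
# Hypothetical helper for illustration; not part of pandoc.
count_paragraphs() {
    # With RS set to the empty string, awk reads in "paragraph mode":
    # records are separated by one or more blank lines.
    awk 'BEGIN { RS = "" } END { print NR }'
}

# LaTeX produced by pandoc 3.6.4, as quoted in the report.
count_paragraphs <<'EOF'
Some equation here

\begin{equation}
x = y
\end{equation}

where \(x\) is something important
EOF
# prints 3

# LaTeX produced by Emacs org-mode, as quoted in the report.
count_paragraphs <<'EOF'
Some equation here
\begin{equation}
x = y
\end{equation}
where \(x\) is something important
EOF
# prints 1
```

With pandoc installed, the same check could presumably be run as `pandoc -f org -t latex file.org | count_paragraphs`.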

You also get the same issue when using the \begin{subequations} and \begin{math} blocks.

Pandoc version?

3.6.4

I also tested it in the online version.

Links to similar issues

Markdown had similar issues in the past.

#7883

#3726

#894

#2171

@jgm
Owner

jgm commented May 11, 2025

Notice this difference:

% pandoc -f latex -t native
Some equation here
\begin{equation}
x = y
\end{equation}
where $x$ is something important
^D
[ Para
    [ Str "Some"
    , Space
    , Str "equation"
    , Space
    , Str "here"
    , SoftBreak
    , Math DisplayMath "x = y"
    , SoftBreak
    , Str "where"
    , Space
    , Math InlineMath "x"
    , Space
    , Str "is"
    , Space
    , Str "something"
    , Space
    , Str "important"
    ]
]

% pandoc -f org -t native
Some equation here
\begin{equation}
x = y
\end{equation}
where $x$ is something important
^D
[ Para
    [ Str "Some" , Space , Str "equation" , Space , Str "here" ]
, RawBlock
    (Format "latex")
    "\\begin{equation}\nx = y\n\\end{equation}\n"
, Para
    [ Str "where"
    , Space
    , Math InlineMath "x"
    , Space
    , Str "is"
    , Space
    , Str "something"
    , Space
    , Str "important"
    ]
]

Potential solution: have the org reader handle the equation environment like the latex reader does, treating it as regular display math. This would facilitate conversion to many formats besides LaTeX. However, it would mean losing the equation number on export to LaTeX.

@jgm
Owner

jgm commented May 11, 2025

A different solution would be to parse the equation environment as RawInline rather than RawBlock.

@METhOphetamine
Author

METhOphetamine commented May 11, 2025

For me personally, losing the equation numbers would defeat the reason to use the equation environment at all. I might as well use inline math at that point if I cannot properly reference it. I'm guessing it would be fine for the \begin{math} environment, because that isn't numbered. But I don't use that environment, so I don't know.

I'm guessing you don't want your codebase polluted with sed hacks, but my workaround is to run the following sed command on the org file before converting it with pandoc:

sed -e 's \\end{subequations} \\end{subequations}\n\\noindent{}\\ignorespaces g' file.org | pandoc -f org -o file.tex

(The same works for normal equations.) This does not get rid of the paragraph breaks, but the reader won't notice, because the new paragraphs are not indented and so look like plain line breaks. The \ignorespaces macro is required because pandoc adds a single space after the \noindent{} macro, which would otherwise look like a paragraph indented by a single space.
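
For anyone who wants the same workaround for both `equation` and `subequations` in one pass, here is a rough sketch using awk instead of sed. The function name `add_noindent` is made up for illustration. Unlike the sed command, it only matches lines that consist of exactly `\end{equation}` or `\end{subequations}`, which is an assumption about how the org file is laid out.

```shell
# add_noindent: hypothetical helper that copies stdin to stdout and adds a
# \noindent{}\ignorespaces line after each \end{equation} or
# \end{subequations} line. Assumes the \end{...} sits alone on its line
# with no leading or trailing whitespace.
add_noindent() {
    awk '
        { print }
        $0 == "\\end{equation}" || $0 == "\\end{subequations}" {
            print "\\noindent{}\\ignorespaces"
        }
    '
}

# Demo on the example from this issue; the \end{equation} line is followed
# by the extra macro line in the output.
printf '%s\n' \
    'Some equation here' \
    '\begin{equation}' \
    'x = y' \
    '\end{equation}' \
    'where $x$ is something important' | add_noindent
```

It would then be used as `add_noindent < file.org | pandoc -f org -o file.tex`. Like the sed version, this only hides the paragraph break in the typeset output; it does not remove it.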

@METhOphetamine
Author

Yikes sorry, I accidentally closed it. I'm a github noob.

> A different solution would be to parse the equation environment as RawInline rather than RawBlock.

Regarding this, I don't know what that would mean. I'm not familiar with the codebase, sorry.

@jgm
Owner

jgm commented May 11, 2025

If you want something without sed, you could put something in header-includes that uses etoolbox tools to add the \noindent\ignorespaces to the end of these environments.

Making it a RawInline would cause the environment to appear inside the paragraph as part of its contents, which is what you want. So that might be a worthwhile change, and it's probably not a difficult one.

jgm closed this as completed in 18a132b May 11, 2025
@jgm
Owner

jgm commented May 11, 2025

OK, I think I've fixed this in a satisfactory way. @tarleb let me know if I've made some blunder.
This does change how raw TeX is handled in an inline environment, but in the direction of closer conformity to org-mode, as far as I can see.
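
To see which behavior a given pandoc build has, one option is to look at the native output for a raw block versus a raw inline. The following is only a sketch: `raw_tex_kind` is a hypothetical helper that does a plain text match on the constructor names shown earlier in this thread, not a real parse of the AST.

```shell
# raw_tex_kind: hypothetical helper that reads pandoc native output on stdin
# and prints "block" if it mentions RawBlock, "inline" if it mentions
# RawInline (and no RawBlock), and "none" otherwise. Text match only.
raw_tex_kind() {
    native=$(cat)
    case $native in
        *RawBlock*)  echo block ;;
        *RawInline*) echo inline ;;
        *)           echo none ;;
    esac
}

# Excerpt of the org reader output quoted earlier in this thread.
raw_tex_kind <<'EOF'
[ Para
    [ Str "Some" , Space , Str "equation" , Space , Str "here" ]
, RawBlock
    (Format "latex")
    "\\begin{equation}\nx = y\n\\end{equation}\n"
EOF
# prints block
```

With pandoc installed, it could be run as `pandoc -f org -t native file.org | raw_tex_kind`; on a build that includes the fix, the example from this issue should presumably no longer report `block`.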

jgm added a commit that referenced this issue May 11, 2025
Previously inline TeX was handled in a way that was different
from org's own export, and that could lead to information loss.
This was particularly noticeable for inline math environments
such as `equation`.  Previously, an `equation` environment
starting at the beginning of a line would create a raw block,
splitting up the paragraph containing it (see #10836).
On the other hand, an `equation` environment not at the beginning
of a line would be turned into regular inline elements
representing the math. (This would cause the equation number to
go missing and in some cases degrade the math formatting.)

Now, we parse all of these as raw "latex" inlines, which will be
omitted when converting to formats other than LaTeX (and other
formats like pandoc's Markdown that allow raw LaTeX).

Closes #10836.
jgm added a commit that referenced this issue May 12, 2025
@METhOphetamine
Author

> If you want something without sed, you could put something in header-includes that uses etoolbox tools to add the \noindent\ignorespaces to the end of these environments.

Thanks for the tip!

And obviously thanks for fixing it!

@tarleb
Collaborator

tarleb commented May 12, 2025

Thanks for the fix, LGTM.

I'm slowly catching up on the most pressing tasks; it might take me a while.

christopherkenny pushed a commit to christopherkenny/pandoc that referenced this issue May 23, 2025