Normalize symbols #332

gustaphe · 2025-04-10T08:48:38Z

#331 highlights an issue in the way we currently deal with variable and function names when it comes to subscripting. Substituting in strings leads to some strange edge cases, and the code is pretty difficult to parse.

I suggest a new model: any symbol (that could represent a variable name or a function name) is only normalized once. It follows something like this schedule:

1. Unicode substitution (:a₁ --> :a_1).
2. The string is split into "sub-symbols" (:abc_x_y --> (:abc, :x, :y)).
3. Each sub-symbol is normalized separately.
4. If snakecase, sub-symbols are joined with \_; otherwise the first two are joined with _ and the remaining ones with \_ inside the subscript ("abc_{x\_y}").

Sub-symbol normalization:

1. If it matches the constant list (e.g. inf, atan), get the normalized form from a dict.
2. If it has more than one (alphabetical) character, wrap it in \mathrm (configurable?).
3. Else return the sub-symbol as is.
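
The schedule above could be sketched roughly as follows. This is only an illustration of the proposal, not existing Latexify.jl code: every name here (`substitute_unicode`, `normalize_subsymbol`, `normalize_symbol`, `KNOWN_SYMBOLS`, and the `long_symbol_font` keyword) is hypothetical, the constant list is a two-entry stand-in, and only subscript digits are handled.

```julia
# Hypothetical sketch of the proposed schedule; none of these names are Latexify.jl API.

# Unicode subscript digits mapped to ASCII (a real table would cover more characters).
const UNICODE_SUBSCRIPTS = Dict(zip("₀₁₂₃₄₅₆₇₈₉", "0123456789"))

# Stand-in for the constant list; the real one would be much longer.
const KNOWN_SYMBOLS = Dict("inf" => "\\infty", "atan" => "\\arctan")

# Step 1: rewrite a run of Unicode subscripts as "_" plus ASCII ("a₁" --> "a_1").
function substitute_unicode(name::AbstractString)
    io = IOBuffer()
    in_subscript = false
    for c in name
        if haskey(UNICODE_SUBSCRIPTS, c)
            in_subscript || print(io, '_')
            print(io, UNICODE_SUBSCRIPTS[c])
            in_subscript = true
        else
            print(io, c)
            in_subscript = false
        end
    end
    return String(take!(io))
end

# Step 3: normalize one sub-symbol on its own.
function normalize_subsymbol(s::String; long_symbol_font = "mathrm")
    haskey(KNOWN_SYMBOLS, s) && return KNOWN_SYMBOLS[s]
    if count(isletter, s) > 1 && long_symbol_font !== nothing
        return "\\" * long_symbol_font * "{" * s * "}"
    end
    return s
end

# Steps 1 to 4 combined: each symbol is normalized exactly once.
function normalize_symbol(sym::Symbol; snakecase = false, long_symbol_font = "mathrm")
    parts = String.(split(substitute_unicode(String(sym)), '_'))   # step 2
    normalized = [normalize_subsymbol(p; long_symbol_font = long_symbol_font) for p in parts]
    snakecase && return join(normalized, "\\_")
    length(normalized) == 1 && return normalized[1]
    return normalized[1] * "_{" * join(normalized[2:end], "\\_") * "}"
end
```

Under these assumptions, `normalize_symbol(:abc_x_y)` gives `\mathrm{abc}_{x\_y}`, and passing `long_symbol_font = nothing` leaves multi-character names unwrapped.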

This leaves an uncertainty in how to sort indexing (a_1[3]), and breaks the current behavior of latexify(:abc) --> "$abc$". It will however be more consistent with mathematical notation.

The indexing uncertainty is the biggest blocker for me. We might have to consider using a placeholder struct, more or less saving :a_b as a special type of :a[:b] and delaying the stringification, but that will require a bit of an overhaul.
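
To make the placeholder idea a bit more concrete, here is a minimal sketch. The type `SubscriptedSymbol` and the functions `parse_subscripts` and `render` are made up for illustration and do not exist in Latexify.jl. Placing name subscripts before explicit indices and joining them with commas is an arbitrary choice here, since that ordering is exactly the open question.

```julia
# Hypothetical placeholder type; not part of Latexify.jl.
struct SubscriptedSymbol
    base::Symbol
    subscripts::Vector{Symbol}
end

# Split on underscores once and keep the pieces as data instead of a string.
function parse_subscripts(sym::Symbol)
    parts = split(String(sym), '_')
    return SubscriptedSymbol(Symbol(parts[1]), Symbol[Symbol(p) for p in parts[2:end]])
end

# Stringification is delayed until here, so explicit indices (as in a_1[3])
# can be merged with the name subscripts by whatever rule is eventually chosen.
function render(s::SubscriptedSymbol, indices = ())
    subs = vcat(String[String(x) for x in s.subscripts], String[string(i) for i in indices])
    isempty(subs) && return String(s.base)
    return String(s.base) * "_{" * join(subs, ",") * "}"
end
```

With this sketch, `render(parse_subscripts(:a_1), (3,))` produces `a_{1,3}`. A different ordering rule would only need to change `render`.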


isaacsas · 2025-04-11T12:47:14Z

I'm not sure if I understand the proposal fully, but if a user writes :a₁ or :(a[1]) with snakecase = true, I'd still want to get the LaTeX string a_1 and not a\_1. My understanding of snakecase = true is that it applies when a user explicitly has a symbol in which they've put an underscore, like :a_1 (which should then become a\_1, i.e. a snakecase variable, which is not the same as an array variable or a variable with a unicode subscript). Isn't it meant to handle the common occurrence that one might have a variable written in snakecase in code, like a_long_variable, where the name doesn't map to using subscripts in math notation (in contrast to either the unicode or array reference cases)?

So I guess I'm suggesting the substitution order should be different and unicode substitution should come after determining snakecase so as not to treat unicode subscripts as snakecase.
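
A rough sketch of that ordering, assuming a hypothetical helper `latex_name` (not a Latexify.jl function) and handling only subscript digits. Explicit underscores are escaped first, while they can still be told apart from Unicode subscripts. The braces around the subscript are just a choice made for this sketch.

```julia
# Hypothetical sketch of the alternative ordering; names are illustrative only.
const SUBSCRIPT_DIGITS = Dict(zip("₀₁₂₃₄₅₆₇₈₉", "0123456789"))

function latex_name(sym::Symbol; snakecase = false)
    name = String(sym)
    # Decide what explicit underscores mean before any Unicode substitution.
    if snakecase
        name = replace(name, "_" => "\\_")
    end
    # Only afterwards turn runs of Unicode subscripts into real LaTeX subscripts.
    io = IOBuffer()
    in_subscript = false
    for c in name
        if haskey(SUBSCRIPT_DIGITS, c)
            in_subscript || print(io, "_{")
            print(io, SUBSCRIPT_DIGITS[c])
            in_subscript = true
        else
            in_subscript && print(io, '}')
            print(io, c)
            in_subscript = false
        end
    end
    in_subscript && print(io, '}')
    return String(take!(io))
end
```

With `snakecase = true`, this gives `a_{1}` for `:a₁` but `a\_1` for `:a_1`, which is the distinction described above.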

Adding \mathrm as an option sounds good, but part of the symbolics PR efforts by myself and others were to essentially provide flexibility to opt-out of such wrapping there (they now wrap anything that is more than one character in \mathtt), so it would be nice if that option is turned off by default.

isaacsas · 2025-04-11T12:48:02Z

(One place where it is common to have multicharacter math variables but not do anything special is in chemical reaction models in biology, which often have multi-character chemical species names.)

gustaphe · 2025-04-11T13:39:58Z

Thank you for your comments!

A variable that is both snakecased and has utf8 subscripts sounds pretty cursed, but you're right.

I think there will be a long_symbol_font kwarg that you can set to "mathtt" or nothing in the Symbolics recipe.

Chemical symbols should of course be surrounded in \ce{} and not printed as a product of single character variables. If mhchem is not available I would still do \mathrm.

isaacsas · 2025-04-11T14:10:59Z

True, but if you are using utf subscripts and snakecasing I'm not sure what you can even reasonably expect (and that seems like a pretty exceptional case, usually the two aren't mixed in code I've read).

isaacsas mentioned this issue Apr 11, 2025

#1524 broke MTK reference tests JuliaSymbolics/Symbolics.jl#1526

Open
