-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
XML: modernize API when available & workaround issues with legacy versions #15899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XML: modernize API when available & workaround issues with legacy versions #15899
Conversation
Use the new per-context error handlers when available (libxml 2.13 and newer) that now report errors properly (and enforces NOERROR). Tries to avoid thread safety issues in libxml 2.9 to 2.12 by resetting the default error handlers while the current fiber may swapcontext, for example while doing IO, and setting them back afterwards. This will at least avoid leaking error handlers in the thread locals while we switched to another fiber that may mess with these (oops), and makes sure to restore them properly, even if the fiber is resumed to another thread (no lost references). Deprecates the public API for generic and structured error collect handlers. They're unsafe and musn't be used!
Use the new per-context indentation configuration from libxml 2.14 when availmable, otherwise tries to avoid concurrency and parallelism issues by saving the globals before a potential fiber swapcontext and restoring them afterwards.
line_number = LibXML.xmlTextReaderLocatorLineNumber(locator) | ||
raise Error.new(msg_str, line_number) | ||
end | ||
LibXML.xmlTextReaderSetStructuredErrorHandler(@reader, ->Error.structured_callback, Box.box(@errors)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice this is a breaking change... but I don't understand why "read from string" should raise, while "read from io" should collect errors instead 🤨
@@ -487,38 +487,68 @@ class XML::Node | |||
end | |||
end | |||
|
|||
# :nodoc: | |||
SAVE_MUTEX = ::Mutex.new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need the mutex anymore and it would still not restore the xmlIndentTreeOutput
global thread local when the fiber is resumed by another thread.
) XML::Reader: use xmlTextReaderSetStructuredErrorHandler libxml: add bindings to new bindings in 2.13 and 2.14 (improved thread safety) libxml 2.13+ never report errors when NOERROR option is set Modernize the error handling & improve thread safety Use the new per-context error handlers when available (libxml 2.13 and newer) that now report errors properly (and enforces NOERROR). Tries to avoid thread safety issues in libxml 2.9 to 2.12 by resetting the default error handlers while the current fiber may swapcontext, for example while doing IO, and setting them back afterwards. This will at least avoid leaking error handlers in the thread locals while we switched to another fiber that may mess with these (oops), and makes sure to restore them properly, even if the fiber is resumed to another thread (no lost references). Deprecates the public API for generic and structured error collect handlers. They're unsafe and musn't be used! Modernize XML generation + improve fiber/thread issues Use the new per-context indentation configuration from libxml 2.14 when availmable, otherwise tries to avoid concurrency and parallelism issues by saving the globals before a potential fiber swapcontext and restoring them afterwards. Fix: review suggestions + format
Uses the libxml per context error handlers when available.
For example we can always use it for
XML::Reader
since it's available since at least libxml 2.9 (released in 2012), that becomes the default expected libxml version.Sadly the other per context error handlers only appeared in libxml 2.13 (released in 2024) so we can't assume they exist, but can still use them when compiling against this version.
This patch adds a libxml version detection using
pkg-config
. We can specify theLIBXML_VERSION
environment variable to target another release if the runtime version will be different (though it is highly recommended to use libxml 2.13+), or for the Windows MSVC target.On older libxml releases, errors can be raised per context (e.g. xml reader) and through the structured error handler (now deprecated) and the older generic (long deprecated).
For these older releases, this patch still sets the globals (actually thread locals), be they error handlers or some xml save configuration, but saves them when the fiber might
yield
, restoring the default handlers; it eventually restores the globals when the fiber is resumed, so the current thread will see the values previously configured for the fiber, whatever if another fiber did something else, or the fiber got resumed by another thread (execution context).NOTE: libxml 2.14 almost always segfaults during GC while runningfixed in #15906.spec/std/xml
, while the 2.9 to 2.13 releases are fine.