Skip to content

Commit 7f0f956

Browse files
authored
Fix libxml manual memory management (#15906)
libxml allows to customize the memory allocators, which we used to plug the GC. Under certain conditions (e.g. multi-threading, libxml 2.14), **this integration leads to segfaults** when a GC cycle happens within a libxml function. I'm not quite sure what's happening. Maybe libxml keeps pointers somewhere that the GC doesn't scan (thread locals?) and the GC collects them... though that would create random segfaults, not segfaults during GC. Anyway: removing the GC integration fixes the issue. In addition, the libxml2 distributed in macOS 15.4 is patched to remove the custom memory allocators API. Other bindings to external libraries in Crystal don't plug the GC but use manual memory management to free the external allocations when not needed anymore (automated through finalizers). The difficulty is that we allocate the whole DOM tree with libxml. We could use a libxml parser to build a DOM tree with Crystal objects, and it would be fantastic, but then we'd have to reimplement XPath. **This patch replaces the GC integration with explicit memory management instead.** For readers, writers, or XPath contexts, we merely free the libxml allocations in finalizers. Tree nodes are more complex. The XML free functions will recursively free the whole subtree; a node might be unlinked and thus won't be freed with the rest of the document anymore, etc. We can store a reference to the XML::Node of documents into the libxml node (using the `_private` struct member) because the XML::Node will live exactly as long as the libxml doc. We can't do that for subtree nodes that may be collected at any time and would leave dangling pointers. We could not care (and allow multiple XML::Node to represent one libxml node for example, since collecting the doc will free the libxml node) but a XML::Node can be unlinked from the main tree, and must be manually freed. I thus introduced a dual reference: 1. Every XML::Node has a strong reference to its document's XML::Node (can't free document while we have references to a subtree node); 2. Every document XML::Node keeps a cache with weak references to all its subtree XML::Node (only those that have been instantiated). 3. Every document XML::Node keeps a list with strong references to any unlinked XML::Node in order to avoid memory leaks from nodes that have been unlinked and thus wouldn't be freed with the document. The unlinked XML::Node keeps the parent reference to the document (same as libxml's `node->doc`) until it is linked into another document. When that happens, it should also be removed from the unlinked list. A XML::Node is thus unique per libxml node, yet can be collected when unused without freeing some of the document, after which we can instantiate a new XML::Node. **NOTE**: I was able to avoid breaking changes, especially in constructors, but I still undocumented the constructors that take an external libxml pointer, or expose internal details. I assume they should be internal API. We should extend the XML integration, for example with DOM manipulation (following the DOM spec, not the libxml API) so nobody has to extend the libxml integration themselves.
1 parent 23d92eb commit 7f0f956

File tree

8 files changed

+285
-109
lines changed

8 files changed

+285
-109
lines changed

src/xml/attributes.cr

Lines changed: 38 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -3,13 +3,13 @@ require "./node"
33
class XML::Attributes
44
include Enumerable(Node)
55

6+
# :nodoc:
67
def initialize(@node : Node)
78
end
89

910
def empty? : Bool
1011
return true unless @node.element?
1112

12-
props = self.props
1313
props.null?
1414
end
1515

@@ -39,23 +39,54 @@ class XML::Attributes
3939
end
4040

4141
def []=(name : String, value)
42+
if prop = find_prop(name)
43+
# manually unlink the prop's children if we have live references, so
44+
# xmlSetProp won't free them immediately
45+
@node.document.unlink_cached_children(prop)
46+
end
47+
4248
LibXML.xmlSetProp(@node, name, value.to_s)
4349
value
4450
end
4551

4652
def delete(name : String) : String?
47-
value = self[name]?.try &.content
48-
res = LibXML.xmlUnsetProp(@node, name)
49-
value if res == 0
53+
prop = find_prop(name)
54+
return unless prop
55+
56+
value = ""
57+
if content = LibXML.xmlNodeGetContent(prop)
58+
value = String.new(content)
59+
end
60+
61+
if node = @node.document.cached?(prop)
62+
# can't call xmlUnsetProp: it would free the node
63+
node.unlink
64+
value
65+
else
66+
# manually unlink the prop's children if we have live references, so
67+
# xmlUnsetProp won't free them immediately
68+
@node.document.unlink_cached_children(prop)
69+
value if LibXML.xmlUnsetProp(@node, name) == 0
70+
end
71+
end
72+
73+
private def find_prop(name)
74+
prop = @node.to_unsafe.value.properties.as(LibXML::Node*)
75+
while prop
76+
if String.new(prop.value.name) == name
77+
return prop
78+
end
79+
prop = prop.value.next
80+
end
5081
end
5182

5283
def each(&) : Nil
5384
return unless @node.element?
5485

5586
props = self.props
5687
until props.null?
57-
yield Node.new(props)
58-
props = props.value.next
88+
yield Node.new(props.as(LibXML::Node*), @node.document)
89+
props = props.value.next.as(LibXML::Attr*)
5990
end
6091
end
6192

@@ -73,7 +104,7 @@ class XML::Attributes
73104
pp.list("[", self, "]")
74105
end
75106

76-
protected def props
107+
protected def props : LibXML::Attr*
77108
@node.to_unsafe.value.properties
78109
end
79110
end

src/xml/builder.cr

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,11 @@ class XML::Builder
2828
@writer = LibXML.xmlNewTextWriter(buffer)
2929
end
3030

31+
# :nodoc:
32+
def finalize
33+
LibXML.xmlFreeTextWriter(@writer)
34+
end
35+
3136
# Emits the start of the document.
3237
def start_document(version = nil, encoding = nil) : Nil
3338
call StartDocument, string_to_unsafe(version), string_to_unsafe(encoding), nil

src/xml/libxml2.cr

Lines changed: 9 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -149,6 +149,7 @@ lib LibXML
149149
fun xmlTextReaderReadOuterXml(reader : XMLTextReader) : UInt8*
150150
fun xmlTextReaderExpand(reader : XMLTextReader) : Node*
151151
fun xmlTextReaderCurrentNode(reader : XMLTextReader) : Node*
152+
fun xmlTextReaderCurrentDoc(reader : XMLTextReader) : Doc*
152153

153154
fun xmlTextReaderSetErrorHandler(reader : XMLTextReader, f : TextReaderErrorFunc) : Void
154155
fun xmlTextReaderSetStructuredErrorHandler(reader : XMLTextReader, f : StructuredErrorFunc, arg : Void*) : Void
@@ -372,18 +373,14 @@ lib LibXML
372373
{% if compare_versions(LibXML::VERSION, "2.14.0") >= 0 %}
373374
fun xmlSaveSetIndentString(SaveCtxPtr, UInt8*)
374375
{% end %}
376+
377+
fun xmlFreeDoc(Doc*)
378+
fun xmlFreeNode(Node*)
379+
fun xmlFreeTextReader(XMLTextReader)
380+
fun xmlFreeTextWriter(TextWriter)
381+
fun xmlXPathFreeContext(XPathContext*)
382+
fun xmlXPathFreeNodeSet(NodeSet*)
383+
fun xmlXPathFreeObject(XPathObject*)
375384
end
376385

377386
LibXML.xmlInitParser
378-
379-
LibXML.xmlMemSetup(
380-
->GC.free,
381-
->GC.malloc(LibC::SizeT),
382-
->GC.realloc(Void*, LibC::SizeT),
383-
->(str) {
384-
len = LibC.strlen(str) + 1
385-
copy = Pointer(UInt8).malloc(len)
386-
copy.copy_from(str, len)
387-
copy
388-
}
389-
)

src/xml/namespace.cr

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
class XML::Namespace
22
getter document : Node
33

4+
# :nodoc:
45
def initialize(@document : Node, @ns : LibXML::NS*)
56
end
67

0 commit comments

Comments
 (0)