Description
The input is parsed in chunks with the following code:
#include <algorithm>
#include <string>
#include <myhtml/api.h>

// Feeds `body` to the parser in pieces of at most `chunk_sz` bytes.
// Returns the resulting tree, or nullptr if any chunk fails to parse.
myhtml_tree_t* Parse(myhtml_t* myhtml, const std::string& body,
                     size_t chunk_sz) {
  myhtml_tree_t* tree = myhtml_tree_create();
  myhtml_tree_init(tree, myhtml);
  size_t body_chunk_pos = 0;
  while (body_chunk_pos < body.size()) {
    // The last chunk may be shorter than chunk_sz.
    size_t current_chunk_sz = std::min(chunk_sz, body.size() - body_chunk_pos);
    mystatus_t parse_status = myhtml_parse_chunk_single(
        tree, body.c_str() + body_chunk_pos, current_chunk_sz);
    if (parse_status != MyHTML_STATUS_OK) {
      myhtml_tree_destroy(tree);
      return nullptr;
    }
    body_chunk_pos += current_chunk_sz;
  }
  return tree;
}
It is called with the following arguments:
myhtml_t* myhtml = myhtml_create();
myhtml_init(myhtml, MyHTML_OPTIONS_DEFAULT, 1, 0);
std::string body = "<html><head><style>a</style></head><body>f</body></html>";
size_t chunk_sz = 13;
myhtml_tree_t* tree = Parse(myhtml, body, chunk_sz);
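To make the chunk boundaries easier to see, here is a small standalone sketch that splits the same body with the same arithmetic as Parse, without calling myhtml at all. SplitIntoChunks is a hypothetical helper written only for this illustration; it is not part of myhtml.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a myhtml function): returns the pieces that
// Parse() would pass to myhtml_parse_chunk_single, in order.
std::vector<std::string> SplitIntoChunks(const std::string& body,
                                         std::size_t chunk_sz) {
  std::vector<std::string> chunks;
  if (chunk_sz == 0) {
    return chunks;  // avoid an infinite loop on a zero chunk size
  }
  std::size_t pos = 0;
  while (pos < body.size()) {
    // Same size computation as in Parse(): the last chunk may be shorter.
    std::size_t current = std::min(chunk_sz, body.size() - pos);
    chunks.push_back(body.substr(pos, current));
    pos += current;
  }
  return chunks;
}
```

For the body above and chunk_sz = 13 this gives five chunks: "<html><head><", "style>a</styl", "e></head><bod", "y>f</body></h" and "tml>". The second chunk ends in the middle of the </style> end tag name, so the remainder of that tag name arrives in the next buffer.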
The result varies depending on build options.
In some cases the serialized tree looks like this:
<html><head><style>a</style></head><body>f</body></html></style></head><body></body></html>
In other cases it looks like this:
<html><head><style></style></head></html>
The expected output is:
<html><head><style>a</style></head><body>f</body></html>
After some investigation I found that the issue is inside myhtml_tokenizer_state_rawtext_end_tag_name and involves token_node->raw_begin.