In a terminal, navigate to this directory and run:
$ make
$ ./xml_parser
To test different inputs, change the value of the mock input variable xml_data (declared as const char *xml_data) in main.c.
This project addresses the challenge of parsing large XML documents where performance is critical. The goal is to build a forward-only XML parser that processes XML efficiently, providing access to data as it traverses the document in a single pass.
- Forward-Only Parsing: Traverse the XML document once without revisiting nodes.
- Event-Based Reporting: Report the current path, element attributes, and values in real time.
- Efficient Handling:
  - Identify and decode XML escaped entities (e.g., &quot;, &amp;, etc.).
  - Ignore comments.
- Support Path and Value Tracking:
  - Include the full XML path (e.g., /root/order/amount).
  - Report attributes and text values at the current path.
- Example Use Case: Output the order id for any order with an amount > 100.
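The path tracking described above can be sketched as a small stack of element names stored in one string. This is only an illustration under assumptions: the names PathTracker, path_init, path_push, path_pop, and the XML_PATH_CAP limit are hypothetical and are not taken from the project's source.

```c
#include <stddef.h>
#include <string.h>

#define XML_PATH_CAP 256 /* assumed maximum path length, including the terminator */

/* Hypothetical structure: the current path, e.g. "/root/order/amount". */
typedef struct {
    char buf[XML_PATH_CAP];
    size_t len;
} PathTracker;

void path_init(PathTracker *p)
{
    p->buf[0] = '\0';
    p->len = 0;
}

/* Append "/name" when an element starts.
 * Returns 0 on success, -1 if the segment does not fit. */
int path_push(PathTracker *p, const char *name)
{
    size_t n = strlen(name);

    if (p->len + 1 + n >= sizeof p->buf)
        return -1;
    p->buf[p->len++] = '/';
    memcpy(p->buf + p->len, name, n + 1); /* n + 1 copies the terminator too */
    p->len += n;
    return 0;
}

/* Drop the last "/name" segment when an element ends.
 * Does nothing if the path is already empty. */
void path_pop(PathTracker *p)
{
    char *slash = strrchr(p->buf, '/');

    if (slash != NULL) {
        *slash = '\0';
        p->len = (size_t)(slash - p->buf);
    }
}
```

Pushing root, order, and amount in that order leaves /root/order/amount in buf, and one path_pop call returns it to /root/order. Keeping the path in a single buffer means it can be handed to a callback without any allocation.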
The parser is implemented as a state machine that traverses the XML document. Key states include:
- Start Tag: Handles opening tags and parses attributes.
- End Tag: Handles closing tags.
- Text Content: Extracts and decodes text between tags.
- Comments: Ignores XML comments (<!-- ... -->).
- Path Maintenance: Tracks the current XML path and updates it dynamically as elements are encountered.
- Attributes: Parses and stores attributes for the current element, making them available during parsing.
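To make the state selection above concrete, here is a minimal sketch of how a forward-only scanner might choose its state from the next few characters and then advance. The enum and the functions next_state, skip_token, and count_states are hypothetical names, not the project's actual code. The sketch also simplifies by assuming that attribute values never contain '>'.

```c
#include <string.h>

/* Hypothetical state names mirroring the list above. */
typedef enum {
    STATE_TEXT,
    STATE_START_TAG,
    STATE_END_TAG,
    STATE_COMMENT,
    STATE_DONE /* end of input; also the number of real states */
} ParseState;

/* Choose the state that applies at position p. */
ParseState next_state(const char *p)
{
    if (*p == '\0')
        return STATE_DONE;
    if (*p != '<')
        return STATE_TEXT;
    if (strncmp(p, "<!--", 4) == 0)
        return STATE_COMMENT;
    if (p[1] == '/')
        return STATE_END_TAG;
    return STATE_START_TAG;
}

/* Return a pointer just past the construct that starts at p.
 * Unterminated constructs consume the rest of the input. */
const char *skip_token(const char *p, ParseState s)
{
    const char *end;

    switch (s) {
    case STATE_COMMENT:
        end = strstr(p + 4, "-->");
        return end ? end + 3 : p + strlen(p);
    case STATE_START_TAG:
    case STATE_END_TAG:
        end = strchr(p, '>');
        return end ? end + 1 : p + strlen(p);
    case STATE_TEXT:
        end = strchr(p, '<');
        return end ? end : p + strlen(p);
    default:
        return p;
    }
}

/* Single forward pass: count how often each state is entered. */
void count_states(const char *xml, int counts[STATE_DONE])
{
    const char *p = xml;

    memset(counts, 0, STATE_DONE * sizeof counts[0]);
    for (;;) {
        ParseState s = next_state(p);
        if (s == STATE_DONE)
            break;
        counts[s]++;
        p = skip_token(p, s);
    }
}
```

A real parser would report events or parse attributes where this sketch only counts, but the control flow is the same: inspect the current position, pick a state, consume the construct, and never look back.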
Handles standard XML entities:
- &quot; -> "
- &amp; -> &
- &apos; -> '
- &lt; -> <
- &gt; -> >
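The entity mapping above could be implemented along the following lines. This is a sketch, not the project's implementation: decode_entities is a hypothetical helper, and the choice to copy unknown entities through unchanged is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode the five predefined XML entities from src
 * into dst. At most dst_size - 1 bytes are written, and the output is
 * always NUL-terminated when dst_size > 0. Unknown or malformed entities
 * are copied through unchanged (an assumption of this sketch).
 * Returns the number of bytes written, excluding the terminator. */
size_t decode_entities(const char *src, char *dst, size_t dst_size)
{
    static const struct { const char *name; char ch; } table[] = {
        { "&quot;", '"' }, { "&amp;", '&' }, { "&apos;", '\'' },
        { "&lt;", '<' },   { "&gt;", '>' },
    };
    size_t out = 0;

    if (dst_size == 0)
        return 0;

    while (*src != '\0' && out + 1 < dst_size) {
        char ch = *src;
        size_t consumed = 1;

        if (ch == '&') {
            for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
                size_t len = strlen(table[i].name);
                if (strncmp(src, table[i].name, len) == 0) {
                    ch = table[i].ch;   /* emit the decoded character */
                    consumed = len;     /* and skip the whole entity  */
                    break;
                }
            }
        }
        dst[out++] = ch;
        src += consumed;
    }
    dst[out] = '\0';
    return out;
}
```

Calling decode_entities("Tom &amp; Jerry", buf, sizeof buf) leaves "Tom & Jerry" in buf. Because the input is consumed in a single pass, a sequence such as &amp;lt; decodes to the literal text &lt; and is not decoded a second time.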
The parser triggers callback functions to report events such as:
- start_element: When an element starts.
- end_element: When an element ends.
- text: When text content is encountered.
- attribute: When attributes are parsed.
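As an illustration of consuming these events, the sketch below defines a callback that prints every event it receives. The four-argument signature is copied from example_callback in this README. The typedef name xml_event_cb and the functions format_event and trace_callback are hypothetical. The sketch also assumes that key or value may be NULL for events that carry no key or value, which the README does not specify.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical typedef for the callback signature used in this README. */
typedef void (*xml_event_cb)(const char *event, const char *path,
                             const char *key, const char *value);

/* Format one event as a single line of text.
 * NULL key or value is shown as "(none)" (defensive assumption).
 * Returns the value of snprintf. */
int format_event(char *out, size_t out_size, const char *event,
                 const char *path, const char *key, const char *value)
{
    return snprintf(out, out_size, "%s %s key=%s value=%s",
                    event, path,
                    key ? key : "(none)",
                    value ? value : "(none)");
}

/* Callback that prints every event it receives, one per line. */
void trace_callback(const char *event, const char *path,
                    const char *key, const char *value)
{
    char line[512];

    format_event(line, sizeof line, event, path, key, value);
    puts(line);
}
```

Passing trace_callback to parse_xml in place of example_callback would, under these assumptions, print the full event stream, which is a convenient way to see what the parser reports before writing a filtering callback.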
<root>
  <order id="1111">
    <amount>150</amount>
  </order>
  <order id="222">
    <amount>2</amount>
  </order>
  <order id="333">
    <amount>2000</amount>
  </order>
</root>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Also include the project header that declares XMLParser, init_parser, and parse_xml. */

void example_callback(const char *event, const char *path, const char *key, const char *value) {
    static char current_id[128]; // Most recently seen order id

    if (strcmp(event, "attribute") == 0 && strcmp(key, "id") == 0) {
        // Store the order id; snprintf truncates instead of overflowing the buffer
        snprintf(current_id, sizeof(current_id), "%s", value);
    } else if (strcmp(event, "text") == 0 && strstr(path, "/order/amount") != NULL) {
        int amount = atoi(value);
        if (amount > 100) { // Report only orders with amount > 100
            printf("Order ID: %s\tAmount: %d\n", current_id, amount);
        }
    }
}

int main(void) {
    const char *xml_data =
        "<root>"
        "  <order id=\"1111\">"
        "    <amount>150</amount>"
        "  </order>"
        "  <order id=\"222\">"
        "    <amount>2</amount>"
        "  </order>"
        "  <order id=\"333\">"
        "    <amount>2000</amount>"
        "  </order>"
        "</root>";

    XMLParser parser;
    init_parser(&parser); // Initialize the parser
    parse_xml(&parser, xml_data, example_callback); // Parse the XML with the callback
    return 0;
}
For the example XML above, the callback prints only orders whose amount exceeds 100, so the program should output:
Order ID: 1111 Amount: 150
Order ID: 333 Amount: 2000
Summary
This project provides a forward-only XML parser that efficiently processes XML documents in a single pass. The parser is lightweight, fast, and ideal for scenarios where only immediate access to current data is required.
By implementing this parser, you can handle large-scale XML parsing tasks with minimal overhead while ensuring real-time reporting of attributes, paths, and text content.