Add StopParser(), ResumeParser, and GetParsingStatus to expat #59979
Comments
The C expat library provides the XML_StopParser() function, which allows parsing to be stopped from within the handler functions. It would be nice to have this option in Python as well, maybe by adding a StopParser() method to the XMLParser class.
If a handler function raises an exception, the Parse() method exits and the exception is propagated; internally, this also calls XML_StopParser().
OK, then this issue has a "bug" part, too: the documentation does not mention that exceptions from the handler methods propagate through the Parse() method. I guess the parser can then be stopped in this way too, but it is a dirty method compared with calling StopParser(). To answer your question, there are several situations where StopParser() could come in handy. For instance, the XML file might contain records (such as the output of a search engine), of which we only need the first n. Another example: while reading through the file we realize halfway that it does not contain the information we need, contains wrong information, etc., so we want to skip the rest of it. Since the file might be huge and since XML parsing can in no way be considered fast, being able to stop the parsing in a clean way would spare the superfluous and possibly lengthy computation.
nemeskeyd: would you like to work on a patch (for Python 3.4)?
loewis: I don't think it would be difficult to fix, so theoretically I'd be in. However, I don't really have the time to work on this right now.
Below is a sample script showing that it is possible to stop parsing XML in the middle without an explicit call to XML_StopParser(): raise StopParsing from any handler, and catch it around the Parse() call. This method covers the two proposed use cases. Do we need another way to do it?

    import xml.parsers.expat

    class StopParsing(Exception):
        pass

    def findFirstElementByName(data, what):
        def end_element(name):
            if name == what:
                raise StopParsing(name)

        p = xml.parsers.expat.ParserCreate()
        p.EndElementHandler = end_element

        try:
            # Catch the exception around the Parse() call, as described above.
            p.Parse(data, True)
        except StopParsing as e:
            print("Found:", e)
        else:
            print("Not found")

    data = """<?xml version="1.0"?>
    <parent id="top"><child1 name="paul">Text goes here</child1>
    <child2 name="fred">More text</child2>
    </parent>"""

    findFirstElementByName(data, "child2")  # Found
    findFirstElementByName(data, "child3")  # Not found
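
For the "first n records" use case mentioned earlier in the thread, the same exception-based pattern might look roughly like the sketch below. The function name `first_n_records`, the `hit` element name, and the sample document are hypothetical and chosen only for illustration; the sketch assumes Python 3 and uses nothing beyond the handler attributes that `xml.parsers.expat` already exposes.

```python
import xml.parsers.expat


class StopParsing(Exception):
    """Raised from a handler to abandon the rest of the document."""


def first_n_records(data, tag, n):
    """Collect the text of the first n <tag> elements, then stop parsing.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if n <= 0:
        return []

    records = []
    buffer = []      # character data of the element currently being read
    inside = False   # True while we are between <tag> and </tag>

    def start_element(name, attrs):
        nonlocal inside
        if name == tag:
            inside = True
            buffer.clear()

    def char_data(text):
        # Expat may deliver one text node in several chunks, so accumulate.
        if inside:
            buffer.append(text)

    def end_element(name):
        nonlocal inside
        if name == tag:
            inside = False
            records.append("".join(buffer))
            if len(records) >= n:
                # Propagates out of Parse() and ends the parse early.
                raise StopParsing

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    try:
        parser.Parse(data, True)
    except StopParsing:
        pass  # enough records collected
    return records


# Made-up "search engine output" with many records.
results = (
    "<results>"
    + "".join(f"<hit>doc{i}</hit>" for i in range(1000))
    + "</results>"
)
print(first_n_records(results, "hit", 3))  # ['doc0', 'doc1', 'doc2']
```

Keeping the try/except inside the helper hides the exception from callers, which addresses part of the "dirty method" concern, though it does not make the parser resumable the way XML_StopParser() with a true "resumable" argument would.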
Amaury: see my previous comment. There are two problems with the method you proposed:
Your first point is true, even if the Python zen (try "import this")
For your second point: exceptions are a common thing in Python code. This is similar to the EAFP principle http://docs.python.org/glossary.html#term-eafp
Dávid: Another (similar) example is the Python for loop. In its original form, it would increase an index and invoke __getitem__ until that *raised* IndexError. In the current definition, it converts the iterated-over object into an iterator, and keeps calling .next until that *raises* StopIteration. So raising an exception to indicate that something is finished is an established Python idiom.
In any case, I still think adding StopParser is a useful addition, in particular since that would also allow giving True as the "resumable" argument. Any such change needs to be accompanied by also exposing XML_ResumeParser, and possibly XML_GetParsingStatus. Since we all agree that this is not an important change, I don't mind keeping this issue around until someone comes along to contribute code for it.
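
The for-loop analogy can be made concrete with a small sketch. It assumes Python 3, where the iterator method is spelled `__next__` and is called through the built-in `next()`; the names `manual_for` and `Squares` are hypothetical and exist only for this illustration.

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does under the hood."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # the exception is the normal "finished" signal
        body(item)


class Squares:
    """Old-style sequence: supports iteration through __getitem__ only."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # tells the loop there is nothing left
        return index * index


collected = []
manual_for("abc", collected.append)
print(collected)          # ['a', 'b', 'c']
print(list(Squares(4)))   # [0, 1, 4, 9]
```

In both protocols the loop ends because an exception is raised and silently handled, which is the same shape as raising a custom exception from an expat handler and catching it around Parse().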