Add StopParser(), ResumeParser, and GetParsingStatus to expat #59979

Open
nemeskeyd mannequin opened this issue Aug 24, 2012 · 9 comments
Labels
docs (Documentation in the Doc dir), topic-XML, type-feature (A feature request or enhancement)

Comments

@nemeskeyd
Mannequin

nemeskeyd mannequin commented Aug 24, 2012

BPO 15775
Nosy @loewis, @amauryfa

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-08-24.08:18:17.217>
labels = ['expert-XML', 'type-feature', 'docs']
title = 'Add StopParser(), ResumeParser, and GetParsingStatus to expat'
updated_at = <Date 2012-09-06.11:26:29.478>
user = 'https://bugs.python.org/nemeskeyd'

bugs.python.org fields:

activity = <Date 2012-09-06.11:26:29.478>
actor = 'loewis'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'XML']
creation = <Date 2012-08-24.08:18:17.217>
creator = 'nemeskeyd'
dependencies = []
files = []
hgrepos = []
issue_num = 15775
keywords = []
message_count = 9.0
messages = ['168980', '169207', '169255', '169281', '169285', '169879', '169905', '169906', '169913']
nosy_count = 4.0
nosy_names = ['loewis', 'amaury.forgeotdarc', 'docs@python', 'nemeskeyd']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue15775'
versions = ['Python 3.4']

@nemeskeyd
Mannequin Author

nemeskeyd mannequin commented Aug 24, 2012

The C expat library provides the XML_StopParser() function, which allows parsing to be stopped from within the handler functions. It would be nice to have this option in Python as well, perhaps by adding a StopParser() method to the XMLParser class.

@nemeskeyd nemeskeyd mannequin added the topic-XML and type-feature (A feature request or enhancement) labels Aug 24, 2012

@amauryfa
Contributor

If a handler function raises an exception, the Parse() method exits and the exception is propagated; internally, this also calls XML_StopParser().
Why would one call XML_StopParser() explicitly?
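
The propagation behaviour described in this comment can be checked with a small script. This is a minimal sketch, not part of the original comment: `HandlerError` and the sample document are made up for illustration, and only `ParserCreate()`, `StartElementHandler`, and `Parse()` from `xml.parsers.expat` are used.

```python
import xml.parsers.expat


class HandlerError(Exception):
    """Hypothetical exception type, used only for this illustration."""


seen = []  # names of the elements whose start tag reached the handler


def start_element(name, attrs):
    seen.append(name)
    if name == "b":
        # Raising here aborts the parse; Parse() re-raises the exception.
        raise HandlerError(name)


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element

try:
    parser.Parse("<a><b/><c/></a>", True)
except HandlerError as exc:
    propagated = exc
else:
    propagated = None

print("propagated:", repr(propagated))
print("elements seen before the stop:", seen)
```

The handler is never called for `<c/>`, which suggests that parsing really does stop at the point where the exception is raised.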

@nemeskeyd
Mannequin Author

nemeskeyd mannequin commented Aug 28, 2012

OK, then this issue has a "bug" part, too: the documentation does not mention that exceptions raised in handler methods propagate through the Parse() method. I guess the parser can then be stopped in this way too, but it is a dirty method compared to calling StopParser().

To answer your question, there are several situations where StopParser() could come in handy. For instance, the XML file might contain records (such as the output of a search engine), of which we only need the first n. Another example: halfway through the file we realize that it does not contain the information we need, contains wrong information, etc., so we want to skip the rest of it. Since the file might be huge, and since XML parsing can in no way be considered fast, being able to stop the parsing in a clean way would spare the superfluous and possibly lengthy computation.
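
The "first n records" use case can be sketched with the parser as it exists today, using the exception-based approach discussed in this thread rather than a StopParser() method (which Python does not expose). The names `EnoughRecords`, `first_n_records`, and the `<hit>` element are assumptions made up for the example.

```python
import xml.parsers.expat


class EnoughRecords(Exception):
    """Hypothetical signal raised once enough records have been collected."""


def first_n_records(data, tag, n):
    """Return the text of the first n <tag> elements, then abandon the parse."""
    if n <= 0:
        return []
    records = []
    buffer = []      # text chunks of the record currently being read
    inside = False   # True while we are between <tag> and </tag>

    def start_element(name, attrs):
        nonlocal inside
        if name == tag:
            inside = True
            buffer.clear()

    def char_data(text):
        if inside:
            buffer.append(text)

    def end_element(name):
        nonlocal inside
        if name == tag:
            inside = False
            records.append("".join(buffer))
            if len(records) >= n:
                raise EnoughRecords  # skip the rest of the document

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    try:
        parser.Parse(data, True)
    except EnoughRecords:
        pass
    return records


# A made-up "search result" document with 1000 records.
results = "<results>" + "".join(f"<hit>doc{i}</hit>" for i in range(1000)) + "</results>"
first_three = first_n_records(results, "hit", 3)
print(first_three)
```

The remaining 997 records are never delivered to the handlers, which is the saving the comment is asking for.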

@loewis
Mannequin

loewis mannequin commented Aug 28, 2012

nemeskeyd: would you like to work on a patch (for Python 3.4)?

@nemeskeyd
Mannequin Author

nemeskeyd mannequin commented Aug 28, 2012

loewis: I don't think it would be difficult to fix, so theoretically I'd be in. However, I don't really have the time to work on this right now.

@amauryfa
Contributor

amauryfa commented Sep 5, 2012

Below is a sample script that shows that it's possible to stop parsing XML in the middle, without an explicit call to XML_StopParser(): raise StopParsing from any handler, and catch it around the Parse() call.

This method covers the two proposed use cases. Do we need another way to do it?

import xml.parsers.expat

class StopParsing(Exception):
    pass

def findFirstElementByName(data, what):
    def end_element(name):
        # Abort the parse as soon as the wanted element has been closed.
        if name == what:
            raise StopParsing(name)

    p = xml.parsers.expat.ParserCreate()
    p.EndElementHandler = end_element

    try:
        p.Parse(data, True)
    except StopParsing as e:
        print("Element found:", e)
    else:
        print("Element not found")

data = """<?xml version="1.0"?>
         <parent id="top"><child1 name="paul">Text goes here</child1>
         <child2 name="fred">More text</child2>
         </parent>"""
findFirstElementByName(data, "child2")   # Found
findFirstElementByName(data, "child3")   # Not found

@nemeskeyd
Mannequin Author

nemeskeyd mannequin commented Sep 6, 2012

Amaury: see my previous comment. There are two problems with the method you proposed:

  1. It is not mentioned in the documentation that exceptions are propagated through Parse().
  2. Exceptions usually mean that an error has happened, and they are not the preferred way to do flow control (at least this is the policy in other languages, e.g. Java; I don't know about Python).

@amauryfa
Contributor

amauryfa commented Sep 6, 2012

Your first point is true, even though the Zen of Python (try "import this") states that "Errors should never pass silently."

For your second point: exceptions are a common thing in Python code. This is similar to the EAFP principle http://docs.python.org/glossary.html#term-eafp
Also, this example http://docs.python.org/release/2.7.3/library/imp.html#examples shows that exceptions can be part of the normal flow control.
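
As a small illustration of the EAFP style mentioned here, the two hypothetical helpers below do the same lookup, once by checking first ("look before you leap") and once by simply trying and handling the exception. The function names are invented for the example.

```python
def lookup_lbyl(mapping, key, default=None):
    """Look before you leap: test for the key, then read it."""
    if key in mapping:
        return mapping[key]
    return default


def lookup_eafp(mapping, key, default=None):
    """EAFP: just try the access and treat KeyError as the normal miss path."""
    try:
        return mapping[key]
    except KeyError:
        return default


ages = {"paul": 31}
print(lookup_lbyl(ages, "fred", 0), lookup_eafp(ages, "fred", 0))
```

In the second version the exception is not an error report; it is the ordinary way the "not found" case is signalled.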

@amauryfa amauryfa added the docs (Documentation in the Doc dir) label Sep 6, 2012

@loewis
Mannequin

loewis mannequin commented Sep 6, 2012

Dávid: Another (similar) example is the Python for loop. In its original form, it would increase an index and invoke __getitem__ until that *raised* IndexError. In the current definition, it converts the iterated-over object into an iterator, and keeps calling .next until that *raises* StopIteration.

So raising an exception to indicate that something is finished is an established Python idiom.
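
Both forms of the loop protocol described above can be seen in a short sketch. `Countdown` and `manual_for` are hypothetical names; in Python 3 the iterator method is spelled `__next__` and is normally invoked through the built-in `next()`.

```python
class Countdown:
    """Supports only the old protocol: __getitem__ with increasing indexes."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index >= self.start:
            raise IndexError(index)  # tells the loop that the sequence is over
        return self.start - index


def manual_for(iterable):
    """Roughly what a for loop does under the iterator protocol."""
    collected = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:  # normal termination, not an error
            break
        collected.append(item)
    return collected


via_for = [x for x in Countdown(3)]
via_manual = manual_for(Countdown(3))
print(via_for, via_manual)
```

In both cases the end of the iteration is communicated by an exception that the loop machinery catches silently.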

In any case, I still think adding StopParser is a useful addition, in particular since that would also allow giving True as the "resumable" argument. Any such change needs to be accompanied by also exposing XML_ResumeParser, and possibly XML_GetParsingStatus.

Since we all agree that this is not an important change, I don't mind keeping this issue around until someone comes along to contribute code for it.

@loewis loewis mannequin changed the title Add StopParser() to expat Add StopParser(), ResumeParser, and GetParsingStatus to expat Sep 6, 2012
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022