Skip to content

Python: Model additional flow steps for the lxml framework #18185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 10, 2025

Conversation

joefarebrother
Copy link
Contributor

Models additional flow steps for Element and ElementTree objects from the lxml.etree module, to track taint through parsed XML elements.

@github-actions github-actions bot added the Python label Dec 2, 2024
@joefarebrother joefarebrother force-pushed the python-lxml branch 2 times, most recently from 652a9ca to b523be2 Compare December 11, 2024 12:03
@joefarebrother joefarebrother marked this pull request as ready for review December 11, 2024 14:29
@Copilot Copilot AI review requested due to automatic review settings December 11, 2024 14:29
@joefarebrother joefarebrother requested a review from a team as a code owner December 11, 2024 14:29
@joefarebrother joefarebrother changed the title [Draft] Python: Model additional flow steps for the lxml framework Python: Model additional flow steps for the lxml framework Dec 11, 2024
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 5 changed files in this pull request and generated no suggestions.

Files not reviewed (3)
  • python/ql/lib/semmle/python/frameworks/Lxml.qll: Language not supported
  • python/ql/test/library-tests/frameworks/lxml/InlineTaintTest.expected: Language not supported
  • python/ql/test/library-tests/frameworks/lxml/InlineTaintTest.ql: Language not supported
Comments skipped due to low confidence (16)

python/ql/test/library-tests/frameworks/lxml/taint_test.py:50

  • The taint tracking is missing through the call to list. Ensure the test case covers this scenario to verify taint propagation.
list(elem.findall("b"))[0].text,  # $ MISSING:tainted # Type tracking is not followed through call to `list`

python/ql/test/library-tests/frameworks/lxml/taint_test.py:54

  • The taint tracking is missing through the call to next on elem.getiterator(). Ensure the test case covers this scenario to verify taint propagation.
next(elem.getiterator()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:62

  • The taint tracking is missing through the call to next on elem.iter(). Ensure the test case covers this scenario to verify taint propagation.
next(elem.iter()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:64

  • The taint tracking is missing through the call to next on elem[0].iterancestors(). Ensure the test case covers this scenario to verify taint propagation.
next(elem[0].iterancestors()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:66

  • The taint tracking is missing through the call to next on elem.iterchildren(). Ensure the test case covers this scenario to verify taint propagation.
next(elem.iterchildren()).text, # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:68

  • The taint tracking is missing through the call to next on elem.iterdescendants(). Ensure the test case covers this scenario to verify taint propagation.
next(elem.iterdescendants()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:70

  • The taint tracking is missing through the call to next on elem.iterfind("b"). Ensure the test case covers this scenario to verify taint propagation.
next(elem.iterfind("b")).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:72

  • The taint tracking is missing through the call to next on elem[0].itersiblings(). Ensure the test case covers this scenario to verify taint propagation.
next(elem[0].itersiblings()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:82

  • The taint tracking is missing for ch.text in the for loop. Ensure the test case covers this scenario to verify taint propagation.
ch.text  # $ MISSING: tainted # API node getASubscript() appears to not consider things like for loops

python/ql/test/library-tests/frameworks/lxml/taint_test.py:93

  • The taint tracking is missing through the call to next on tree.getiterator(). Ensure the test case covers this scenario to verify taint propagation.
next(tree.getiterator()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:95

  • The taint tracking is missing through the call to next on tree.iter(). Ensure the test case covers this scenario to verify taint propagation.
next(tree.iter()).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:97

  • The taint tracking is missing through the call to next on tree.iterfind("b"). Ensure the test case covers this scenario to verify taint propagation.
next(tree.iterfind("b")).text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:109

  • The taint tracking is missing for elem2.text from the tuple return value. Ensure the test case covers this scenario to verify taint propagation.
elem2.text,  # $ MISSING:tainted # Type is not tracked to the tuple return value

python/ql/test/library-tests/frameworks/lxml/taint_test.py:110

  • The taint tracking is missing for elem3.text from the tuple return value. Ensure the test case covers this scenario to verify taint propagation.
elem3.text,  # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:119

  • The taint tracking is missing for tree2.getroot(). Ensure the test case covers this scenario to verify taint propagation.
tree2.getroot() # $ MISSING:tainted

python/ql/test/library-tests/frameworks/lxml/taint_test.py:126

  • The taint tracking is missing for el.text in the iterparse loop. Ensure the test case covers this scenario to verify taint propagation.
el.text, # $ MISSING:tainted

Tip: Copilot only keeps its highest confidence comments to reduce noise and keep you focused. Learn more

yoff
yoff previously approved these changes Jan 6, 2025
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice modeling, and great use of the API graph. I like the tests also, especially the ones about what type tracking can and cannot do. I think we should document this more clearly. Currently, there are some comments to that effect in the type tracking tests.

override string getFormat() { result = "XML" }
}
// TODO: ElementTree.write can write to a file-like object; should that be a flow step?
// It also can accept a filepath which could be a path injection sink.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can add that as a sink directly by extending the right concept? (Probably out of scope for this PR)

elem.get("at"), # $ tainted
elem.getchildren()[0].text, # $ tainted
elem.getiterator(), # $ tainted
next(elem.getiterator()).text, # $ MISSING:tainted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next is currently modeled as a flow summary, but not a "simple" one, so we cannot type track through it (it uses type tracking to identify the calls).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything we could expect to be able to type track through these cases of methods returning iterators?
(list(X)[0] doesn't work; and neither do for loops as one of the other tests here shows)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it looks like all uses of getACallSimple are in test summaries, so probably not..


ensure_tainted(
func2(tree), # $ tainted
func2(tree).text, # $ MISSING:tainted - type tracking not tracked through flow preserving calls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising, should that be func2(tree).getroot().text?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should; fixed. However the flow is still missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, we should probably look into that at some point..

func2(tree).text, # $ MISSING:tainted - type tracking not tracked through flow preserving calls
func3(tree).text, # $ MISSING:tainted - this includes if there is a type hint annotation on the return
typing.cast(ET.ElementTree, tree), # $ tainted
typing.cast(ET.ElementTree, tree).text, # $ MISSING:tainted - this includes for flow summary models
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will only work for "simple" summaries; those that do not use type tracking to identify calls. (And, again, is .text the right test here? I do not see tree.text being tainted.)

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joefarebrother joefarebrother merged commit a7fb73a into github:main Jan 10, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants