Commit 7c32efc

Merge pull request #17203 from RasmusWL/threat-models
Python: Add support for threat models
2 parents 381ea93 + 431a1af commit 7c32efc

48 files changed: +473 -74 lines changed

docs/codeql/codeql-language-guides/customizing-library-models-for-python.rst

+8 -1
@@ -427,7 +427,7 @@ Kinds
 Source kinds
 ~~~~~~~~~~~~

-- **remote**: A generic source of remote flow. Most taint-tracking queries will use such a source. Currently this is the only supported source kind.
+See documentation below for :ref:`Threat models <threat-models-python>`.

 Sink kinds
 ~~~~~~~~~~
@@ -449,3 +449,10 @@ Summary kinds

 - **taint**: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well.
 - **value**: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved.
+
+.. _threat-models-python:
+
+Threat models
+-------------
+
+.. include:: ../reusables/threat-model-description.rst

docs/codeql/reusables/beta-note-threat-models.rst

+1 -1
@@ -2,4 +2,4 @@

 Note

-Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java and C# analysis.
+Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C# and Python analysis.

@@ -0,0 +1,4 @@
+---
+category: feature
+---
+* Added support for custom threat-models, which can be used in most of our taint-tracking queries. See our [documentation](https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning#extending-codeql-coverage-with-threat-models) for more details.

@@ -0,0 +1,8 @@
+extensions:
+  - addsTo:
+      pack: codeql/threat-models
+      extensible: threatModelConfiguration
+    data:
+      # Since responses are enabled by default in the shared threat-models configuration,
+      # we need to disable it here to keep existing behavior for the python analysis.
+      - ["response", false, -2147483647]

python/ql/lib/qlpack.yml

+2
@@ -9,10 +9,12 @@ dependencies:
   codeql/dataflow: ${workspace}
   codeql/mad: ${workspace}
   codeql/regex: ${workspace}
+  codeql/threat-models: ${workspace}
   codeql/tutorial: ${workspace}
   codeql/util: ${workspace}
   codeql/xml: ${workspace}
   codeql/yaml: ${workspace}
 dataExtensions:
   - semmle/python/frameworks/**/*.model.yml
+  - ext/*.model.yml
 warnOnImplicitThis: true

python/ql/lib/semmle/python/Concepts.qll

+56
@@ -10,6 +10,62 @@ private import semmle.python.dataflow.new.RemoteFlowSources
 private import semmle.python.dataflow.new.TaintTracking
 private import semmle.python.Frameworks
 private import semmle.python.security.internal.EncryptionKeySizes
+private import codeql.threatmodels.ThreatModels
+
+/**
+ * A data flow source, for a specific threat-model.
+ *
+ * Extend this class to refine existing API models. If you want to model new APIs,
+ * extend `ThreatModelSource::Range` instead.
+ */
+class ThreatModelSource extends DataFlow::Node instanceof ThreatModelSource::Range {
+  /**
+   * Gets a string that represents the source kind with respect to threat modeling.
+   *
+   * See
+   * - https://github.com/github/codeql/blob/main/docs/codeql/reusables/threat-model-description.rst
+   * - https://github.com/github/codeql/blob/main/shared/threat-models/ext/threat-model-grouping.model.yml
+   */
+  string getThreatModel() { result = super.getThreatModel() }
+
+  /** Gets a string that describes the type of this threat-model source. */
+  string getSourceType() { result = super.getSourceType() }
+}
+
+/** Provides a class for modeling new sources for specific threat-models. */
+module ThreatModelSource {
+  /**
+   * A data flow source, for a specific threat-model.
+   *
+   * Extend this class to model new APIs. If you want to refine existing API models,
+   * extend `ThreatModelSource` instead.
+   */
+  abstract class Range extends DataFlow::Node {
+    /**
+     * Gets a string that represents the source kind with respect to threat modeling.
+     *
+     * See
+     * - https://github.com/github/codeql/blob/main/docs/codeql/reusables/threat-model-description.rst
+     * - https://github.com/github/codeql/blob/main/shared/threat-models/ext/threat-model-grouping.model.yml
+     */
+    abstract string getThreatModel();
+
+    /** Gets a string that describes the type of this threat-model source. */
+    abstract string getSourceType();
+  }
+}
+
+/**
+ * A data flow source that is enabled in the current threat model configuration.
+ */
+class ActiveThreatModelSource extends ThreatModelSource {
+  ActiveThreatModelSource() {
+    exists(string kind |
+      currentThreatModel(kind) and
+      this.getThreatModel() = kind
+    )
+  }
+}

 /**
  * A data-flow node that executes an operating system command,

python/ql/lib/semmle/python/dataflow/new/RemoteFlowSources.qll

+3 -7
@@ -15,10 +15,7 @@ private import semmle.python.Concepts
  * Extend this class to refine existing API models. If you want to model new APIs,
  * extend `RemoteFlowSource::Range` instead.
  */
-class RemoteFlowSource extends DataFlow::Node instanceof RemoteFlowSource::Range {
-  /** Gets a string that describes the type of this remote flow source. */
-  string getSourceType() { result = super.getSourceType() }
-}
+class RemoteFlowSource extends ThreatModelSource instanceof RemoteFlowSource::Range { }

 /** Provides a class for modeling new sources of remote user input. */
 module RemoteFlowSource {
@@ -28,8 +25,7 @@ module RemoteFlowSource {
    * Extend this class to model new APIs. If you want to refine existing API models,
    * extend `RemoteFlowSource` instead.
    */
-  abstract class Range extends DataFlow::Node {
-    /** Gets a string that describes the type of this remote flow source. */
-    abstract string getSourceType();
+  abstract class Range extends ThreatModelSource::Range {
+    override string getThreatModel() { result = "remote" }
   }
 }

python/ql/lib/semmle/python/frameworks/PEP249.qll

+18
@@ -81,6 +81,24 @@ module PEP249 {
     }
   }

+  /** A call to a method that fetches rows from a previous execution. */
+  private class FetchMethodCall extends ThreatModelSource::Range, API::CallNode {
+    FetchMethodCall() {
+      exists(API::Node start |
+        start instanceof DatabaseCursor or start instanceof DatabaseConnection
+      |
+        // note: since we can't currently provide accesspaths for sources, these are all
+        // lumped together, although clearly the fetchmany/fetchall returns a
+        // list/iterable with rows.
+        this = start.getMember(["fetchone", "fetchmany", "fetchall"]).getACall()
+      )
+    }
+
+    override string getThreatModel() { result = "database" }
+
+    override string getSourceType() { result = "cursor.fetch*()" }
+  }
+
   // ---------------------------------------------------------------------------
   // asyncio implementations
   // ---------------------------------------------------------------------------
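
To illustrate what the `FetchMethodCall` model above matches on the Python side, here is a minimal sketch that uses the standard-library `sqlite3` module as a stand-in for a PEP 249 database driver. The `users` table and both helper names are hypothetical, and whether a given driver is covered depends on it being modeled as a PEP 249 API elsewhere in the CodeQL libraries, which this diff does not show.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def load_usernames(conn):
    """Return every name stored in the `users` table, ordered by id."""
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM users ORDER BY id")
    # Calls to fetchone/fetchmany/fetchall are the ones the diff above
    # assigns the "database" threat model.
    rows = cursor.fetchall()
    return [row[0] for row in rows]
```

With the "database" threat model enabled, values read this way would be candidates for taint sources; with the default configuration they would presumably stay out of scope.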

@@ -0,0 +1,29 @@
+extensions:
+  - addsTo:
+      pack: codeql/python-all
+      extensible: sourceModel
+    data:
+      - ['os', 'Member[getenv].ReturnValue', 'environment']
+      - ['os', 'Member[getenvb].ReturnValue', 'environment']
+      - ['os', 'Member[environ]', 'environment']
+      - ['os', 'Member[environb]', 'environment']
+      - ['posix', 'Member[environ]', 'environment']
+
+      - ['sys', 'Member[argv]', 'commandargs']
+      - ['sys', 'Member[orig_argv]', 'commandargs']
+
+      - ['sys', 'Member[stdin]', 'stdin']
+      - ['builtins', 'Member[input].ReturnValue', 'stdin']
+      - ['builtins', 'Member[raw_input].ReturnValue', 'stdin'] # python 2 only
+
+
+      # if no argument is given, the default is to use sys.argv[1:]
+      - ['argparse.ArgumentParser', 'Member[parse_args,parse_known_args].WithArity[0].ReturnValue', 'commandargs']
+
+      - ['os', 'Member[read].ReturnValue', 'file']
+  - addsTo:
+      pack: codeql/python-all
+      extensible: summaryModel
+    data:
+      - ['argparse.ArgumentParser', 'Member[parse_args,parse_known_args]', 'Argument[0,args:]', 'ReturnValue', 'taint']
+      # note: taint of attribute lookups is handled in QL
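
The source rows above refer to ordinary standard-library calls. The following sketch shows a few of them in Python. The `DEMO_SETTING` variable, the `--target` option and all function names are hypothetical and exist only for illustration.

```python
import argparse
import os


def read_env_setting(name, default=""):
    # Return value of os.getenv: listed above with kind 'environment'.
    return os.getenv(name, default)


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", default="")
    return parser


def parse_cli():
    # Zero-argument call: argparse falls back to sys.argv[1:], which is why
    # the WithArity[0] row above uses kind 'commandargs'.
    return build_parser().parse_args()


def parse_explicit(argv):
    # Explicit argument list: not a source by itself. The summaryModel row
    # above propagates taint from the argument to the return value instead.
    return build_parser().parse_args(argv)
```

Reading `parse_cli().target` is an attribute lookup on the parsed namespace. According to the closing note in the model file, that step is handled in QL, not in the data extension.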

python/ql/lib/semmle/python/frameworks/Stdlib.qll

+46 -4
@@ -338,7 +338,7 @@ module StdlibPrivate {
    * Modeling of path related functions in the `os` module.
    * Wrapped in QL module to make it easy to fold/unfold.
    */
-  private module OsFileSystemAccessModeling {
+  module OsFileSystemAccessModeling {
     /**
      * A call to the `os.fsencode` function.
      *
@@ -395,7 +395,7 @@ module StdlibPrivate {
      *
      * See https://docs.python.org/3/library/os.html#os.open
      */
-    private class OsOpenCall extends FileSystemAccess::Range, DataFlow::CallCfgNode {
+    class OsOpenCall extends FileSystemAccess::Range, DataFlow::CallCfgNode {
       OsOpenCall() { this = os().getMember("open").getACall() }

       override DataFlow::Node getAPathArgument() {
@@ -1499,13 +1499,22 @@ module StdlibPrivate {
    * See https://docs.python.org/3/library/functions.html#open
    */
   private class OpenCall extends FileSystemAccess::Range, Stdlib::FileLikeObject::InstanceSource,
-    DataFlow::CallCfgNode
+    ThreatModelSource::Range, DataFlow::CallCfgNode
   {
-    OpenCall() { this = getOpenFunctionRef().getACall() }
+    OpenCall() {
+      this = getOpenFunctionRef().getACall() and
+      // when analyzing stdlib code for os.py we wrongly assume that `os.open` is an
+      // alias of the builtins `open` function
+      not this instanceof OsFileSystemAccessModeling::OsOpenCall
+    }

     override DataFlow::Node getAPathArgument() {
       result in [this.getArg(0), this.getArgByName("file")]
     }
+
+    override string getThreatModel() { result = "file" }
+
+    override string getSourceType() { result = "open()" }
   }

   /**
@@ -4989,6 +4998,39 @@ module StdlibPrivate {

     override string getKind() { result = Escaping::getHtmlKind() }
   }
+
+  // ---------------------------------------------------------------------------
+  // argparse
+  // ---------------------------------------------------------------------------
+  /**
+   * If the result of `parse_args` is tainted (because it uses command-line arguments),
+   * then the parsed values accessed on any attribute lookup are also tainted.
+   */
+  private class ArgumentParserAnyAttributeStep extends TaintTracking::AdditionalTaintStep {
+    override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
+      nodeFrom =
+        API::moduleImport("argparse")
+            .getMember("ArgumentParser")
+            .getReturn()
+            .getMember("parse_args")
+            .getReturn()
+            .getAValueReachableFromSource() and
+      nodeTo.(DataFlow::AttrRead).getObject() = nodeFrom
+    }
+  }
+
+  // ---------------------------------------------------------------------------
+  // sys
+  // ---------------------------------------------------------------------------
+  /**
+   * An access of `sys.stdin`/`sys.stdout`/`sys.stderr`, to get additional FileLike
+   * modeling.
+   */
+  private class SysStandardStreams extends Stdlib::FileLikeObject::InstanceSource, DataFlow::Node {
+    SysStandardStreams() {
+      this = API::moduleImport("sys").getMember(["stdin", "stdout", "stderr"]).asSource()
+    }
+  }
 }

 // ---------------------------------------------------------------------------
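
The `OpenCall` change above separates the builtin `open()` from `os.open()`. The sketch below places the two calls side by side in Python to show which code each model refers to. The helper names are hypothetical, and the comments restate only what the diff says about each call.

```python
import os


def read_with_builtin_open(path):
    # Builtin open(): matched by OpenCall, which this commit turns into a
    # ThreatModelSource::Range with threat model "file".
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def read_with_os_open(path):
    # os.open(): matched by OsOpenCall, which the commit explicitly excludes
    # from OpenCall so the two are not confused.
    fd = os.open(path, os.O_RDONLY)
    try:
        # The return value of os.read is listed as a 'file' source in the
        # model rows added by this commit.
        return os.read(fd, 4096)
    finally:
        os.close(fd)
```

The first helper returns text and the second returns bytes, which is one practical reason the two APIs are modeled separately.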

python/ql/lib/semmle/python/frameworks/data/ModelsAsData.qll

+9 -4
@@ -18,14 +18,19 @@ private import semmle.python.dataflow.new.RemoteFlowSources
 private import semmle.python.dataflow.new.DataFlow
 private import semmle.python.ApiGraphs
 private import semmle.python.dataflow.new.FlowSummary
+private import semmle.python.Concepts

 /**
- * A remote flow source originating from a CSV source row.
+ * A threat-model flow source originating from a data extension.
  */
-private class RemoteFlowSourceFromCsv extends RemoteFlowSource::Range {
-  RemoteFlowSourceFromCsv() { this = ModelOutput::getASourceNode("remote").asSource() }
+private class ThreatModelSourceFromDataExtension extends ThreatModelSource::Range {
+  ThreatModelSourceFromDataExtension() { this = ModelOutput::getASourceNode(_).asSource() }

-  override string getSourceType() { result = "Remote flow (from model)" }
+  override string getThreatModel() { this = ModelOutput::getASourceNode(result).asSource() }
+
+  override string getSourceType() {
+    result = "Source node (" + this.getThreatModel() + ") [from data-extension]"
+  }
 }

 private class SummarizedCallableFromModel extends SummarizedCallable {

python/ql/lib/semmle/python/security/dataflow/CodeInjectionCustomizations.qll

+7 -2
@@ -33,9 +33,14 @@ module CodeInjection {
   abstract class Sanitizer extends DataFlow::Node { }

   /**
-   * A source of remote user input, considered as a flow source.
+   * DEPRECATED: Use `ActiveThreatModelSource` from Concepts instead!
    */
-  class RemoteFlowSourceAsSource extends Source, RemoteFlowSource { }
+  deprecated class RemoteFlowSourceAsSource = ActiveThreatModelSourceAsSource;
+
+  /**
+   * An active threat-model source, considered as a flow source.
+   */
+  private class ActiveThreatModelSourceAsSource extends Source, ActiveThreatModelSource { }

   /**
    * A code execution, considered as a flow sink.

python/ql/lib/semmle/python/security/dataflow/CommandInjectionCustomizations.qll

+7 -2
@@ -33,9 +33,14 @@ module CommandInjection {
   abstract class Sanitizer extends DataFlow::Node { }

   /**
-   * A source of remote user input, considered as a flow source.
+   * DEPRECATED: Use `ActiveThreatModelSource` from Concepts instead!
    */
-  class RemoteFlowSourceAsSource extends Source, RemoteFlowSource { }
+  deprecated class RemoteFlowSourceAsSource = ActiveThreatModelSourceAsSource;
+
+  /**
+   * An active threat-model source, considered as a flow source.
+   */
+  private class ActiveThreatModelSourceAsSource extends Source, ActiveThreatModelSource { }

   /**
    * A command execution, considered as a flow sink.

python/ql/lib/semmle/python/security/dataflow/CookieInjectionCustomizations.qll

+7 -2
@@ -31,9 +31,14 @@ module CookieInjection {
   abstract class Sanitizer extends DataFlow::Node { }

   /**
-   * A source of remote user input, considered as a flow source.
+   * DEPRECATED: Use `ActiveThreatModelSource` from Concepts instead!
    */
-  class RemoteFlowSourceAsSource extends Source, RemoteFlowSource { }
+  deprecated class RemoteFlowSourceAsSource = ActiveThreatModelSourceAsSource;
+
+  /**
+   * An active threat-model source, considered as a flow source.
+   */
+  private class ActiveThreatModelSourceAsSource extends Source, ActiveThreatModelSource { }

   /**
    * A write to a cookie, considered as a sink.

python/ql/lib/semmle/python/security/dataflow/HttpHeaderInjectionCustomizations.qll

+7 -2
@@ -32,9 +32,14 @@ module HttpHeaderInjection {
   abstract class Sanitizer extends DataFlow::Node { }

   /**
-   * A source of remote user input, considered as a flow source.
+   * DEPRECATED: Use `ActiveThreatModelSource` from Concepts instead!
    */
-  class RemoteFlowSourceAsSource extends Source, RemoteFlowSource { }
+  deprecated class RemoteFlowSourceAsSource = ActiveThreatModelSourceAsSource;
+
+  /**
+   * An active threat-model source, considered as a flow source.
+   */
+  private class ActiveThreatModelSourceAsSource extends Source, ActiveThreatModelSource { }

   /**
    * A HTTP header write, considered as a flow sink.

python/ql/lib/semmle/python/security/dataflow/LdapInjectionCustomizations.qll

+7 -2
@@ -42,9 +42,14 @@ module LdapInjection {
   abstract class FilterSanitizer extends DataFlow::Node { }

   /**
-   * A source of remote user input, considered as a flow source.
+   * DEPRECATED: Use `ActiveThreatModelSource` from Concepts instead!
    */
-  class RemoteFlowSourceAsSource extends Source, RemoteFlowSource { }
+  deprecated class RemoteFlowSourceAsSource = ActiveThreatModelSourceAsSource;
+
+  /**
+   * An active threat-model source, considered as a flow source.
+   */
+  private class ActiveThreatModelSourceAsSource extends Source, ActiveThreatModelSource { }

   /**
    * A logging operation, considered as a flow sink.
