Commit 26b75d5

Merge pull request #4422 from tybug/observability-choice-nodes
Add `observation.metadata.choice_nodes`
2 parents fec47d6 + 076510d commit 26b75d5

16 files changed

+598
-25
lines changed

hypothesis-python/RELEASE.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+RELEASE_TYPE: patch
+
+This release adds the experimental and unstable |OBSERVABILITY_CHOICES| option for :ref:`observability <observability>`. If set, the choice sequence is included in ``metadata.choice_nodes``, and choice sequence spans are included in ``metadata.choice_spans``.
+
+These are relatively low-level implementation details of Hypothesis, and are exposed in observability for users building tools or research on top of Hypothesis. See |PrimitiveProvider| for more details about the choice sequence and choice spans.
+
+We are actively working towards a better interface for this. Feel free to use |OBSERVABILITY_CHOICES| to experiment, but don't rely on it yet!

hypothesis-python/docs/prolog.rst

Lines changed: 6 additions & 0 deletions
@@ -116,17 +116,23 @@
 .. |PrimitiveProvider.draw_string| replace:: :func:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.draw_string`
 .. |PrimitiveProvider.draw_bytes| replace:: :func:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.draw_bytes`
 .. |PrimitiveProvider.on_observation| replace:: :func:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.on_observation`
+.. |PrimitiveProvider.observe_test_case| replace:: :func:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.observe_test_case`
+.. |PrimitiveProvider.observe_information_messages| replace:: :func:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.observe_information_messages`
 .. |PrimitiveProvider.per_test_case_context_manager| replace:: :func:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.per_test_case_context_manager`
 .. |PrimitiveProvider.add_observability_callback| replace:: :data:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.add_observability_callback`
+.. |PrimitiveProvider.span_start| replace:: :func:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.span_start`
+.. |PrimitiveProvider.span_end| replace:: :func:`~hypothesis.internal.conjecture.providers.PrimitiveProvider.span_end`
 
 .. |AVAILABLE_PROVIDERS| replace:: :data:`~hypothesis.internal.conjecture.providers.AVAILABLE_PROVIDERS`
 .. |TESTCASE_CALLBACKS| replace:: :data:`~hypothesis.internal.observability.TESTCASE_CALLBACKS`
+.. |OBSERVABILITY_CHOICES| replace:: :data:`~hypothesis.internal.observability.OBSERVABILITY_CHOICES`
 .. |BUFFER_SIZE| replace:: :data:`~hypothesis.internal.conjecture.engine.BUFFER_SIZE`
 .. |MAX_SHRINKS| replace:: :data:`~hypothesis.internal.conjecture.engine.MAX_SHRINKS`
 .. |MAX_SHRINKING_SECONDS| replace:: :data:`~hypothesis.internal.conjecture.engine.MAX_SHRINKING_SECONDS`
 .. |BackendCannotProceed| replace:: :exc:`~hypothesis.errors.BackendCannotProceed`
 
 .. |@rule| replace:: :func:`@rule <hypothesis.stateful.rule>`
+.. |@precondition| replace:: :func:`@precondition <hypothesis.stateful.precondition>`
 .. |RuleBasedStateMachine| replace:: :class:`~hypothesis.stateful.RuleBasedStateMachine`
 .. |run_state_machine_as_test| replace:: :func:`~hypothesis.stateful.run_state_machine_as_test`

hypothesis-python/docs/reference/integrations.rst

Lines changed: 23 additions & 1 deletion
@@ -156,11 +156,33 @@ which includes infinities and NaN. This is valid in `JSON5 <https://json5.org/>
 and supported by `some JSON parsers <https://evanhahn.com/pythons-nonstandard-json-encoding/>`__
 including Gson in Java, ``JSON.parse()`` in Ruby, and of course in Python.
 
+Information message
+^^^^^^^^^^^^^^^^^^^
+
+.. jsonschema:: ./schema_observations.json#/oneOf/1
+   :hide_key: /additionalProperties, /type
+
+Test case
+^^^^^^^^^
+
 .. jsonschema:: ./schema_observations.json#/oneOf/0
    :hide_key: /additionalProperties, /type
-.. jsonschema:: ./schema_observations.json#/oneOf/1
+
+Hypothesis metadata
++++++++++++++++++++
+
+While the observability format is agnostic to the property-based testing library which generated it, Hypothesis includes specific values in the ``metadata`` key for test cases. You may rely on these being present if and only if the observation was generated by Hypothesis.
+
+.. jsonschema:: ./schema_metadata.json
    :hide_key: /additionalProperties, /type
 
+Choices metadata
+++++++++++++++++
+
+These additional metadata elements are included in ``metadata`` (as e.g. ``metadata["choice_nodes"]`` or ``metadata["choice_spans"]``), if and only if |OBSERVABILITY_CHOICES| is set.
+
+.. jsonschema:: ./schema_metadata_choices.json
+   :hide_key: /additionalProperties, /type
 
 .. _pytest-plugin:

hypothesis-python/docs/reference/internals.rst

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ Observability
 
 .. autodata:: hypothesis.internal.observability.TESTCASE_CALLBACKS
 .. autodata:: hypothesis.internal.observability.OBSERVABILITY_COLLECT_COVERAGE
-
+.. autodata:: hypothesis.internal.observability.OBSERVABILITY_CHOICES
 
 Engine constants
 ----------------

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+{
+  "type": "object",
+  "properties": {
+    "traceback": {
+      "type": ["string", "null"],
+      "description": "The traceback for failing tests, if and only if ``status == \"failed\"``."
+    },
+    "reproduction_decorator": {
+      "type": ["string", "null"],
+      "description": "The ``@reproduce_failure`` decorator string for failing tests, if and only if ``status == \"failed\"``."
+    },
+    "predicates": {
+      "type": "object",
+      "description": "The number of times each |assume| and |@precondition| predicate was satisfied (``True``) and not satisfied (``False``).",
+      "additionalProperties": {
+        "type": "object",
+        "properties": {
+          "satisfied": {
+            "type": "integer",
+            "minimum": 0,
+            "description": "The number of times this predicate was satisfied (``True``)."
+          },
+          "unsatisfied": {
+            "type": "integer",
+            "minimum": 0,
+            "description": "The number of times this predicate was not satisfied (``False``)."
+          }
+        },
+        "required": ["satisfied", "unsatisfied"],
+        "additionalProperties": false
+      }
+    },
+    "backend": {
+      "type": "object",
+      "description": "Backend-specific observations from |PrimitiveProvider.observe_test_case| and |PrimitiveProvider.observe_information_messages|."
+    },
+    "sys.argv": {
+      "type": "array",
+      "items": {"type": "string"},
+      "description": "The result of ``sys.argv``."
+    },
+    "os.getpid()": {
+      "type": "integer",
+      "description": "The result of ``os.getpid()``."
+    },
+    "imported_at": {
+      "type": "number",
+      "description": "The unix timestamp when Hypothesis was imported."
+    },
+    "data_status": {
+      "type": "number",
+      "enum": [0, 1, 2, 3],
+      "description": "The internal status of the ConjectureData for this test case. The values are as follows: ``Status.OVERRUN = 0``, ``Status.INVALID = 1``, ``Status.VALID = 2``, and ``Status.INTERESTING = 3``."
+    },
+    "interesting_origin": {
+      "type": ["string", "null"],
+      "description": "The internal ``InterestingOrigin`` object for failing tests, if and only if ``status == \"failed\"``. The ``traceback`` string value is derived from this object."
+    }
+  },
+  "required": ["traceback", "reproduction_decorator", "predicates", "backend", "sys.argv", "os.getpid()", "imported_at", "data_status", "interesting_origin"],
+  "additionalProperties": false
+}
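
Reviewer note: a minimal sketch of how a consumer might read the metadata fields defined in the schema above. It does not import Hypothesis. ``DataStatus`` is a local mirror of the four ``data_status`` values listed in the schema, ``summarize_metadata`` is a hypothetical helper name, and the sample observation is made up for illustration.

```python
from enum import IntEnum


class DataStatus(IntEnum):
    """Local mirror of the ``data_status`` values documented in the schema."""

    OVERRUN = 0
    INVALID = 1
    VALID = 2
    INTERESTING = 3


def summarize_metadata(observation: dict) -> str:
    """Summarize the Hypothesis-specific ``metadata`` of one test-case observation.

    Hypothetical helper: assumes ``observation`` is a test-case observation
    that has already been parsed from JSON into a dict.
    """
    metadata = observation["metadata"]
    status = DataStatus(int(metadata["data_status"]))
    # ``predicates`` maps each predicate to its satisfied/unsatisfied counts.
    unsatisfied = sum(
        counts["unsatisfied"] for counts in metadata["predicates"].values()
    )
    return f"{status.name}: {unsatisfied} unsatisfied predicate checks"


# Made-up sample containing only the fields this sketch reads.
sample_observation = {
    "metadata": {
        "data_status": 2,
        "predicates": {
            "assume(x > 0)": {"satisfied": 5, "unsatisfied": 3},
        },
    }
}
print(summarize_metadata(sample_observation))
```

The enum is kept local on purpose: the schema describes ``Status`` as internal to Hypothesis, so a tool that only consumes observations may prefer not to depend on it.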
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+{
+  "type": "object",
+  "properties": {
+    "choice_nodes": {
+      "type": ["array", "null"],
+      "description": ".. warning::\n\n EXPERIMENTAL AND UNSTABLE. This attribute may change format or disappear without warning.\n\nThe sequence of choices made during this test case. This includes the choice value, as well as its constraints and whether it was forced or not.\n\nOnly present if |OBSERVABILITY_CHOICES| is ``True``.\n\n.. note::\n\n The choice sequence is a relatively low-level implementation detail of Hypothesis, and is exposed in observability for users building tools or research on top of Hypothesis. See |PrimitiveProvider| for more details about the choice sequence.",
+      "items": {
+        "type": "object",
+        "properties": {
+          "type": {
+            "type": "string",
+            "enum": ["integer", "float", "string", "bytes", "boolean"],
+            "description": "The type of choice made. Corresponds to a call to |PrimitiveProvider.draw_integer|, |PrimitiveProvider.draw_float|, |PrimitiveProvider.draw_string|, |PrimitiveProvider.draw_bytes|, or |PrimitiveProvider.draw_boolean|."
+          },
+          "value": {
+            "description": "The value of the choice. Corresponds to the value returned by a ``PrimitiveProvider.draw_*`` method.\n\n``NaN`` float values are returned as ``[\"float\", <float64_int_value>]``, to distinguish ``NaN`` floats with nonstandard bit patterns. Integers with ``abs(value) >= 2**63`` are returned as ``[\"integer\", str(value)]``, for compatibility with tools with integer size limitations. Bytes are returned as ``[\"bytes\", base64.b64encode(value)]``."
+          },
+          "constraints": {
+            "type": "object",
+            "description": "The constraints for this choice. Corresponds to the constraints passed to a ``PrimitiveProvider.draw_*`` method. ``NaN`` float values, integers with ``abs(value) >= 2**63``, and byte values for constraints are transformed as for the ``value`` attribute."
+          },
+          "was_forced": {
+            "type": "boolean",
+            "description": "Whether this choice was forced. As an implementation detail, Hypothesis occasionally requires that some choices take on a specific value, for instance to end generation of collection elements early for performance. These values are called \"forced\", and have ``was_forced = True``."
+          }
+        },
+        "required": ["type", "value", "constraints", "was_forced"],
+        "additionalProperties": false
+      }
+    },
+    "choice_spans": {
+      "type": "array",
+      "items": {"type": "array"},
+      "description": ".. warning::\n\n EXPERIMENTAL AND UNSTABLE. This attribute may change format or disappear without warning.\n\nThe semantically-meaningful spans of the choice sequence of this test case.\n\nEach span has the format ``[label, start, end, discarded]``, where:\n\n* ``label`` is an opaque integer-value string shared by all spans drawn from a particular strategy.\n* ``start`` and ``end`` are indices into the choice sequence for this span, such that ``choices[start:end]`` are the corresponding choices.\n* ``discarded`` is a boolean indicating whether this span was discarded (see |PrimitiveProvider.span_end|).\n\nOnly present if |OBSERVABILITY_CHOICES| is ``True``.\n\n.. note::\n\n Spans are a relatively low-level implementation detail of Hypothesis, and are exposed in observability for users building tools or research on top of Hypothesis. See |PrimitiveProvider| (and particularly |PrimitiveProvider.span_start| and |PrimitiveProvider.span_end|) for more details about spans."
+    }
+  },
+  "required": ["traceback", "reproduction_decorator", "predicates", "backend", "sys.argv", "os.getpid()", "imported_at", "data_status", "interesting_origin", "choice_nodes", "choice_spans"],
+  "additionalProperties": false
+}
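
Reviewer note: the tagged encodings in the ``value`` description, and the ``choices[start:end]`` slicing in the ``choice_spans`` description, can be sketched as below. This is not Hypothesis code. ``decode_choice_value`` and ``choices_for_span`` are hypothetical helper names. The sketch assumes that ``<float64_int_value>`` is the unsigned 64-bit pattern of the float, and that the base64 payload arrives as ASCII text after JSON serialization. The sample nodes are made up, with empty ``constraints`` for brevity.

```python
import base64
import math
import struct


def decode_choice_value(value):
    """Undo the tagged encodings described for ``value`` (hypothetical helper)."""
    if isinstance(value, list) and len(value) == 2:
        tag, payload = value
        if tag == "float":
            # Assumption: payload is the float's 64-bit pattern as an unsigned int.
            return struct.unpack("!d", struct.pack("!Q", payload))[0]
        if tag == "integer":
            # Large integers are serialized as decimal strings.
            return int(payload)
        if tag == "bytes":
            return base64.b64decode(payload)
    # Every other value is passed through unchanged.
    return value


def choices_for_span(choice_nodes, span):
    """Return the decoded choice values covered by one ``choice_spans`` entry."""
    _label, start, end, _discarded = span
    return [decode_choice_value(node["value"]) for node in choice_nodes[start:end]]


# Made-up sample data in the documented shape.
sample_nodes = [
    {"type": "integer", "value": ["integer", str(2**63)], "constraints": {}, "was_forced": False},
    {"type": "bytes", "value": ["bytes", base64.b64encode(b"hi").decode("ascii")], "constraints": {}, "was_forced": False},
    {"type": "boolean", "value": True, "constraints": {}, "was_forced": True},
]
sample_span = ["0", 0, 2, False]  # [label, start, end, discarded]
decoded = choices_for_span(sample_nodes, sample_span)
nan_value = decode_choice_value(["float", 0x7FF8000000000001])
print(decoded, math.isnan(nan_value))
```

Packing and unpacking with the same byte order keeps the float round trip independent of the host's endianness.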

hypothesis-python/docs/reference/schema_observations.json

Lines changed: 0 additions & 2 deletions
@@ -3,7 +3,6 @@
   "description": "PBT Observations define a standard way to communicate what happened when property-based tests were run. They describe test cases, or general notifications classified as info, alert, or error messages.",
   "oneOf": [
     {
-      "title": "Test case",
       "description": "Describes the inputs to and result of running some test function on a particular input. The test might have passed, failed, or been abandoned part way through (e.g. because we failed a |.filter| condition).",
       "type": "object",
       "properties": {
@@ -69,7 +68,6 @@
       "additionalProperties": false
     },
     {
-      "title": "Information message",
       "description": "Info, alert, and error messages correspond to a group of test cases or the overall run, and are intended for humans rather than machine analysis.",
       "type": "object",
       "properties": {

hypothesis-python/src/hypothesis/core.py

Lines changed: 5 additions & 5 deletions
@@ -675,6 +675,7 @@ def execute_explicit_examples(state, wrapped_test, arguments, kwargs, original_s
 "Falsifying example", "Falsifying explicit example", 1
 )
 
+empty_data.freeze()
 tc = make_testcase(
 run_start=state._start_timestamp,
 property=state.test_identifier,
@@ -1310,6 +1311,7 @@ def _execute_once_for_engine(self, data: ConjectureData) -> None:
 data._observability_args = {}
 self._string_repr = "<backend failed to realize symbolic arguments>"
 
+data.freeze()
 tc = make_testcase(
 run_start=self._start_timestamp,
 property=self.test_identifier,
@@ -1506,6 +1508,7 @@ def run_engine(self):
 # execute_once() will always raise either the expected error, or Flaky.
 raise NotImplementedError("This should be unreachable")
 finally:
+ran_example.freeze()
 # log our observability line for the final failing example
 tc = make_testcase(
 run_start=self._start_timestamp,
@@ -1529,11 +1532,7 @@ def run_engine(self):
 f"{reproduction_decorator(falsifying_example.choices)} "
 "as a decorator on your test case"
 )
-# Mostly useful for ``find`` and ensuring that objects that
-# hold on to a reference to ``data`` know that it's now been
-# finished and they can't draw more data from it.
-ran_example.freeze() # pragma: no branch
-# No branch is possible here because we never have an active exception.
+
 _raise_to_user(
 errors_to_report,
 self.settings,
@@ -2104,6 +2103,7 @@ def fuzz_one_input(
 raise
 finally:
 if TESTCASE_CALLBACKS:
+data.freeze()
 tc = make_testcase(
 run_start=state._start_timestamp,
 property=state.test_identifier,

hypothesis-python/src/hypothesis/internal/conjecture/providers.py

Lines changed: 54 additions & 6 deletions
@@ -596,12 +596,59 @@ def on_observation(self, observation: TestCaseObservation) -> None: # noqa: B02
     def span_start(self, label: int, /) -> None: # noqa: B027 # non-abstract noop
         """Marks the beginning of a semantically meaningful span of choices.
 
-        Providers can optionally track this data to learn which sub-sequences
-        of draws correspond to a higher-level object, recovering the parse tree.
-        ``label`` is an opaque integer, which will be shared by all spans drawn
-        from a particular strategy.
+        Spans are a depth-first tree structure. A span is opened by a call to
+        |PrimitiveProvider.span_start|, and a call to |PrimitiveProvider.span_end|
+        closes the most recently opened span. So the following sequence of calls:
 
-        This method is called from ``ConjectureData.start_span()``.
+        .. code-block:: python
+
+            span_start(label=1)
+            n1 = draw_integer()
+            span_start(label=2)
+            b1 = draw_boolean()
+            n2 = draw_integer()
+            span_end()
+            f1 = draw_float()
+            span_end()
+
+        produces the following two spans of choices:
+
+        .. code-block::
+
+            1: [n1, b1, n2, f1]
+            2: [b1, n2]
+
+        Hypothesis uses spans to denote "semantically meaningful" sequences of
+        choices. For instance, Hypothesis opens a span for the sequence of choices
+        made while drawing from each strategy. Not every span corresponds to a
+        strategy; the generation of e.g. each element in |st.lists| is also marked
+        with a span, among others.
+
+        ``label`` is an opaque integer, which has no defined semantics.
+        The only guarantee made by Hypothesis is that all spans with the same
+        "meaning" will share the same ``label``. So all spans from the same
+        strategy will share the same label, as will e.g. the spans for |st.lists|
+        elements.
+
+        Providers can track calls to |PrimitiveProvider.span_start| and
+        |PrimitiveProvider.span_end| to learn something about the semantics of
+        the test's choice sequence. For instance, a provider could track the depth
+        of the span tree, or the number of unique labels, which says something about
+        the complexity of the choices being generated. Or a provider could track
+        the span tree across test cases in order to determine what strategies are
+        being used in what contexts.
+
+        It is possible for Hypothesis to start and immediately stop a span,
+        without calling a ``draw_*`` method in between. These spans contain zero
+        choices.
+
+        Hypothesis will always balance the number of calls to
+        |PrimitiveProvider.span_start| and |PrimitiveProvider.span_end|. A call
+        to |PrimitiveProvider.span_start| will always be followed by a call to
+        |PrimitiveProvider.span_end| before the end of the test case.
+
+        |PrimitiveProvider.span_start| is called from ``ConjectureData.start_span()``
+        internally.
         """
 
     def span_end(self, discard: bool, /) -> None: # noqa: B027, FBT001
@@ -611,7 +658,8 @@ def span_end(self, discard: bool, /) -> None: # noqa: B027, FBT001
         as unlikely to contribute to the input data as seen by the user's test.
         Note however that side effects can make this determination unsound.
 
-        This method is called from ``ConjectureData.stop_span()``.
+        |PrimitiveProvider.span_end| is called from ``ConjectureData.stop_span()``
+        internally.
         """
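
Reviewer note: the span bookkeeping described in the ``span_start`` docstring can be illustrated with a small standalone recorder. This is a sketch, not the Hypothesis implementation and not a real ``PrimitiveProvider`` subclass. ``SpanRecorder`` and its ``draw`` method are hypothetical names, and the string placeholders stand in for drawn values. Spans are appended in closing order here, which is an artifact of this sketch rather than a documented ordering.

```python
from dataclasses import dataclass, field


@dataclass
class SpanRecorder:
    """Hypothetical recorder mirroring the calls a provider would receive."""

    choices: list = field(default_factory=list)
    # Closed spans, stored as [label, start, end, discarded].
    spans: list = field(default_factory=list)
    # Stack of (label, start_index) for spans that are still open.
    _open: list = field(default_factory=list)

    def span_start(self, label: int) -> None:
        self._open.append((label, len(self.choices)))

    def draw(self, value):
        # Stand-in for any draw_* call: record one choice.
        self.choices.append(value)
        return value

    def span_end(self, discard: bool = False) -> None:
        # Closes the most recently opened span, as the docstring describes.
        label, start = self._open.pop()
        self.spans.append([label, start, len(self.choices), discard])


# Replay the sequence of calls from the docstring example.
rec = SpanRecorder()
rec.span_start(1)
rec.draw("n1")
rec.span_start(2)
rec.draw("b1")
rec.draw("n2")
rec.span_end()
rec.draw("f1")
rec.span_end()

# Recover the choices covered by each span via choices[start:end].
by_label = {label: rec.choices[start:end] for label, start, end, _ in rec.spans}
print(by_label)
```

A span that is started and ended with no draw in between is recorded with ``start == end``, matching the zero-choice spans mentioned in the docstring.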