-
Notifications
You must be signed in to change notification settings - Fork 451
Refactor Optimizer for correctness #2052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Optimizer for correctness #2052
Conversation
@dsblank I think this is the correct fix but would appreciate your review |
Fantastic! I'll take a look. |
Here are the filters in XML: <?xml version="1.0" encoding="utf-8"?>
<filters>
<object type="Person">
<filter name="Ancestors of" function="and">
<rule class="IsAncestorOf" use_regex="False" use_case="False">
<arg value="I0044"/>
<arg value="1"/>
</rule>
</filter>
<filter name="Family and their Spouses" function="or">
<rule class="MatchesFilter" use_regex="False" use_case="False">
<arg value="Ancestors of"/>
</rule>
<rule class="MatchesFilter" use_regex="False" use_case="False">
<arg value="Siblings of Ancestors"/>
</rule>
</filter>
<filter name="Siblings of Ancestors" function="and">
<rule class="IsSiblingOfFilterMatch" use_regex="False" use_case="False">
<arg value="Ancestors of"/>
</rule>
</filter>
</object>
</filters> Planning on making this a test in gramps. |
We should include a unit test with a fix like this. |
@stevenyoungs I couldn't figure out how to make a PR on your PR. So here is a diff for adding a unittest for the above. It passes after your fix! diff --git a/gramps/gen/filters/test/__init__.py b/gramps/gen/filters/test/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/gramps/gen/filters/test/filter_optimizer_test.py b/gramps/gen/filters/test/filter_optimizer_test.py
new file mode 100644
index 000000000..3fc0df3ca
--- /dev/null
+++ b/gramps/gen/filters/test/filter_optimizer_test.py
@@ -0,0 +1,76 @@
+from gramps.gen.filters import FilterList
+import tempfile
+import os
+import unittest
+
+from ...const import DATA_DIR
+from ...db.utils import import_as_dict
+from ...user import User
+from ...filters import reload_custom_filters
+
+reload_custom_filters()
+
+from ...filters import CustomFilters
+
+
+custom_filters_xml = """<?xml version="1.0" encoding="utf-8"?>
+<filters>
+ <object type="Person">
+ <filter name="Ancestors of" function="and">
+ <rule class="IsAncestorOf" use_regex="False" use_case="False">
+ <arg value="I0044"/>
+ <arg value="1"/>
+ </rule>
+ </filter>
+ <filter name="Family and their Spouses" function="or">
+ <rule class="MatchesFilter" use_regex="False" use_case="False">
+ <arg value="Ancestors of"/>
+ </rule>
+ <rule class="MatchesFilter" use_regex="False" use_case="False">
+ <arg value="Siblings of Ancestors"/>
+ </rule>
+ </filter>
+ <filter name="Siblings of Ancestors" function="and">
+ <rule class="IsSiblingOfFilterMatch" use_regex="False" use_case="False">
+ <arg value="Ancestors of"/>
+ </rule>
+ </filter>
+ </object>
+</filters>
+"""
+TEST_DIR = os.path.abspath(os.path.join(DATA_DIR, "tests"))
+EXAMPLE = os.path.join(TEST_DIR, "example.gramps")
+
+class OptimizerTest(unittest.TestCase):
+ @classmethod
+ def setUpClass(cls):
+ """
+ Import example database.
+ """
+ cls.db = import_as_dict(EXAMPLE, User())
+
+ with tempfile.NamedTemporaryFile(mode='w+', encoding='utf8') as tmp_file:
+ tmp_file.write(custom_filters_xml)
+ tmp_file.seek(0)
+ fl = FilterList(tmp_file.name)
+ fl.load()
+
+ the_custom_filters = CustomFilters.get_filters_dict("Person")
+ cls.filters = fl.get_filters_dict("Person")
+ for filter_name in cls.filters:
+ the_custom_filters[filter_name] = cls.filters[filter_name]
+
+ def test_ancestors_of(self):
+ filter = self.filters["Ancestors of"]
+ results = filter.apply(self.db)
+ self.assertEqual(len(results), 7)
+
+ def test_siblings_of_ancestors(self):
+ filter = self.filters["Siblings of Ancestors"]
+ results = filter.apply(self.db)
+ self.assertEqual(len(results), 18)
+
+ def test_family_and_their_spouses(self):
+ filter = self.filters["Family and their Spouses"]
+ results = filter.apply(self.db)
+ self.assertEqual(len(results), 25) Let me know if there is way to do this appropriately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice; thanks!
f85513f
to
5d7537f
Compare
hhmmmm I'm not sure why other tests are suddenly failing. I'm glad I made you co-author now @dsblank 🤣 |
I'll take a look :) |
It is probably because I altered the global CustomFilters. Let me see if I can cleanup after the new tests. |
Definitely caused by my test. Attempting to fix... |
Okay, I think I have made the pytest and Gramps gods happy. |
Please add entries in the |
Done. |
Sorry to intrude but as far as I can see filtering still does not work in Gramps 6 even with this optimizer fix. See for example the filters below. The filter F1 obviously should yield all individuals in the database (since it is an "or" filter with the "Everyone" rule).
Or have I misunderstood something? |
This looks like another test to add to the test suite. Thanks! |
Added @kkujansuu 's additional issue as another set of unittests. Now we have one failure to fix in the optimizer. |
Here's a tabular version of the nice diagram @kkujansuu created. It includes additional information on the rules which have a
The filter optimizer walks the tree of filters, looking for rules where
Where both conditions are true, the Earlier in this PR, I modified test 2 ( From the table above, we can see that the rule in filter F3 meets both conditions ( Alas, I'm still not sure what the correct fix is.... |
I think the issue is actually at a higher level: at the first level there is an "OR" (everything or the other branch). When there is such an "OR" we should not use the optimizer's |
@stevenyoungs, I made a change cc8bc77 that I believe fixes the issues, but not 100% sure yet. Need further testing. I think we can make up a tree of AND's and OR's that still fails. More investigation needed... |
I don't think the failing tests have anything to do with these changes. |
Like you, I'm not entirely confident, but it feels wrong. My assumption is that this works for @kkujansuu's example because the root filter is "or". If there was an F0 filter added on top, which used "and", my expectation is that the wrong result will be generated.
I believe so too. I think we need a multi-level filter, where the filter on each level has multiple rules. We then alternate these filters to use "and" or "or". Then repeat with each filter using "or" and repeat again with each filter using "and". Then for completeness, repeat with negation. @kkujansuu's example leads us in this direction. |
How about rewriting the optimizer completely and replacing it with these two functions that call each other recursively:
High level pseudocode (that does not handle "invert") could be something like:
GenericFilter would call compute_potential_handles_for_filter. ALL_HANDLES would be the set of all handles in the current namespace. Still todo: add "invert" handling and replace ALL_HANDLES with something more efficient. Could something like this work? |
Yes, I think so. The thing I missed in my first version was the ability to short-circuit any further processing when encountering an "or" at any level. |
cc8bc77
to
177ddb5
Compare
I've taken @kkujansuu's idea and implemented. Initial results look promising. It does not yet handle inverted filters, therefore tests are expected to fail. |
e5da87e
to
3ac68e7
Compare
Implemented support for inverted filters. I'm not sure that "one" is handled correctly. I think it should use |
Good, I think that makes more sense.
My previous comment overlapped your commit. I'll take a look at your most recent commits now. |
Ran a complex filter and the new changes worked. I ran the same filter again with the change from intersection to union and the exact same results were returned. |
Thanks @DaveSch-gramps! With the last fix, and more tests, I think we finally have all cases covered. |
Brave words given the history of this bug and PR - but I'm also feeling optimistic now. |
I do feel much more brave about this implementation :) The code is now easier to understand (and maintain) and we have many more tests. |
I've fixed the immediate problem by defining the filters "I0001" and "I0002". In my view the unit test should also have failed. |
Good catch. Looking into how the tests could have passed... |
I see... it returns all handles, which is correct. I'll add more tests where "all" doesn't necessarily give a good answer. |
@stevenyoungs if you test with either #2071 or #2062 you'll see that 12 tests fail. I like the idea, but we should probably wait for 6.1 to add features to this PR. |
018ffcb
to
26ad7a7
Compare
Good find. I've added a test that catches this scenario. The build has just failed, due to this test, so it is working. Edit to add, I used "parents of" as it was the simplest optimized rule I could find that is already in the gramps60 branch. |
26ad7a7
to
4325eaa
Compare
Yes. Please don't include new features into fixes in maintenance branches. |
I backed out the optimization of "or" and "one" in such a way that it is simple to enable in master, when the time is right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the filter optimizer to ensure that when combining the results of sub‑filters the logical operator of the parent filter is used instead of that of the right-hand sub‑filter. Key changes include the addition of new test files, modifications to filter rule tests (including tree support), and comprehensive updates to the optimizer and generic filter implementations to correct the operator usage.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
po/POTFILES.skip | Added entries to skip new test directories for generated tests. |
gramps/plugins/test/reports_test.py | Updated assertions to include error outputs for better debugging. |
gramps/gen/filters/rules/test/place_rules_test.py | Added tree parameter support to the filter test method. |
gramps/gen/filters/rules/person/_isdescendantfamilyof.py | Added conditional check prior to adding spouse_handle to results. |
gramps/gen/filters/optimizer.py | Refactored filter optimizer logic and operator handling. |
gramps/gen/filters/_genericfilter.py | Updated filter application logic and docstrings to reflect changes. |
gramps/gen/filters/_filterlist.py | Added loadString method for loading custom filters from strings. |
gramps/gen/filters/init.py | Added helper to set the global custom filters. |
Comments suppressed due to low confidence (1)
gramps/gen/filters/_genericfilter.py:175
- Since the Optimizer is now instantiated without a filter parameter and expects the filter to be passed explicitly in compute_potential_handles_for_filter, please update the documentation to clarify this change.
optimizer = Optimizer()
Refactored using simplified logic. Added unit tests. Fixes #13799. Co-authored-by: Doug Blank <[email protected]>
2911f8d
to
be30d5e
Compare
be30d5e
into
gramps-project:maintenance/gramps60
When the filter optimizer combines the result of two or more sub-filters, it is currently using the logical_op of the right hand sub-filter. Fix this so that the optimizer uses the parent filter logical_op
Fixes #13799 https://gramps-project.org/bugs/view.php?id=13799
Related forum discussion: https://gramps.discourse.group/t/strange-filter-behavior-in-6-0-1/7505/16
To reproduce using the example Gramps tree
set ID to I0044, Inclusive ticked
leave the Options set to "all rules must apply"
set the filter to "Ancestors of"
leave the Options set to "all rules must apply"
a. rule 1: People matching the and set the filter to "Ancestors of"
b. rule 2: People matching the and set the filter to "Siblings of Ancestors"
set the Options to "At least one rule must apply"
Filter the people view using the filter "Siblings of Ancestors". 25 people are expected. 0 are shown
The filter "Ancestors of" returns 7 people
The filter "Siblings of Ancestors" returns 18 people
So the union of these two results must, at a minimum, return 18 people. Therefore a result of 0 people cannot be correct.