Change split point calculation in KD-tree construction #64
This tries to mimic the splitting of the original Loess implementation, which is the one used by R. The implementation is based on reverse engineering of the behavior, as the rules are only loosely described in the original paper. With the rules described in the comment, we are able to match the splits of R.

When adding tests, I realized that the weight calculation in the local regression was off by a square root. The weights were computed as the diagonal elements of W in `inv(X'*W*X)*X'*W*y`, but we applied them to X and y before computing the OLS estimates, so the weights were effectively squared.

The signatures have also been loosened to allow more element types. This made it easier to test with the cars dataset from R.

I've added a lot of `@debug` statements to make it easier to follow the KD-tree construction.
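To illustrate the weight issue, here is a minimal sketch on made-up data. The vectors `x`, `y`, `w` and all variable names are hypothetical and are not taken from the package or from the cars dataset. It only shows the general point: if the weights are applied to the rows of X and y before an ordinary least-squares solve, they need to be the square roots of the diagonal of W, otherwise the fit corresponds to squared weights.

```julia
using LinearAlgebra

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.3]
w = [0.2, 0.6, 1.0, 0.6, 0.2]   # illustrative local weights (diagonal of W)

X = [ones(length(x)) x]          # design matrix with an intercept column
W = Diagonal(w)

# Reference: weighted least squares via the normal equations,
# i.e. inv(X'*W*X)*X'*W*y written with a linear solve.
beta_wls = (X' * W * X) \ (X' * W * y)

# Scaling the rows by sqrt.(w) and then running OLS gives the same estimate.
beta_sqrt = (sqrt.(w) .* X) \ (sqrt.(w) .* y)

# Scaling the rows by w itself is equivalent to WLS with squared weights.
beta_scaled_by_w = (w .* X) \ (w .* y)
W2 = Diagonal(w .^ 2)
beta_wls_squared = (X' * W2 * X) \ (X' * W2 * y)
```

In this sketch `beta_sqrt` agrees with `beta_wls`, while `beta_scaled_by_w` agrees with `beta_wls_squared` instead, which is the "off by a square root" behavior described above.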
Codecov Report
Patch coverage:
@@ Coverage Diff @@
## master #64 +/- ##
==========================================
- Coverage 92.59% 92.11% -0.48%
==========================================
Files 2 2
Lines 189 203 +14
==========================================
+ Hits 175 187 +12
- Misses 14 16 +2
I forgot to mention that, with these changes, I was able to change one of the broken tests to a working test. The other broken test was changed to a |
7a824f9 to 7363754
Thanks for the comments. I believe that I've now addressed all of them, so please have another look.
Looks good to me 🙂
Update: The changes to the signatures fix #48.