Change split point calculation in KD-tree construction #64

Merged: 2 commits, Apr 11, 2023

@andreasnoack (Member) commented Mar 29, 2023

This tries to mimic the splitting of the original Loess implementation, which is used by R. The implementation is based on reverse engineering R's behavior, since the rules are only loosely described in the original paper. With the rules described in the accompanying code comment, we are able to match R's splits.
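For orientation, here is a generic textbook KD-tree construction. It is not the R-matching split rule this PR implements (those rules live in the code comment and are not reproduced here); it only illustrates the general shape of the algorithm. It splits a cell at the median of the coordinate with the widest spread and stops once a cell holds at most `bucket` points. `KDNode`, `build_kd`, and `nleaves` are hypothetical names, not part of Loess.jl.

```julia
using Statistics: median

# Generic median-split KD-tree (illustration only; NOT the R-matching rule).
# Points are the columns of X (d × n).
struct KDNode
    idx::Vector{Int}          # indices of the points in this cell
    dim::Int                  # split dimension (0 for a leaf)
    cut::Float64              # split value (NaN for a leaf)
    children::Vector{KDNode}  # empty for a leaf
end

function build_kd(X::AbstractMatrix{<:Real}, idx::Vector{Int}, bucket::Int)
    # Stop splitting once the cell is small enough.
    if length(idx) <= bucket
        return KDNode(idx, 0, NaN, KDNode[])
    end
    # Split along the coordinate with the widest spread in this cell.
    spreads = [maximum(X[j, idx]) - minimum(X[j, idx]) for j in 1:size(X, 1)]
    dim = argmax(spreads)
    cut = Float64(median(X[dim, idx]))
    left = filter(i -> X[dim, i] <= cut, idx)
    right = filter(i -> X[dim, i] > cut, idx)
    # Guard against a degenerate split (e.g. all coordinates tied).
    if isempty(left) || isempty(right)
        return KDNode(idx, 0, NaN, KDNode[])
    end
    return KDNode(idx, dim, cut,
                  [build_kd(X, left, bucket), build_kd(X, right, bucket)])
end

# Count the leaf cells of a tree.
nleaves(node::KDNode) = isempty(node.children) ? 1 : sum(nleaves, node.children)
```

For example, `build_kd(reshape(Float64.(1:8), 1, 8), collect(1:8), 2)` splits at 4.5 and then at 2.5 and 6.5, giving four leaves of two points each. Matching R means replacing the median and stopping rules with the ones described in the PR's code comment.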

When adding tests, I realized that the weights in the local regression were off by a square root. They were computed as the diagonal elements of `W` in `inv(X'*W*X)*X'*W*y`, but we multiplied the rows of `X` and `y` by them before computing the OLS estimates, so the weights were effectively squared. To reproduce `inv(X'*W*X)*X'*W*y` with OLS, the rows must be scaled by the square roots of the weights.
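The square-root issue is easy to check numerically. The sketch below uses hypothetical function names (not Loess.jl code): it computes the weighted least squares estimate from the normal equations, shows that OLS on rows scaled by `sqrt(w[i])` gives the same estimate, and shows that scaling by `w[i]` itself reproduces the estimate for the weights `w .^ 2`.

```julia
using LinearAlgebra

# Weighted least squares via the normal equations: inv(X'WX) X'W y with
# W = Diagonal(w). `\` is used instead of forming the inverse explicitly.
wls_normal(X, y, w) = (X' * Diagonal(w) * X) \ (X' * Diagonal(w) * y)

# Same estimate as OLS on rescaled data: multiply row i of X and y by sqrt(w[i]).
function wls_rescaled(X, y, w)
    s = sqrt.(w)
    return (s .* X) \ (s .* y)   # least squares solve on the scaled system
end

# The bug: scaling rows by w[i] itself, which amounts to using weights w .^ 2.
buggy_rescaled(X, y, w) = (w .* X) \ (w .* y)
```

With an intercept-only model, `X = ones(3, 1)`, `y = [1.0, 2.0, 3.0]` and `w = [1.0, 2.0, 4.0]`, the correct estimate is the weighted mean 17/7, whereas the buggy version returns 57/21, the mean weighted by `[1, 4, 16]`.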

The signatures have also been loosened to allow element types other than `Float64`. This made it easier to test with R's `cars` dataset.
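A minimal sketch of what loosening a signature means, using hypothetical functions rather than the actual Loess.jl methods: a method restricted to `Vector{Float64}` rejects integer input with a `MethodError`, while one typed `AbstractVector{<:Real}` also accepts integer vectors, ranges, and `Float32` data.

```julia
# Hypothetical "before": only Vector{Float64} arguments match this method.
slope_strict(x::Vector{Float64}, y::Vector{Float64}) = _slope(x, y)

# Hypothetical "after": any AbstractVector with a real element type matches.
slope_loose(x::AbstractVector{<:Real}, y::AbstractVector{<:Real}) = _slope(x, y)

# Least-squares slope; integer inputs are promoted to floating point by `/`.
function _slope(x, y)
    xm = sum(x) / length(x)
    ym = sum(y) / length(y)
    num = sum((xi - xm) * (yi - ym) for (xi, yi) in zip(x, y))
    den = sum((xi - xm)^2 for xi in x)
    return num / den
end
```

Here `slope_loose([1, 2, 3, 4], [2, 4, 6, 8])` returns `2.0`, while the same call to `slope_strict` throws a `MethodError` because `Vector{Int}` is not `Vector{Float64}`.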

I've added a lot of `@debug` statements to make it easier to follow the KD-tree construction.
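`@debug` output is hidden by default. One way to see it is to run the code under a logger whose minimum level is `Debug`, as sketched below; setting `ENV["JULIA_DEBUG"] = "Loess"` (or `JULIA_DEBUG=Loess` in the shell) should also enable debug messages coming from the `Loess` module. `split_cell` is a hypothetical stand-in, not a Loess.jl function.

```julia
using Logging

# Hypothetical stand-in for one KD-tree step that reports through @debug.
function split_cell(lo::Float64, hi::Float64)
    mid = (lo + hi) / 2
    @debug "splitting cell" lo hi mid
    return (lo, mid), (mid, hi)
end

# Debug messages are shown only for code run inside this block.
with_logger(ConsoleLogger(stderr, Logging.Debug)) do
    split_cell(0.0, 1.0)
end
```

Scoping the logger with `with_logger` keeps debug output limited to one call instead of changing the global logger.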

Update: The changes to the signatures fix #48.

@codecov-commenter commented Mar 29, 2023

Codecov Report

Patch coverage: 94.28% and project coverage change: -0.48 ⚠️

Comparison is base (31d924a) 92.59% compared to head (7363754) 92.11%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #64      +/-   ##
==========================================
- Coverage   92.59%   92.11%   -0.48%     
==========================================
  Files           2        2              
  Lines         189      203      +14     
==========================================
+ Hits          175      187      +12     
- Misses         14       16       +2     
| Impacted Files | Coverage Δ |
|---|---|
| src/Loess.jl | 86.86% <83.33%> (-0.51%) ⬇️ |
| src/kd.jl | 97.11% <96.55%> (-1.73%) ⬇️ |

@andreasnoack (Member, Author):

I forgot to mention that, with these changes, I was able to turn one of the broken tests into a working test. The other broken test was changed to a `@test_throws` because the span is now too small. The reason is that I'm no longer using `ceil` when calculating the minimum bucket size.
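To see why dropping `ceil` can make a small span fail, here is a sketch that assumes the rule documented for R's `loess.control`: kd-tree cells with more than `floor(n*span*cell)` points are subdivided, with `cell = 0.2` by default. Whether Loess.jl computes its bucket size exactly this way, and which error it raises, are assumptions; `bucket_size` is a hypothetical helper.

```julia
# Hypothetical helper, assuming R's rule floor(n * span * cell) with cell = 0.2.
# `round_fn` lets us compare floor (current behavior) with ceil (old behavior).
function bucket_size(n::Integer, span::Real; cell::Real = 0.2, round_fn = floor)
    fc = Int(round_fn(n * span * cell))
    # A bucket that cannot hold a single point makes the tree impossible to build.
    fc >= 1 || throw(ArgumentError("span too small: bucket size would be $fc"))
    return fc
end
```

With the 50 observations of `cars` and `span = 0.05`, `n*span*cell` is 0.5, so `floor` gives 0 and the call throws, while `ceil` would have given 1. This is the kind of case where a test moves from `@test_broken` to `@test_throws`.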

@andreasnoack (Member, Author):

Thanks for the comments. I believe I've now addressed all of them, so please have another look.

@devmotion (Member) left a comment:

Looks good to me 🙂

@andreasnoack andreasnoack merged commit 60a5998 into master Apr 11, 2023
@andreasnoack andreasnoack deleted the an/newmedian branch April 11, 2023 21:14
Successfully merging this pull request may close these issues.

Only accepts Float64