-
Notifications
You must be signed in to change notification settings - Fork 43
Circular median: might return wrong results #68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the feedback. I suggest we fix the behavior by the following update:
I think that should be reasonable and predictable behavior. Do you agree with that @jxrossel ? |
You're welcome. I could not find a case that wouldn't work with your suggestions. However, I wonder if this approach would be useful, because the number of NaN-returning cases could be important, particularly for ill-behaved data. While trying to identify challenging test cases, I found another bug in the implementation of the angular difference calculation, as done in the version released on pypi. The left-to-right data point counter, based on the sign of the distance could be screwed. I see now that this has been fixed in the current dev version. Fixing that, I could decrease the number of cases to handle. I leave my version below for anyone interested. It may handle some cases which could never occur theoretically (I haven't found examples for all cases) and contains a few debugging lines. However, I found some multiple-solution cases for which a choice of solution couldn't be based on generic arguments (like "closest to the mean"). I now think that this approach is intrinsically doomed to fail. I found a discussion here (scipy/scipy#6644 (comment)) pointing toward a more reliable approach based on arc-distance median (minimizing angular distance). In case of interest:
|
Thanks for the update. Could you fork the dev version and make a pull request with your updated function? |
The current implementation of the circular median only returns correct results for well-behaved data. In particular, for an even number of data points, the line 141
will take both points having the smallest left-to-right unbalance without checking that (1) these points are direct neighbours, (2) their unbalances cancel each other, (3) the unbalance is 1 for each, (4) their circular mean is really a median (depending on what is present on the opposite side of the circle, there could be a whole zoology of behaviours between two balancing points).
In addition, the condition
doesn't make sense. You can have cases with
m>1
without ties, e.g. [ 0, 1, 1, 2, 2, 3 ], and cases withm==1
and ties, e.g. [ 1, 1.2, 2, 2.2, 1+pi, 1.2+pi, 2+pi, 2.2+pi ].I have tried to rewrite this function in a simple 1D-arrays case. It works for most cases and returns NaN when facing an unexpected case. The multiple case handling makes it messy though...
The text was updated successfully, but these errors were encountered: