-
-
Notifications
You must be signed in to change notification settings - Fork 361
i.cca: fix indexing, buffer size issues and matrix computation #5948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Have you seen #3239? |
I reviewed #3239 in depth. The patch I have proposed in this PR addresses the same root issues: segmentation faults and incorrect array indexing in As @marisn rightly pointed out, while the module now executes without crashing, correctness of the output remains uncertain. To investigate this, I constructed a synthetic test case based on the LASDOC methodology as suggested there to verify the mathematical part of the canonical transformation. Test SetupI generated synthetic rasters with two classes, each having 2D Gaussian distributions with distinct means and covariance matrices:
The signature file generated by
These values are acceptably close for testing purposes. Expected vs Actual CCA OutputUsing LASDOC's canonical projection direction, the expected separation along the first canonical axis should have absolute difference between the means nearly equal to 2.8. However, the actual results from the patched
These results suggest that while the implementation runs, the transformation may not be projecting the data along the true canonical direction. The cause could lie in incorrect eigenvector handling, normalization, or covariance preprocessing. I will now dive into the matrix computation steps other than |
How big of an error/deviation would an off by one issue cause in the same synthetic data, before concluding if it is a close enough result? |
Yes, the off-by-one bug was critical, but even after fixing it, the numerical deviation suggests that other computational inaccuracies remain. I tried fixing a precision issues in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should do a calculation on paper based on the original description and then create a test case based on this manually validated data. Take look at some other modules to see examples (e.g. r.kappa, r.mapcalc) – the dataset does not have to be large one (e.g. 5x5 raster should be fine) to make it easy to calculate by hand. This should be enough to validate that the algorithm is implemented correctly. Then the next step can be tracking down numerical stability issues.
A test case to merge this PR is a must!
I tried the numerical example and while debugging I found out that in the original
The subtraction of
I tried to fix the logic which now separates the classwise accumulation and the global mean subtraction:
With this fix:
I will add this change to the PR once it is verified. Thank you! |
902e95f
to
689999f
Compare
This looks good to me, but if somebody else more proficient in linear algebra wants to review this, that would be better. |
This PR resolves a segmentation fault and memory misalignment in the
i.cca
module. (#5947)Changes Made:
nclass
with allocations correctly sized bybands
.<prefix>.1
,<prefix>.2
, etc.Testing:
The changes were tested using the same reproduction steps outlined in the linked issue. The module now runs successfully and produces canonical component outputs without error. This fix brings
i.cca
in line with standard C indexing practices, preventing both runtime crashes and silent memory corruption.Fix Matrix Calculation and Numeric Stability
Summary of new changes
μ̄ * μ̄ᵀ
in each loop iteration, which caused B to be mis-scaled.Normalize canonical vectors
Switch outputs to DCELL
I will be adding the regression testsuite for this module in a separate PR once the changes are reviewed.