i.cca: fix indexing, buffer size issues and matrix computation #5948

jayneel-shah18 · 2025-06-23T12:46:04Z

This PR resolves a segmentation fault and memory misalignment in the i.cca module. (#5947)

Changes Made:

Corrected all loops and array accesses to use zero-based indexing.
Replaced incorrect memory allocations sized by nclass with allocations correctly sized by bands.
Fixed off-by-one output raster naming to preserve the original <prefix>.1, <prefix>.2, etc.
Ensured symmetric covariance matrices by correctly including diagonal elements in the computation.

Testing:

The changes were tested using the same reproduction steps outlined in the linked issue. The module now runs successfully and produces canonical component outputs without error. This fix brings i.cca in line with standard C indexing practices, preventing both runtime crashes and silent memory corruption.

Fix Matrix Calculation and Numeric Stability

Summary of new changes

Correct computation of between-class scatter matrix B
- The original implementation subtracted μ̄ * μ̄ᵀ in each loop iteration, which caused B to be mis-scaled.
- This is replaced with the canonical formula, which matches theoretical expectations (validated with synthetic data).

B = (1 / (nclass - 1)) * [ ∑ nᵢ * μᵢ * μᵢᵀ - N * μ̄ * μ̄ᵀ ]

Normalize canonical vectors
- Added normalization step so each row of the transformation matrix q has unit norm.
- This improves interpretability and stability.
Switch outputs to DCELL
- Output rasters now use DCELL precision instead of CELL.
- Prevents rounding errors and loss of detail in canonical projections.

I will be adding the regression testsuite for this module in a separate PR once the changes are reviewed.

petrasovaa · 2025-06-23T13:35:46Z

Have you seen #3239?

jayneel-shah18 · 2025-06-24T10:48:10Z

I reviewed #3239 in depth. The patch I have proposed in this PR addresses the same root issues: segmentation faults and incorrect array indexing in i.cca.

As @marisn rightly pointed out, while the module now executes without crashing, correctness of the output remains uncertain. To investigate this, I constructed a synthetic test case based on the LASDOC methodology as suggested there to verify the mathematical part of the canonical transformation.

Test Setup

I generated synthetic rasters with two classes, each having 2D Gaussian distributions with distinct means and covariance matrices:

mean1 = [2, 3],    cov1 = [[1,  0.5],  [0.5,  1]]
mean2 = [4, 1],    cov2 = [[1, -0.5],  [-0.5, 1]]

The signature file generated by i.gensig closely matched these intended statistics:

band1  band2
Class 1: mean = [2.08307, 3.11709], cov = [[0.823, 0.300], [0.300, 0.775]]
Class 2: mean = [3.91068, 1.13281], cov = [[1.099, -0.592], [-0.592, 0.959]]

These values are acceptably close for testing purposes.

Expected vs Actual CCA Output

Using LASDOC's canonical projection direction, the expected separation along the first canonical axis should have absolute difference between the means nearly equal to 2.8.

However, the actual results from the patched i.cca module were:

Mean of cca.1 in class 1 (masked): 3.73
Mean of cca.1 in class 2 (masked): 3.19
Difference = 0.54, significantly less than expected

These results suggest that while the implementation runs, the transformation may not be projecting the data along the true canonical direction. The cause could lie in incorrect eigenvector handling, normalization, or covariance preprocessing.

I will now dive into the matrix computation steps other than main.c to find where the numerical behavior diverges from LASDOC expectations.

echoix · 2025-06-24T11:50:38Z

How big of an error/deviation would an off by one issue cause in the same synthetic data, before concluding if it is a close enough result?

jayneel-shah18 · 2025-06-24T13:43:00Z

Yes, the off-by-one bug was critical, but even after fixing it, the numerical deviation suggests that other computational inaccuracies remain. I tried fixing a precision issues in transform.c, after correcting the data type from CELL to FCELL during the transformation step, the delta improved slightly to ~0.57.

marisn

You should do a calculation on paper based on the original description and then create a test case based on this manually validated data. Take look at some other modules to see examples (e.g. r.kappa, r.mapcalc) – the dataset does not have to be large one (e.g. 5x5 raster should be fine) to make it easy to calculate by hand. This should be enough to validate that the algorithm is implemented correctly. Then the next step can be tracking down numerical stability issues.

A test case to merge this PR is a must!

jayneel-shah18 · 2025-06-26T16:10:47Z

I tried the numerical example and while debugging I found out that in the original between() function, the computation of the between-class scatter matrix B was implemented as:

// Compute weighted overall mean
for (i = 0; i < nclass; i++)
    for (j = 0; j < bands; j++)
        newvec[j] += nsamp[i] * mu[i][j];

// Compute outer product of overall mean
for (i = 0; i < bands; i++)
    for (j = 0; j < bands; j++)
        tmp1[i][j] = (newvec[i] * newvec[j]) / samptot;

// Compute sum of (n_i * mu_i * mu_i^T) - overall_mean_outer
for (k = 0; k < nclass; k++) {
    product(mu[k], nsamp[k], tmp0, bands);
    for (i = 0; i < bands; i++)
        for (j = 0; j < bands; j++)
            tmp2[i][j] += tmp0[i][j] - tmp1[i][j];  
}

The subtraction of tmp1 (i.e., the outer product of the global mean μ̄·μ̄ᵀ) was performed within the loop over each class. This results in subtracting the same μ̄·μ̄ᵀ term multiple times , once per class , which violates the formula:

Correct formula:
$B = \frac{1}{n_{\text{classes}} - 1} \left[ \sum_{i} n_i \mu_i \mu_i^T - N \bar{\mu} \bar{\mu}^T \right]$

I tried to fix the logic which now separates the classwise accumulation and the global mean subtraction:

// Step 1: Compute weighted global mean
for (j = 0; j < bands; j++) {
    newvec[j] = 0.0;
    for (i = 0; i < nclass; i++)
        newvec[j] += nsamp[i] * mu[i][j];
    newvec[j] /= samptot;
}

// Step 2: Compute N · (μ̄ · μ̄ᵀ)
for (i = 0; i < bands; i++)
    for (j = 0; j < bands; j++)
        tmp1[i][j] = samptot * newvec[i] * newvec[j]; 

// Step 3: Accumulate ∑ n_i · μ_i · μ_iᵀ
for (k = 0; k < nclass; k++) {
    product(mu[k], nsamp[k], tmp0, bands);
    for (i = 0; i < bands; i++)
        for (j = 0; j < bands; j++)
            tmp2[i][j] += tmp0[i][j]; 
}

// Step 4: Subtract global mean outer product once
for (i = 0; i < bands; i++)
    for (j = 0; j < bands; j++)
        tmp2[i][j] -= tmp1[i][j];

// Step 5: Normalize
for (i = 0; i < bands; i++)
    for (j = 0; j < bands; j++)
        p[i][j] = tmp2[i][j] / (nclass - 1);

With this fix:

The B matrix now matches the canonical result:
```
B = [[ 50, -50 ],
     [-50,  50 ]]
```
Canonical vector q[0] = [0.7071, -0.7071]
Projection difference = 2.8284 as expected

I will add this change to the PR once it is verified. Thank you!

petrasovaa · 2025-06-27T17:28:01Z

This looks good to me, but if somebody else more proficient in linear algebra wants to review this, that would be better.

jayneel-shah18 mentioned this pull request Jun 23, 2025

[Bug] i.cca causes segmentation fault due to incorrect indexing #5947

Open

github-actions bot added C Related code is in C module imagery labels Jun 23, 2025

petrasovaa requested a review from marisn June 24, 2025 13:50

marisn requested changes Jun 25, 2025

View reviewed changes

jayneel-shah18 added 3 commits June 27, 2025 10:30

fix indexing and buffer size issues

ab8ac0c

fix missed index

761a9da

fix B matrix computation and add normalize vectors

689999f

jayneel-shah18 force-pushed the i_cca_bug_fix branch from 902e95f to 689999f Compare June 27, 2025 10:30

jayneel-shah18 changed the title ~~i.cca: fix indexing and buffer size issues~~ i.cca: fix indexing, buffer size issues and matrix computation Jun 27, 2025

petrasovaa requested a review from marisn June 27, 2025 13:46

petrasovaa approved these changes Jul 2, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

i.cca: fix indexing, buffer size issues and matrix computation #5948

i.cca: fix indexing, buffer size issues and matrix computation #5948

jayneel-shah18 commented Jun 23, 2025 •

edited

Loading

Uh oh!

petrasovaa commented Jun 23, 2025

Uh oh!

jayneel-shah18 commented Jun 24, 2025 •

edited

Loading

Uh oh!

echoix commented Jun 24, 2025

Uh oh!

jayneel-shah18 commented Jun 24, 2025

Uh oh!

marisn left a comment

Uh oh!

jayneel-shah18 commented Jun 26, 2025

Uh oh!

petrasovaa commented Jun 27, 2025

Uh oh!

Uh oh!

Uh oh!

i.cca: fix indexing, buffer size issues and matrix computation #5948

Are you sure you want to change the base?

i.cca: fix indexing, buffer size issues and matrix computation #5948

Conversation

jayneel-shah18 commented Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes Made:

Testing:

Fix Matrix Calculation and Numeric Stability

Summary of new changes

Uh oh!

petrasovaa commented Jun 23, 2025

Uh oh!

jayneel-shah18 commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Setup

Expected vs Actual CCA Output

Uh oh!

echoix commented Jun 24, 2025

Uh oh!

jayneel-shah18 commented Jun 24, 2025

Uh oh!

marisn left a comment

Choose a reason for hiding this comment

Uh oh!

jayneel-shah18 commented Jun 26, 2025

Uh oh!

petrasovaa commented Jun 27, 2025

Uh oh!

Uh oh!

jayneel-shah18 commented Jun 23, 2025 •

edited

Loading

jayneel-shah18 commented Jun 24, 2025 •

edited

Loading