
🐞 Discrepancy in Inference Results Between Lightning, Torch, ONNX, and OpenVINO Models #2747

@samet-akcay

Description

Describe the bug

📝 Description

There is a noticeable discrepancy in the inference results when running a trained anomalib model using different backends. For the same input image, the anomaly scores and anomaly maps produced by a model running within the PyTorch Lightning Engine are different from those produced by the same model after being exported to TorchScript, ONNX, and OpenVINO formats. This inconsistency undermines the reliability of the model deployment pipeline, as the behavior observed during evaluation (engine.predict) does not match the behavior of the exported artifacts.
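For context, a rough reproduction sketch is given below. It is illustrative rather than taken from the report: the dataset and model are listed as "Other", so Padim and MVTecAD are stand-ins, and the class and argument names (Engine, ExportType, TorchInferencer) follow the anomalib API as commonly documented; exact names, return values, and the shape of the prediction objects may differ between versions.

# Rough reproduction sketch (illustrative, not verbatim from the report).
# Padim/MVTecAD are stand-ins; the report lists dataset and model as "Other".
from anomalib.data import MVTecAD
from anomalib.deploy import ExportType, TorchInferencer
from anomalib.engine import Engine
from anomalib.models import Padim

datamodule = MVTecAD()
model = Padim()
engine = Engine()
engine.fit(model=model, datamodule=datamodule)

# Pathway 1: Lightning prediction loop (drives the model's predict_step).
lightning_predictions = engine.predict(model=model, datamodule=datamodule)

# Pathway 2: exported artifact (a trace of the model's forward method).
export_path = engine.export(model=model, export_type=ExportType.TORCH)  # assumed to return the artifact path
inferencer = TorchInferencer(path=export_path)
exported_prediction = inferencer.predict(image="path/to/test_image.png")

# With the bug described below, the anomaly scores and anomaly maps from the
# two pathways do not match for the same input image.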

Dataset

Other (please specify in the text field below)

Model

Other (please specify in the field below)

Steps to reproduce the behavior

🧐 Root Cause Analysis

The root cause of this discrepancy lies in the architectural separation between the two primary inference pathways within anomalib and how they are implemented in the base AnomalibModule.

  1. The Lightning Pathway (engine.predict): This pathway uses the PyTorch Lightning Trainer's prediction loop. The Trainer iterates over a DataLoader, which yields Batch objects (dictionaries containing the image tensor and other metadata). For each batch, the Trainer calls the model's predict_step method.

  2. The Exported Pathway (TorchInferencer, OpenVINOInferencer): When a model is exported to TorchScript, ONNX, or OpenVINO, the exporter traces the model's forward method. The forward method is the core computational graph that defines the direct transformation from an input tensor to an output prediction. The *Inferencer classes execute this traced forward graph (a minimal tracing illustration follows this list).
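The tracing behaviour in point 2 is generic PyTorch, not anomalib-specific. The toy module and the 256x256 input shape below are assumptions for illustration; the point is that both exporters capture only what forward computes, so logic that lives exclusively in validation_step or predict_step never reaches the exported artifact:

# Minimal illustration: ONNX and TorchScript export record the graph produced
# by calling the module, i.e. its forward method.
import torch
from torch import nn

class TinyModel(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.mean(dim=(1, 2, 3))  # stand-in for an anomaly score head

model = TinyModel().eval()
dummy = torch.randn(1, 3, 256, 256)  # assumed input shape

torch.onnx.export(model, dummy, "model.onnx")   # traces model.forward
traced = torch.jit.trace(model, dummy)          # likewise traces forward
traced.save("model.pt")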

The critical issue lies in the implementation of the base AnomalibModule (src/anomalib/models/components/base/anomalib_module.py): its predict_step method is implemented to call the validation_step method, not the forward method.

Original predict_step:

def predict_step(self, batch: Batch, batch_idx: int, dataloader_idx: int = 0) -> STEP_OUTPUT:
    """Perform prediction step."""
    del dataloader_idx
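    # Delegates to validation_step, not to the forward graph that gets exported.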
    return self.validation_step(batch, batch_idx)

This design is problematic because a model's validation_step is not guaranteed to be identical to its forward method. The validation_step might include additional logic, handle the Batch dictionary differently, or perform calculations that are not part of the core inference graph defined in forward.
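As a concrete, hypothetical illustration (not code from anomalib), consider a module whose validation_step normalizes the score that forward leaves raw:

# Hypothetical example of how validation_step can drift away from forward:
# validation applies a sigmoid, forward does not.
import torch
from lightning.pytorch import LightningModule

class DriftingModel(LightningModule):
    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Raw anomaly score: this is what export tracing captures.
        return image.mean(dim=(1, 2, 3))

    def validation_step(self, batch: dict, batch_idx: int) -> dict:
        raw = self(batch["image"])
        # Extra post-processing that only the Lightning pathway ever sees.
        return {"pred_score": torch.sigmoid(raw)}

With the original predict_step above, engine.predict would report the sigmoid-normalized score, while any export of forward would report the raw one.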

Since the exported models are a direct representation of the forward method, while engine.predict runs the validation_step logic, the two pathways execute different code, which produces the observed differences in output.

OS information

  • Python version: 3.11.11
  • Anomalib version: 2.1.0.dev
  • PyTorch version: 2.7

Expected behavior

✅ Proposed Solution

To ensure consistency across all inference methods, the predict_step in the base AnomalibModule must execute the same logic that is used for exporting. This is achieved by making predict_step call the forward method directly.
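A sketch of the change is shown below. It assumes forward consumes the batch's image tensor (batch.image) and that merging forward's output back into the Batch object is handled elsewhere; the exact signature and return handling depend on the AnomalibModule implementation.

def predict_step(self, batch: Batch, batch_idx: int, dataloader_idx: int = 0) -> STEP_OUTPUT:
    """Perform prediction step via the same forward graph that is exported."""
    del batch_idx, dataloader_idx  # not needed by the forward pass
    # Delegating to forward keeps engine.predict on the exact computational
    # path that is traced for TorchScript, ONNX, and OpenVINO export.
    return self(batch.image)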

This change guarantees that engine.predict uses the exact same computational path as the one traced for TorchScript, ONNX, and OpenVINO exports.

Impact

This change aligns the behavior of the Lightning-based prediction with the behavior of exported models, ensuring that inference results are consistent, predictable, and reliable across all stages of development, evaluation, and deployment.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

2.1.0.dev

Configuration YAML

N/A

Logs

N/A

Code of Conduct

  • I agree to follow this project's Code of Conduct
