Describe the bug
📝 Description
There is a noticeable discrepancy in the inference results when running a trained anomalib model using different backends. For the same input image, the anomaly scores and anomaly maps produced by a model running within the PyTorch Lightning `Engine` are different from those produced by the same model after being exported to TorchScript, ONNX, and OpenVINO formats. This inconsistency undermines the reliability of the model deployment pipeline, as the behavior observed during evaluation (`engine.predict`) does not match the behavior of the exported artifacts.
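A minimal sketch of how the mismatch can be observed (the exact `Engine.predict`, `Engine.export`, and `TorchInferencer` signatures, as well as the `Padim`/`MVTecAD` choices and the file paths, are illustrative assumptions and may need adapting to your version):

```python
# Sketch only: compare the Lightning prediction loop with a TorchScript export.
from anomalib.data import MVTecAD, PredictDataset
from anomalib.deploy import ExportType, TorchInferencer
from anomalib.engine import Engine
from anomalib.models import Padim

# Train any anomalib model the usual way (Padim / MVTecAD are just examples).
model = Padim()
engine = Engine()
engine.fit(model=model, datamodule=MVTecAD())

# Pathway 1: Lightning prediction loop, which goes through predict_step().
lightning_preds = engine.predict(model=model, dataset=PredictDataset(path="sample.png"))

# Pathway 2: exported TorchScript artifact, which runs the traced forward().
export_path = engine.export(model=model, export_type=ExportType.TORCH)
exported_preds = TorchInferencer(path=export_path).predict(image="sample.png")

# The anomaly scores / anomaly maps from the two pathways should match, but do not.
```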
Dataset
Other (please specify in the text field below)
Model
Other (please specify in the field below)
Steps to reproduce the behavior
🧐 Root Cause Analysis
The root cause of this discrepancy lies in the architectural separation between the two primary inference pathways within anomalib, and in how they were implemented in the base `AnomalibModule`.
- The Lightning Pathway (`engine.predict`): This pathway uses the PyTorch Lightning `Trainer`'s prediction loop. The `Trainer` iterates over a `DataLoader`, which yields `Batch` objects (dictionaries containing the image tensor and other metadata). For each batch, the `Trainer` calls the model's `predict_step` method.
- The Exported Pathway (`TorchInferencer`, `OpenVINOInferencer`): When a model is exported to TorchScript, ONNX, or OpenVINO, the exporter traces the model's `forward` method. The `forward` method is the core computational graph that defines the direct transformation from an input tensor to an output prediction. The `*Inferencer` classes execute this traced `forward` graph (see the sketch after this list).
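As a plain PyTorch illustration (this is not anomalib's actual exporter code, only a sketch of the mechanism), tracing and ONNX export capture the module's `forward` and nothing else; none of the Lightning `*_step` hooks end up in the artifact:

```python
import torch

class ToyModel(torch.nn.Module):
    """Stand-in for an anomaly model; only forward() defines the exported graph."""

    def __init__(self) -> None:
        super().__init__()
        self.backbone = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # This is the computation that TorchScript / ONNX / OpenVINO will run.
        return self.backbone(image)

model = ToyModel().eval()
example = torch.randn(1, 3, 64, 64)

traced = torch.jit.trace(model, example)       # TorchScript artifact: traced forward()
torch.onnx.export(model, example, "toy.onnx")  # ONNX artifact (OpenVINO converts from ONNX)
```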
The critical issue was in the implementation of the base `AnomalibModule` (`src/anomalib/models/components/base/anomalib_module.py`). The `predict_step` method was implemented to call the `validation_step` method, not the `forward` method.

Original `predict_step`:
```python
def predict_step(self, batch: Batch, batch_idx: int, dataloader_idx: int = 0) -> STEP_OUTPUT:
    """Perform prediction step."""
    del dataloader_idx
    return self.validation_step(batch, batch_idx)
```
This design is problematic because a model's `validation_step` is not guaranteed to be identical to its `forward` method. The `validation_step` might include additional logic, handle the `Batch` dictionary differently, or perform calculations that are not part of the core inference graph defined in `forward`.
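As a purely hypothetical illustration (this module is not taken from anomalib), a `validation_step` that normalizes the raw scores while `forward` returns them unnormalized would yield different numbers through the two pathways:

```python
import torch
from lightning.pytorch import LightningModule

class HypotheticalAnomalyModule(LightningModule):
    """Illustrative only: validation_step applies logic that forward() does not."""

    def __init__(self) -> None:
        super().__init__()
        self.backbone = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Raw anomaly map: this is what gets traced and exported.
        return self.backbone(image)

    def validation_step(self, batch: dict, batch_idx: int) -> dict:
        # Extra post-processing that only runs inside the Lightning loop.
        raw = self(batch["image"])
        normalized = (raw - raw.min()) / (raw.max() - raw.min() + 1e-8)
        return {"anomaly_map": normalized, "pred_score": normalized.amax()}
```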
Since the exported models are a direct representation of the `forward` method, and the `engine.predict` calls were executing the `validation_step` logic, the two pathways were executing different code, leading to the observed differences in output.
OS information
OS information:
- Python version: 3.11.11
- Anomalib version: 2.1.0.dev
- PyTorch version: 2.7
Expected behavior
✅ Proposed Solution
To ensure consistency across all inference methods, the `predict_step` in the base `AnomalibModule` must execute the same logic that is used for exporting. This is achieved by making `predict_step` call the `forward` method directly.
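One possible shape of the fix, as a minimal sketch (the exact `forward` signature, the `batch.image` field, and how predictions are returned to the caller are assumptions here, not the actual patch):

```python
def predict_step(self, batch: Batch, batch_idx: int, dataloader_idx: int = 0) -> STEP_OUTPUT:
    """Route prediction through forward() so it matches the exported graph."""
    del batch_idx, dataloader_idx  # not needed for inference
    # Assumption: forward() accepts the image tensor and returns the model's
    # predictions (anomaly scores / maps), exactly as in the exported artifacts.
    return self.forward(batch.image)
```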
This change guarantees that `engine.predict` uses the exact same computational path as the one traced for TorchScript, ONNX, and OpenVINO exports.
Impact
This change aligns the behavior of the Lightning-based prediction with the behavior of exported models, ensuring that inference results are consistent, predictable, and reliable across all stages of development, evaluation, and deployment.
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
2.1.0.dev
Configuration YAML
N/A
Logs
N/A
Code of Conduct
- I agree to follow this project's Code of Conduct