
🐞 Discrepancy in Inference Results Between Lightning, Torch, ONNX, and OpenVINO Models #2747

@samet-akcay

Description

Describe the bug

📝 Description

There is a noticeable discrepancy in the inference results when running a trained anomalib model using different backends. For the same input image, the anomaly scores and anomaly maps produced by a model running within the PyTorch Lightning Engine are different from those produced by the same model after being exported to TorchScript, ONNX, and OpenVINO formats. This inconsistency undermines the reliability of the model deployment pipeline, as the behavior observed during evaluation (engine.predict) does not match the behavior of the exported artifacts.
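For context, a rough reproduction sketch is given below. It is illustrative rather than taken from the report: the dataset and model are listed as "Other", so Padim and MVTecAD are stand-ins, and the class and argument names (Engine, ExportType, TorchInferencer) follow the anomalib API as commonly documented; exact names, return values, and the shape of the prediction objects may differ between versions.

# Rough reproduction sketch (illustrative, not verbatim from the report).
# Padim/MVTecAD are stand-ins; the report lists dataset and model as "Other".
from anomalib.data import MVTecAD
from anomalib.deploy import ExportType, TorchInferencer
from anomalib.engine import Engine
from anomalib.models import Padim

datamodule = MVTecAD()
model = Padim()
engine = Engine()
engine.fit(model=model, datamodule=datamodule)

# Pathway 1: Lightning prediction loop (drives the model's predict_step).
lightning_predictions = engine.predict(model=model, datamodule=datamodule)

# Pathway 2: exported artifact (a trace of the model's forward method).
export_path = engine.export(model=model, export_type=ExportType.TORCH)  # assumed to return the artifact path
inferencer = TorchInferencer(path=export_path)
exported_prediction = inferencer.predict(image="path/to/test_image.png")

# With the bug described below, the anomaly scores and anomaly maps from the
# two pathways do not match for the same input image.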

Dataset

Other (please specify in the text field below)

Model

Other (please specify in the field below)

Steps to reproduce the behavior

🧐 Root Cause Analysis

The root cause of this discrepancy lies in the architectural separation between the two primary inference pathways within anomalib and how they are implemented in the base AnomalibModule.

  1. The Lightning Pathway (engine.predict): This pathway uses the PyTorch Lightning Trainer's prediction loop. The Trainer iterates over a DataLoader, which yields Batch objects (dictionaries containing the image tensor and other metadata). For each batch, the Trainer calls the model's predict_step method.

  2. The Exported Pathway (TorchInferencer, OpenVINOInferencer): When a model is exported to TorchScript, ONNX, or OpenVINO, the exporter traces the model's forward method. The forward method is the core computational graph that defines the direct transformation from an input tensor to an output prediction. The *Inferencer classes execute this traced forward graph (a minimal tracing illustration follows this list).
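The tracing behaviour in point 2 is generic PyTorch, not anomalib-specific. The toy module and the 256x256 input shape below are assumptions for illustration; the point is that both exporters capture only what forward computes, so logic that lives exclusively in validation_step or predict_step never reaches the exported artifact:

# Minimal illustration: ONNX and TorchScript export record the graph produced
# by calling the module, i.e. its forward method.
import torch
from torch import nn

class TinyModel(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.mean(dim=(1, 2, 3))  # stand-in for an anomaly score head

model = TinyModel().eval()
dummy = torch.randn(1, 3, 256, 256)  # assumed input shape

torch.onnx.export(model, dummy, "model.onnx")   # traces model.forward
traced = torch.jit.trace(model, dummy)          # likewise traces forward
traced.save("model.pt")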

The critical issue lies in the implementation of the base AnomalibModule (src/anomalib/models/components/base/anomalib_module.py): its predict_step method is implemented to call the validation_step method, not the forward method.

Original predict_step:

def predict_step(self, batch: Batch, batch_idx: int, dataloader_idx: int = 0) -> STEP_OUTPUT:
    """Perform prediction step."""
    del dataloader_idx
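    # Delegates to validation_step, not to the forward graph that gets exported.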
    return self.validation_step(batch, batch_idx)

This design is problematic because a model's validation_step is not guaranteed to be identical to its forward method. The validation_step might include additional logic, handle the Batch dictionary differently, or perform calculations that are not part of the core inference graph defined in forward.
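As a concrete, hypothetical illustration (not code from anomalib), consider a module whose validation_step normalizes the score that forward leaves raw:

# Hypothetical example of how validation_step can drift away from forward:
# validation applies a sigmoid, forward does not.
import torch
from lightning.pytorch import LightningModule

class DriftingModel(LightningModule):
    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Raw anomaly score: this is what export tracing captures.
        return image.mean(dim=(1, 2, 3))

    def validation_step(self, batch: dict, batch_idx: int) -> dict:
        raw = self(batch["image"])
        # Extra post-processing that only the Lightning pathway ever sees.
        return {"pred_score": torch.sigmoid(raw)}

With the original predict_step above, engine.predict would report the sigmoid-normalized score, while any export of forward would report the raw one.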

Since the exported models are a direct representation of the forward method, while engine.predict runs the validation_step logic, the two pathways execute different code, which produces the observed differences in output.

OS information

  • Python version: 3.11.11
  • Anomalib version: 2.1.0.dev
  • PyTorch version: 2.7

Expected behavior

✅ Proposed Solution

To ensure consistency across all inference methods, the predict_step in the base AnomalibModule must execute the same logic that is used for exporting. This is achieved by making predict_step call the forward method directly.
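A sketch of the change is shown below. It assumes forward consumes the batch's image tensor (batch.image) and that merging forward's output back into the Batch object is handled elsewhere; the exact signature and return handling depend on the AnomalibModule implementation.

def predict_step(self, batch: Batch, batch_idx: int, dataloader_idx: int = 0) -> STEP_OUTPUT:
    """Perform prediction step via the same forward graph that is exported."""
    del batch_idx, dataloader_idx  # not needed by the forward pass
    # Delegating to forward keeps engine.predict on the exact computational
    # path that is traced for TorchScript, ONNX, and OpenVINO export.
    return self(batch.image)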

This change guarantees that engine.predict uses the exact same computational path as the one traced for TorchScript, ONNX, and OpenVINO exports.

Impact

This change aligns the behavior of the Lightning-based prediction with the behavior of exported models, ensuring that inference results are consistent, predictable, and reliable across all stages of development, evaluation, and deployment.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

2.1.0.dev

Configuration YAML

N/A

Logs

N/A

Code of Conduct

  • I agree to follow this project's Code of Conduct
