
Custom Anomaly Detection Model with ML Commons – Real-time Inference in Ingest Pipelines? #3973

Open
@Yash-Patil-2004

Description

Summary

I am developing multiple machine learning models, served behind a Flask-based API, that detect various log anomalies, such as:

  • Unusual status codes
  • Spikes in error rates
  • Database-related error categorization

The Flask app exposes a /predict endpoint, which accepts log data and returns predictions. I want to integrate this setup into OpenSearch to enable real-time anomaly detection.
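
A minimal sketch of such a service follows. The payload shape ({"logs": [...]} in, {"predictions": [...]} out) and the scoring rule are placeholder assumptions, not the real models:

```python
# Minimal sketch of the described Flask /predict service.
# The request/response shape and scoring logic are placeholders.
from flask import Flask, request, jsonify

app = Flask(__name__)

def score(log):
    # Placeholder model: flag server errors (5xx status codes) as anomalies.
    return 1 if int(log.get("status", 200)) >= 500 else 0

@app.route("/predict", methods=["POST"])
def predict():
    logs = request.get_json(force=True).get("logs", [])
    return jsonify({"predictions": [score(log) for log in logs]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```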


Intended Architecture

I aim to:

  1. Create an OpenSearch ingest pipeline that:

    • Sends incoming log data to the /predict endpoint of my external model.
    • Receives predictions (anomalies), and
    • Routes anomaly logs to a separate index for visualization and dashboarding.
  2. Reduce infrastructure costs by enabling real-time anomaly detection during ingestion, rather than batch processing.
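
As a sketch of step 1, the pipeline body might look like the following, assuming a model registered through ML Commons (the model_id is a placeholder) and assuming the ml_inference processor's input_map/output_map field names match the model's interface; routing to a separate index is done here by conditionally setting the _index metadata field:

```python
# Hedged sketch of the intended ingest pipeline body
# (PUT _ingest/pipeline/log-anomaly-pipeline).
# model_id and all mapped field names are assumptions.
pipeline_body = {
    "description": "Enrich logs with anomaly predictions, reroute anomalies",
    "processors": [
        {
            "ml_inference": {
                "model_id": "<registered-model-id>",
                # model input field name -> document field
                "input_map": [{"logs": "message"}],
                # new document field <- model output field
                "output_map": [{"is_anomaly": "predictions"}],
            }
        },
        {
            # Route flagged documents to a dedicated anomaly index
            "set": {
                "if": "ctx.is_anomaly == 1",
                "field": "_index",
                "value": "log-anomalies",
            }
        },
    ],
}
```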


Current Issue

While exploring ML Commons and the ml_inference ingest processor, I noticed:

  • It supports specific types of models: text embedding models, sparse encoding, cross-encoders, and question-answering.
  • It appears to only work with registered ML Commons models, not arbitrary external APIs like my Flask /predict.
  • There is no clear support for invoking a custom anomaly detection model hosted externally in real-time from within an ingest pipeline.
  • Even after allowing private IPs in the cluster settings, I still hit errors with input/output mapping in the ingest pipeline and with reaching the endpoint itself.
[screenshot of the errors]
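
For what it's worth, ML Commons remote connectors can target arbitrary HTTP endpoints, gated by the plugins.ml_commons.trusted_connector_endpoints_regex setting (and plugins.ml_commons.connector.private_ip_enabled for private addresses). A hedged sketch of a connector body for the Flask endpoint follows; the URL and request-body template are assumptions:

```python
# Sketch of an ML Commons connector body for the external Flask endpoint
# (POST /_plugins/_ml/connectors/_create). URL and body template are
# assumptions; the cluster must allow this endpoint via
# plugins.ml_commons.trusted_connector_endpoints_regex.
import json

connector_body = {
    "name": "flask-anomaly-detector",
    "description": "Remote connector to a custom Flask /predict endpoint",
    "version": 1,
    "protocol": "http",
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "http://model-host:5000/predict",
            "headers": {"Content-Type": "application/json"},
            # ${parameters.*} placeholders are substituted at inference time
            "request_body": '{ "logs": ${parameters.logs} }',
        }
    ],
}

print(json.dumps(connector_body, indent=2))
```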

Questions

  1. Is it currently possible to:

    • Register and invoke a custom anomaly detection model via ML Commons (hosted externally),
    • And use it within an ingest pipeline to enrich incoming documents in real time?
  2. Which architecture is more suitable and cost-effective for this use case:

    • Option A: Deploying the model on OpenSearch and using ML Commons / ingest pipeline,
    • Option B: Hosting the model externally and using a scheduled job to:
      • Fetch logs from the past 5 minutes,
      • Run inference,
      • Index anomalies into a different index.
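
Option B's three steps can be sketched as below; the endpoint URL, index names, and the /predict response shape ({"predictions": [0/1, ...]}) are assumptions, and client/session stand for an OpenSearch client and an HTTP session:

```python
# Sketch of one iteration of the Option B scheduled job:
# fetch recent logs -> call the external model -> index anomalies.
# URL, index names, and response shape are assumptions.

def last_window_query(minutes=5):
    # Range query for log documents from the past N minutes.
    return {
        "query": {
            "range": {"@timestamp": {"gte": f"now-{minutes}m", "lte": "now"}}
        }
    }

def select_anomalies(logs, predictions):
    # Keep only logs the model flagged as anomalous (prediction == 1).
    return [log for log, pred in zip(logs, predictions) if pred == 1]

def run_once(client, session, predict_url="http://model-host:5000/predict"):
    hits = client.search(index="logs-*", body=last_window_query())["hits"]["hits"]
    logs = [h["_source"] for h in hits]
    if not logs:
        return 0
    preds = session.post(predict_url, json={"logs": logs}).json()["predictions"]
    anomalies = select_anomalies(logs, preds)
    for doc in anomalies:
        client.index(index="log-anomalies", body=doc)
    return len(anomalies)
```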

My Goal

To build a robust, low-latency anomaly detection pipeline integrated with OpenSearch for log analysis and dashboarding, while keeping infrastructure cost and maintenance complexity low.

Any guidance on the supported approach or roadmap plans for ML Commons and ingest pipelines would be highly appreciated.

Thanks in advance!
