diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml (+2 -2)
diff --git a/README.md b/README.md (+7 -7)
diff --git a/docs/cli.md b/docs/cli.md (+25 -59)
diff --git a/docs/input_data.md b/docs/input_data.md (+1 -3)
diff --git a/pyproject.toml b/pyproject.toml (+1 -1)
diff --git a/src/xspect/definitions.py b/src/xspect/definitions.py (+7)
diff --git a/src/xspect/download_filters.py b/src/xspect/download_models.py (renamed, +2 -2)
diff --git a/src/xspect/fastapi.py b/src/xspect/fastapi.py (+2 -2)
diff --git a/src/xspect/main.py b/src/xspect/main.py (+61 -8)
diff --git a/src/xspect/mlst_feature/__init__.py b/src/xspect/mlst_feature/__init__.py
@@ -13,9 +13,9 @@ jobs:
           run: |
             python -m pip install --upgrade pip
             pip install '.[test]'
-        - name: Download filters
+        - name: Download models
           run: |
-            xspect download-filters
+            xspect download-models
         - name: Test with pytest
           run: |
             pytest --cov
 
@@ -6,7 +6,7 @@
 <img src="/docs/img/logo.png" height="50%" width="50%">
 
 <!-- start intro -->
-XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
+XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or MLST level using [Bloom Filters] and a [Support Vector Machine].
 <br/><br/>
 
 XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine. 
@@ -31,14 +31,14 @@ pip install xspect
 Please note that Windows and Alpine Linux is currently not supported.
 
 ## Usage
-### Get the Bloomfilters
-To download basic pre-trained filters, you can use the built-in command:
+### Get the models
+To download basic pre-trained models, you can use the built-in command:
 ```
-xspect download-filters
+xspect download-models
 ```
-Additional species filters can be trained using:
+Additional species models can be trained using:
 ```
-xspect train you-ncbi-genus-name
+xspect train-species your-ncbi-genus-name
 ```
 
 ### How to run the web app
@@ -50,7 +50,7 @@ xspect api
 ### How to use the XspecT command line interface
 Run xspect with the configuration you want to run it with as arguments.
 ```
-xspect classify your-genus path/to/your/input-set
+xspect classify-species your-genus path/to/your/input-set
 ```
 For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
 ```
 
@@ -1,114 +1,80 @@
 # How to use the CLI
 
-XspecT comes with a built-in command line interface (CLI), which enables quick classifications without the need to use the web interface. The command line interface can also be used to download and train filters.
+XspecT comes with a built-in command line interface (CLI), which enables quick classifications without the need to use the web interface. The command line interface can also be used to download and train models.
 
 After installing XspecT, a list of available commands can be viewed by running:
 
 ```bash
 xspect --help
 ```
 
-## Filter downloads
+## Model downloads
 
-A basic set of pre-trained filters (Acinetobacter and Salonella) can be downloaded using the following command:
+A basic set of pre-trained models (Acinetobacter and Salmonella) can be downloaded using the following command:
 
 ```bash
-xspect download-filters
+xspect download-models
 ```
 
-For the moment, it is not possible to specify exactly which filters should be downloaded.
+For the moment, it is not possible to specify exactly which models should be downloaded.
 
 ## Classification
 
 To classify samples, the command
 
 ```bash
-xspect classify GENUS PATH
+xspect classify-species GENUS PATH
 ```
 
 can be used, when `GENUS` refers to the NCBI genus name of your sample and `PATH` refers to the path to your sample *directory*. This command will classify the species of your sample within the given genus.
 
 The following options are available:
 
 ```bash
--s, --species / --no-species    Species classification.
--i, --ic / --no-ic              IC strain typing.
--o, --oxa / --no-oxa            OXA gene family detection.
--m, --meta / --no-meta          Metagenome classification.
--c, --complete                  Use every single k-mer as input for
-                                  classification.
--s, --save                      Save results to csv file.
---help                          Show this message and exit.
+-m, --meta / --no-meta  Metagenome classification.
+-s, --step INTEGER      Sparse sampling step size (e. g. only every 500th
+                        kmer for step=500).
+--help                  Show this message and exit.
 ```
 
-### Species Classification
+To speed up the analysis, only every nth kmer can be considered ("sparse sampling"). For example, to only consider every 10th kmer, run:
 
-Species classification is run by default, without the need for further parameters:
 ```bash
-xspect classify Acinetobacter path
-```
-
-Species classification can be toggled using the `-s`/`--species` (`--no-species`) option. To run classification without species classification, the option `--no-species` can be used, for example when running a different analysis:
-
-```bash
-xspect classify --no-species -i Acinetobacter path
-```
-
-### IC Strain Typing
-
-To perform International Clonal (IC) type classification, the `-i`/`--ic` (`--no-ic`) option can be used:
-
-```bash
-xspect classify -i Acinetobacter path
-```
-
-Please note that IC strain typing is only available for Acinetobacter baumanii.
-
-### OXA Gene Detection
-
-OXA gene detection can be enabled using the `-o`/`--oxa` (`--no-oxa`) option.
-
-```bash
-xspect classify -o Acinetobacter path
+xspect classify-species -s 10 Acinetobacter path
 ```
 
 ### Metagenome Mode
 
 To analyze a sample in metagenome mode, the `-m`/`--meta` (`--no-meta`) option can be used:
 
 ```bash
-xspect classify -m Acinetobacter path
+xspect classify-species -m Acinetobacter path
 ```
 
-Compared to normal XspecT modes, this mode first identifies reads belonging to the given genus and continues classification only with the resulting reads and is thus more suitable for metagenomic samples as the resulting runtime is decreased.
+Compared to normal XspecT species classification, this mode first identifies reads belonging to the given genus and continues classification only with the resulting reads. It is thus more suitable for metagenomic samples, as the resulting runtime is decreased.
 
-## Filter Training
+### MLST Classification
 
-<aside>
-⚠️ Depending on genome size and the amount of species, training can take time!
+Samples can also be classified based on multilocus sequence typing (MLST) schemes. To MLST-classify a sample, run:
 
-</aside>
-
-In order to train filters, please first ensure [Jellyfish](https://github.com/gmarcais/Jellyfish) is installed.
+```bash
+xspect classify-mlst -p path
+```
 
-### NCBI-based filter training
+## Model Training
 
-The easiest way to train new filters is to use data from NCBI, which is automatically downloaded and processed by XspecT.
+Models can be trained based on data from NCBI, which is automatically downloaded and processed by XspecT.
 
-To train a filter with data from NCBI, run the following command:
+To train a model, run the following command:
 
 ```bash
-xspect train your-ncbi-genus
+xspect train-species your-ncbi-genus
 ```
 
 `your-ncbi-genus` can be a genus name from NCBI or an NCBI taxonomy ID.
 
-### Custom data filter training
-
-XspecT filters can also be trained using custom data, which need to be provided as a folder for both filter and SVM training. The provided assembly files need to be in FASTA format and their names should be the species ID and the species name, for example `28901_enterica.fasta`. While the ID can be arbitrary, the standard is NCBI taxon IDs.
-
-The filters can then be trained using:
+To train models for MLST classifications, run:
 
 ```bash
-xspect train -bf-path directory/1 -svm-path directory/2
+xspect train-mlst
 ```
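Note on the sparse sampling option documented above: the idea of considering only every nth k-mer can be sketched as follows. This is a minimal illustration and not XspecT's actual implementation; the helper name `sparse_kmers`, its parameters, and the example sequence are assumptions made for this note.

```python
def sparse_kmers(sequence: str, k: int = 31, step: int = 1) -> list[str]:
    """Return every `step`-th k-mer of `sequence`.

    Hypothetical helper for illustration only; it is not part of XspecT.
    """
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    # Slide a window of length k over the sequence, advancing `step` positions each time.
    return [sequence[i : i + k] for i in range(0, len(sequence) - k + 1, step)]


seq = "ACGTACGTACGT"
all_kmers = sparse_kmers(seq, k=4, step=1)  # every k-mer
sampled = sparse_kmers(seq, k=4, step=3)    # only every 3rd k-mer
print(len(all_kmers), len(sampled))
```

With a step of n, roughly 1/n of the k-mers need to be looked up, which is where the speed-up comes from; the trade-off is that fewer k-mers contribute to the classification.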
@@ -1,4 +1,2 @@
 # Input Data
-XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
-
-The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).
+XspecT is able to use either raw sequence reads (FASTQ format, .fq/.fastq) or already assembled genomes (FASTA format, .fasta/.fna). Using sequence reads avoids the assembly step, but requires high-quality reads with a low error rate (e.g. Illumina reads).
@@ -1,6 +1,6 @@
 [project]
 name = "XspecT"
-version = "0.2.5"
+version = "0.2.6"
 description = "Tool to monitor and characterize pathogens using Bloom filters."
 readme = {file = "README.md", content-type = "text/markdown"}
 license = {file = "LICENSE"}
 
@@ -40,3 +40,10 @@ def get_xspect_runs_path():
     runs_path = get_xspect_root_path() / "runs"
     runs_path.mkdir(exist_ok=True, parents=True)
     return runs_path
+
+
+def get_xspect_mlst_path():
+    """Return the path to the XspecT MLST directory."""
+    mlst_path = get_xspect_root_path() / "mlst"
+    mlst_path.mkdir(exist_ok=True, parents=True)
+    return mlst_path
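Note on the helper added above: it follows the same create-on-access pattern as `get_xspect_runs_path()`. A minimal, self-contained sketch of that pattern is shown below; the function `get_subdir` and the temporary root directory are assumptions for illustration and are not part of `xspect.definitions`.

```python
import tempfile
from pathlib import Path


def get_subdir(root: Path, name: str) -> Path:
    """Return root/name, creating it (and any missing parents) if needed.

    Hypothetical stand-in for the get_xspect_*_path() helpers.
    """
    path = root / name
    # exist_ok=True makes repeated calls safe; parents=True creates the root too.
    path.mkdir(exist_ok=True, parents=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "xspect-data"
    first = get_subdir(root, "mlst")   # creates root and root/mlst
    second = get_subdir(root, "mlst")  # second call is a no-op
    created = first.is_dir()
```

Because the directory is created on every call, callers never have to check for its existence before writing into it.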
@@ -7,8 +7,8 @@
 from xspect.definitions import get_xspect_model_path, get_xspect_tmp_path
 
 
-def download_test_filters(url):
-    """Download filters."""
+def download_test_models(url):
+    """Download models."""
 
     download_path = get_xspect_tmp_path() / "models.zip"
     extract_path = get_xspect_tmp_path() / "extracted_models"
 
@@ -5,7 +5,7 @@
 from shutil import copyfileobj
 from fastapi import FastAPI, UploadFile, BackgroundTasks
 from xspect.definitions import get_xspect_runs_path, get_xspect_upload_path
-from xspect.download_filters import download_test_filters
+from xspect.download_models import download_test_models
 import xspect.model_management as mm
 from xspect.models.result import StepType
 from xspect.pipeline import ModelExecution, Pipeline, PipelineStep
@@ -17,7 +17,7 @@
 @app.get("/download-filters")
 def download_filters():
     """Download filters."""
-    download_test_filters("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")
+    download_test_models("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")
 
 
 @app.get("/classify")
 
@@ -6,13 +6,23 @@
 import click
 import uvicorn
 from xspect import fastapi
-from xspect.download_filters import download_test_filters
+from xspect.download_models import download_test_models
 from xspect.train import train_ncbi
 from xspect.models.result import (
     StepType,
 )
-from xspect.definitions import get_xspect_runs_path, fasta_endings, fastq_endings
+from xspect.definitions import (
+    get_xspect_runs_path,
+    fasta_endings,
+    fastq_endings,
+    get_xspect_model_path,
+)
 from xspect.pipeline import ModelExecution, Pipeline, PipelineStep
+from xspect.mlst_feature.mlst_helper import pick_scheme, pick_scheme_from_models_dir
+from xspect.mlst_feature.pub_mlst_handler import PubMLSTHandler
+from xspect.models.probabilistic_filter_mlst_model import (
+    ProbabilisticFilterMlstSchemeModel,
+)
 
 
 @click.group()
@@ -22,10 +32,10 @@ def cli():
 
 
 @cli.command()
-def download_filters():
-    """Download filters."""
-    click.echo("Downloading filters, this may take a while...")
-    download_test_filters("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")
+def download_models():
+    """Download models."""
+    click.echo("Downloading models, this may take a while...")
+    download_test_models("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")
 
 
 @cli.command()
@@ -43,7 +53,7 @@ def download_filters():
     help="Sparse sampling step size (e. g. only every 500th kmer for step=500).",
     default=1,
 )
-def classify(genus, path, meta, step):
+def classify_species(genus, path, meta, step):
     """Classify sample(s) from file or directory PATH."""
     click.echo("Classifying...")
     click.echo(f"Step: {step}")
@@ -105,7 +115,7 @@ def classify(genus, path, meta, step):
     help="SVM Sparse sampling step size (e. g. only every 500th kmer for step=500).",
     default=1,
 )
-def train(genus, bf_assembly_path, svm_assembly_path, svm_step):
+def train_species(genus, bf_assembly_path, svm_assembly_path, svm_step):
     """Train model."""
 
     if bf_assembly_path or svm_assembly_path:
@@ -118,6 +128,49 @@ def train(genus, bf_assembly_path, svm_assembly_path, svm_step):
         raise click.ClickException(str(e)) from e
 
 
+@cli.command()
+@click.option(
+    "-c",
+    "--choose_schemes",
+    is_flag=True,
+    help="Choose your own schemes. "
+    "Default setting is Oxford and Pasteur scheme of A. baumannii.",
+)
+def train_mlst(choose_schemes):
+    """Download alleles and train bloom filters."""
+    click.echo("Updating alleles")
+    handler = PubMLSTHandler()
+    handler.download_alleles(choose_schemes)
+    click.echo("Download finished")
+    scheme_path = pick_scheme(handler.get_scheme_paths())
+    species_name = Path(scheme_path).parent.name
+    scheme_name = Path(scheme_path).name
+    model = ProbabilisticFilterMlstSchemeModel(
+        31, f"{species_name}:{scheme_name}", get_xspect_model_path()
+    )
+    click.echo("Creating mlst model")
+    model.fit(scheme_path)
+    model.save()
+    click.echo(f"Saved at {model.cobs_path}")
+
+
+@cli.command()
+@click.option(
+    "-p",
+    "--path",
+    help="Path to FASTA-file for mlst identification.",
+    type=click.Path(exists=True, dir_okay=True, file_okay=True),
+)
+def classify_mlst(path):
+    """MLST classify a sample."""
+    click.echo("Classifying...")
+    path = Path(path)
+    scheme_path = pick_scheme_from_models_dir()
+    model = ProbabilisticFilterMlstSchemeModel.load(scheme_path)
+    model.predict(scheme_path, path).save(model.model_display_name, path)
+    click.echo(f"Run saved at {get_xspect_runs_path()}.")
+
+
 @cli.command()
 def api():
     """Open the XspecT FastAPI."""