Commit 66cb51c

Merge pull request #14 from BIONF/dev
MLST
2 parents 164f12a + d6d68d0 commit 66cb51c

27 files changed, +852 -152 lines changed

.github/workflows/test.yml

+2 -2
@@ -13,9 +13,9 @@ jobs:
         run: |
           python -m pip install --upgrade pip
           pip install '.[test]'
-      - name: Download filters
+      - name: Download models
         run: |
-          xspect download-filters
+          xspect download-models
       - name: Test with pytest
         run: |
           pytest --cov

README.md

+7 -7
@@ -6,7 +6,7 @@
 <img src="/docs/img/logo.png" height="50%" width="50%">

 <!-- start intro -->
-XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or sub-type level using [Bloom Filters] and a [Support Vector Machine]. It also identifies existing [blaOxa-genes] and provides a list of relevant research papers for further information.
+XspecT is a Python-based tool to taxonomically classify sequence-reads (or assembled genomes) on the species and/or MLST level using [Bloom Filters] and a [Support Vector Machine].
 <br/><br/>

 XspecT utilizes the uniqueness of kmers and compares extracted kmers from the input-data to a reference database. Bloom Filter ensure a fast lookup in this process. For a final prediction the results are classified using a Support Vector Machine.
@@ -31,14 +31,14 @@ pip install xspect
 Please note that Windows and Alpine Linux is currently not supported.

 ## Usage
-### Get the Bloomfilters
-To download basic pre-trained filters, you can use the built-in command:
+### Get the models
+To download basic pre-trained models, you can use the built-in command:
 ```
-xspect download-filters
+xspect download-models
 ```
-Additional species filters can be trained using:
+Additional species models can be trained using:
 ```
-xspect train you-ncbi-genus-name
+xspect train-species you-ncbi-genus-name
 ```

 ### How to run the web app
@@ -50,7 +50,7 @@ xspect api
 ### How to use the XspecT command line interface
 Run xspect with the configuration you want to run it with as arguments.
 ```
-xspect classify your-genus path/to/your/input-set
+xspect classify-species your-genus path/to/your/input-set
 ```
 For further instructions on how to use the command line interface, please refer to the [documentation] or execute:
 ```

docs/cli.md

+25 -59
@@ -1,114 +1,80 @@
 # How to use the CLI

-XspecT comes with a built-in command line interface (CLI), which enables quick classifications without the need to use the web interface. The command line interface can also be used to download and train filters.
+XspecT comes with a built-in command line interface (CLI), which enables quick classifications without the need to use the web interface. The command line interface can also be used to download and train models.

 After installing XspecT, a list of available commands can be viewed by running:

 ```bash
 xspect --help
 ```

-## Filter downloads
+## Model downloads

-A basic set of pre-trained filters (Acinetobacter and Salonella) can be downloaded using the following command:
+A basic set of pre-trained models (Acinetobacter and Salonella) can be downloaded using the following command:

 ```bash
-xspect download-filters
+xspect download-models
 ```

-For the moment, it is not possible to specify exactly which filters should be downloaded.
+For the moment, it is not possible to specify exactly which models should be downloaded.

 ## Classification

 To classify samples, the command

 ```bash
-xspect classify GENUS PATH
+xspect classify-species GENUS PATH
 ```

 can be used, when `GENUS` refers to the NCBI genus name of your sample and `PATH` refers to the path to your sample *directory*. This command will classify the species of your sample within the given genus.

 The following options are available:

 ```bash
--s, --species / --no-species Species classification.
--i, --ic / --no-ic IC strain typing.
--o, --oxa / --no-oxa OXA gene family detection.
--m, --meta / --no-meta Metagenome classification.
--c, --complete Use every single k-mer as input for
-classification.
--s, --save Save results to csv file.
---help Show this message and exit.
+-m, --meta / --no-meta Metagenome classification.
+-s, --step INTEGER Sparse sampling step size (e. g. only every 500th
+kmer for step=500).
+--help Show this message and exit.
 ```

-### Species Classification
+To speed up the analysis, only every nth kmer can be considered ("sparse sampling"). For example, to only consider every 10th kmer, run:

-Species classification is run by default, without the need for further parameters:
 ```bash
-xspect classify Acinetobacter path
-```
-
-Species classification can be toggled using the `-s`/`--species` (`--no-species`) option. To run classification without species classification, the option `--no-species` can be used, for example when running a different analysis:
-
-```bash
-xspect classify --no-species -i Acinetobacter path
-```
-
-### IC Strain Typing
-
-To perform International Clonal (IC) type classification, the `-i`/`--ic` (`--no-ic`) option can be used:
-
-```bash
-xspect classify -i Acinetobacter path
-```
-
-Please note that IC strain typing is only available for Acinetobacter baumanii.
-
-### OXA Gene Detection
-
-OXA gene detection can be enabled using the `-o`/`--oxa` (`--no-oxa`) option.
-
-```bash
-xspect classify -o Acinetobacter path
+xspect classify-species -s 10 Acinetobacter path
 ```

 ### Metagenome Mode

 To analyze a sample in metagenome mode, the `-m`/`--meta` (`--no-meta`) option can be used:

 ```bash
-xspect classify -m Acinetobacter path
+xspect classify-species -m Acinetobacter path
 ```

-Compared to normal XspecT modes, this mode first identifies reads belonging to the given genus and continues classification only with the resulting reads and is thus more suitable for metagenomic samples as the resulting runtime is decreased.
+Compared to normal XspecT species classification, this mode first identifies reads belonging to the given genus and continues classification only with the resulting reads, It is thus more suitable for metagenomic samples as the resulting runtime is decreased.

-## Filter Training
+### MLST Classification

-<aside>
-⚠️ Depending on genome size and the amount of species, training can take time!
+Samples can also be classified based on Multi-locus sequence type schemas. To MLST-classify a sample, run:

-</aside>
-
-In order to train filters, please first ensure [Jellyfish](https://github.com/gmarcais/Jellyfish) is installed.
+```bash
+xspect classify-mlst -p path
+```

-### NCBI-based filter training
+## Model Training

-The easiest way to train new filters is to use data from NCBI, which is automatically downloaded and processed by XspecT.
+Models can be trained based on data from NCBI, which is automatically downloaded and processed by XspecT.

-To train a filter with data from NCBI, run the following command:
+To train a model, run the following command:

 ```bash
-xspect train your-ncbi-genus
+xspect train-species your-ncbi-genus
 ```

 `you-ncbi-genus` can be a genus name from NCBI or an NCBI taxonomy ID.

-### Custom data filter training
-
-XspecT filters can also be trained using custom data, which need to be provided as a folder for both filter and SVM training. The provided assembly files need to be in FASTA format and their names should be the species ID and the species name, for example `28901_enterica.fasta`. While the ID can be arbitrary, the standard is NCBI taxon IDs.
-
-The filters can then be trained using:
+To train models for MLST classifications, run:

 ```bash
-xspect train -bf-path directory/1 -svm-path directory/2
+xspect train-mlst
 ```
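
Review note on the `--step` option documented above: the sampling code itself is not part of the hunks shown here, so the following is only a minimal sketch of what "only every nth kmer" could mean, assuming sampling starts at the first k-mer of a sequence. The function name `sampled_kmers` and the toy sequence are hypothetical and do not come from XspecT.

```python
from typing import Iterator


def sampled_kmers(sequence: str, k: int, step: int = 1) -> Iterator[str]:
    """Yield every `step`-th k-mer of `sequence`, starting with the first.

    Hypothetical helper for illustration; not XspecT's implementation.
    """
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    # Valid k-mer start positions are 0 .. len(sequence) - k; step over them.
    for start in range(0, len(sequence) - k + 1, step):
        yield sequence[start:start + k]


# step=1 keeps all k-mers; step=3 keeps roughly a third of them.
print(list(sampled_kmers("ACGTACGTAC", k=4, step=3)))
```

With `step=1` (the CLI default shown in `main.py`) every k-mer is kept, so the number of lookups falls roughly by a factor of `step` as the step size grows.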

docs/input_data.md

+1 -3
@@ -1,4 +1,2 @@
 # Input Data
-XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).
-
-The amount of reads that will be used has to be set by the user when using sequence-reads. The minimum amount is 5000 reads for species classification and 500 reads for sub-type classification. The maximum number of reads is limited by the browser and is usually around ~8 million reads. Using more reads will lead to a increased runtime (xsec./1mio reads).
+XspecT is able to use either raw sequence-reads (FASTQ-format .fq/.fastq) or already assembled genomes (FASTA-format .fasta/.fna). Using sequence-reads saves up the assembly process but high-quality reads with a low error-rate are needed (e.g. Illumina-reads).

pyproject.toml

+1 -1
@@ -1,6 +1,6 @@
 [project]
 name = "XspecT"
-version = "0.2.5"
+version = "0.2.6"
 description = "Tool to monitor and characterize pathogens using Bloom filters."
 readme = {file = "README.md", content-type = "text/markdown"}
 license = {file = "LICENSE"}

src/xspect/definitions.py

+7
@@ -40,3 +40,10 @@ def get_xspect_runs_path():
     runs_path = get_xspect_root_path() / "runs"
     runs_path.mkdir(exist_ok=True, parents=True)
     return runs_path
+
+
+def get_xspect_mlst_path():
+    """Return the path to the XspecT runs directory."""
+    mlst_path = get_xspect_root_path() / "mlst"
+    mlst_path.mkdir(exist_ok=True, parents=True)
+    return mlst_path
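
Review note: `get_xspect_mlst_path` repeats the create-on-access pattern of `get_xspect_runs_path`. The sketch below shows that pattern as one shared helper. It assumes nothing about `get_xspect_root_path`, which is not shown in this diff: `get_subdir` and the temporary root directory are hypothetical stand-ins.

```python
import tempfile
from pathlib import Path


def get_subdir(root: Path, name: str) -> Path:
    """Return root/name, creating it (and any missing parents) on first use.

    Hypothetical helper for illustration; not part of XspecT.
    """
    path = root / name
    # exist_ok=True makes repeated calls safe; parents=True creates root too.
    path.mkdir(exist_ok=True, parents=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "xspect-data"  # stand-in for the real root directory
    first = get_subdir(root, "mlst")
    second = get_subdir(root, "mlst")  # a second call must not raise
    print(first == second, first.is_dir())
```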

src/xspect/download_filters.py renamed to src/xspect/download_models.py

+2 -2
@@ -7,8 +7,8 @@
 from xspect.definitions import get_xspect_model_path, get_xspect_tmp_path


-def download_test_filters(url):
-    """Download filters."""
+def download_test_models(url):
+    """Download models."""

     download_path = get_xspect_tmp_path() / "models.zip"
     extract_path = get_xspect_tmp_path() / "extracted_models"

src/xspect/fastapi.py

+2 -2
@@ -5,7 +5,7 @@
 from shutil import copyfileobj
 from fastapi import FastAPI, UploadFile, BackgroundTasks
 from xspect.definitions import get_xspect_runs_path, get_xspect_upload_path
-from xspect.download_filters import download_test_filters
+from xspect.download_models import download_test_models
 import xspect.model_management as mm
 from xspect.models.result import StepType
 from xspect.pipeline import ModelExecution, Pipeline, PipelineStep
@@ -17,7 +17,7 @@
 @app.get("/download-filters")
 def download_filters():
     """Download filters."""
-    download_test_filters("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")
+    download_test_models("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")


 @app.get("/classify")

src/xspect/main.py

+61 -8
@@ -6,13 +6,23 @@
 import click
 import uvicorn
 from xspect import fastapi
-from xspect.download_filters import download_test_filters
+from xspect.download_models import download_test_models
 from xspect.train import train_ncbi
 from xspect.models.result import (
     StepType,
 )
-from xspect.definitions import get_xspect_runs_path, fasta_endings, fastq_endings
+from xspect.definitions import (
+    get_xspect_runs_path,
+    fasta_endings,
+    fastq_endings,
+    get_xspect_model_path,
+)
 from xspect.pipeline import ModelExecution, Pipeline, PipelineStep
+from xspect.mlst_feature.mlst_helper import pick_scheme, pick_scheme_from_models_dir
+from xspect.mlst_feature.pub_mlst_handler import PubMLSTHandler
+from xspect.models.probabilistic_filter_mlst_model import (
+    ProbabilisticFilterMlstSchemeModel,
+)


 @click.group()
@@ -22,10 +32,10 @@ def cli():


 @cli.command()
-def download_filters():
-    """Download filters."""
-    click.echo("Downloading filters, this may take a while...")
-    download_test_filters("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")
+def download_models():
+    """Download models."""
+    click.echo("Downloading models, this may take a while...")
+    download_test_models("https://xspect2.s3.eu-central-1.amazonaws.com/models.zip")


 @cli.command()
@@ -43,7 +53,7 @@ def download_filters():
     help="Sparse sampling step size (e. g. only every 500th kmer for step=500).",
     default=1,
 )
-def classify(genus, path, meta, step):
+def classify_species(genus, path, meta, step):
     """Classify sample(s) from file or directory PATH."""
     click.echo("Classifying...")
     click.echo(f"Step: {step}")
@@ -105,7 +115,7 @@ def classify(genus, path, meta, step):
     help="SVM Sparse sampling step size (e. g. only every 500th kmer for step=500).",
     default=1,
 )
-def train(genus, bf_assembly_path, svm_assembly_path, svm_step):
+def train_species(genus, bf_assembly_path, svm_assembly_path, svm_step):
     """Train model."""

     if bf_assembly_path or svm_assembly_path:
@@ -118,6 +128,49 @@ def train(genus, bf_assembly_path, svm_assembly_path, svm_step):
         raise click.ClickException(str(e)) from e


+@cli.command()
+@click.option(
+    "-c",
+    "--choose_schemes",
+    is_flag=True,
+    help="Choose your own schemes."
+    "Default setting is Oxford and Pasteur scheme of A.baumannii.",
+)
+def train_mlst(choose_schemes):
+    """Download alleles and train bloom filters."""
+    click.echo("Updating alleles")
+    handler = PubMLSTHandler()
+    handler.download_alleles(choose_schemes)
+    click.echo("Download finished")
+    scheme_path = pick_scheme(handler.get_scheme_paths())
+    species_name = str(scheme_path).split("/")[-2]
+    scheme_name = str(scheme_path).split("/")[-1]
+    model = ProbabilisticFilterMlstSchemeModel(
+        31, f"{species_name}:{scheme_name}", get_xspect_model_path()
+    )
+    click.echo("Creating mlst model")
+    model.fit(scheme_path)
+    model.save()
+    click.echo(f"Saved at {model.cobs_path}")
+
+
+@cli.command()
+@click.option(
+    "-p",
+    "--path",
+    help="Path to FASTA-file for mlst identification.",
+    type=click.Path(exists=True, dir_okay=True, file_okay=True),
+)
+def classify_mlst(path):
+    """MLST classify a sample."""
+    click.echo("Classifying...")
+    path = Path(path)
+    scheme_path = pick_scheme_from_models_dir()
+    model = ProbabilisticFilterMlstSchemeModel.load(scheme_path)
+    model.predict(scheme_path, path).save(model.model_display_name, path)
+    click.echo(f"Run saved at {get_xspect_runs_path()}.")
+
+
 @cli.command()
 def api():
     """Open the XspecT FastAPI."""
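
Review note: `train_mlst` derives the species and scheme names by splitting `str(scheme_path)` on `"/"`, which only works with POSIX-style separators. A possible alternative, sketched here with `pathlib`, reads the last two path components directly. The helper name `scheme_display_name` and the example directory layout are hypothetical. The sketch assumes only what the diff shows, namely that the scheme directory sits inside a species directory.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def scheme_display_name(scheme_path: PurePath) -> str:
    """Build "<species>:<scheme>" from the last two path components.

    Hypothetical helper for illustration; not part of XspecT.
    """
    return f"{scheme_path.parent.name}:{scheme_path.name}"


# Example layout is made up; only the species/scheme nesting matters.
posix = PurePosixPath("/data/mlst/abaumannii/Oxford")
windows = PureWindowsPath(r"C:\data\mlst\abaumannii\Oxford")
print(scheme_display_name(posix))
print(scheme_display_name(windows))
```

Both calls produce the same name regardless of separator style, whereas splitting the string form of the Windows path on `"/"` would return the whole path as a single component.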

src/xspect/mlst_feature/__init__.py

Whitespace-only changes.
