
Commit 7f666b7

Squashed commit of the following:
commit 3f2929b
Merge: 7fb439f 959e0a0
Author: Hannah Muelbaier <[email protected]>
Date: Mon Oct 23 15:06:29 2023 +0200

    Merge pull request #2 from BIONF/master

    deactivate fas config output

commit 7fb439f
Author: mueli94 <[email protected]>
Date: Fri Oct 20 10:05:26 2023 +0200

    Revert "Merge branch 'fdog-assembly' of https://github.com/mueli94/fDOG-Assembly into fdog-assembly"

    This reverts commit d688bbb, reversing changes made to 5068b7a.

commit d0d359f
Merge: 5068b7a d688bbb
Author: mueli94 <[email protected]>
Date: Fri Oct 20 10:05:01 2023 +0200

    Merge branch 'fdog-assembly' of https://github.com/mueli94/fDOG-Assembly into fdog-assembly

commit d688bbb
Merge: 5068b7a 3b62945
Author: mueli94 <[email protected]>
Date: Thu Oct 19 16:28:55 2023 +0200

    Merge branch 'fdog-assembly' of https://github.com/mueli94/fDOG-Assembly into fdog-assembly

commit 5068b7a
Author: mueli94 <[email protected]>
Date: Thu Oct 19 16:20:18 2023 +0200

    contig fasta files where delited from the tmp folder to save memory

commit 3b62945
Author: Hannah <[email protected]>
Date: Thu Sep 21 16:31:17 2023 +0200

    further updates to adapt fDOG-Assembly to the new fDOG version. Additionally bugfix in co-ortholog detection if MSA crashed due to too many sequences.

commit 41660e6
Author: Hannah <[email protected]>
Date: Wed Sep 20 10:34:56 2023 +0200

    fixed augustus version

commit 5d77522
Author: mueli94 <[email protected]>
Date: Tue Sep 19 15:02:57 2023 +0200

    Bugfix muscle v5 command in fdog.addCoreGroup

commit 959e0a0
Author: trvinh <[email protected]>
Date: Tue Sep 19 14:49:48 2023 +0200

    deactivate fas config output

commit ea40a0c
Merge: 3140fc4 3cc809c
Author: Hannah <[email protected]>
Date: Mon Sep 18 17:27:04 2023 +0200

    Merge branch 'fdog-assembly' of https://github.com/mueli94/fDOG-Assembly into fdog-assembly

commit 3140fc4
Author: Hannah <[email protected]>
Date: Mon Sep 18 17:25:39 2023 +0200

    script to produce msa and hmm in the format fDOG-Assembly requires from a fasta file

commit 3cc809c
Author: Hannah <[email protected]>
Date: Mon Sep 18 17:21:07 2023 +0200

    script to produce msa and hmm in the format fDOG-Assembly requires from a fasta file

commit ef5acb7
Author: Hannah <[email protected]>
Date: Mon Sep 18 17:20:09 2023 +0200

    further changes to adapt to new fDOG version

commit a832824
Author: Hannah <[email protected]>
Date: Mon Sep 18 13:17:11 2023 +0200

    adjustments to new muscle version and fDOG version

commit 8df8a73
Author: Hannah <[email protected]>
Date: Mon Sep 18 11:38:06 2023 +0200

    added fDOG-Assembly dependencies

commit fc0ea45
Author: Hannah <[email protected]>
Date: Mon Sep 18 10:58:18 2023 +0200

    added fDOG-Assembly workflow

commit 5ccf824
Author: trvinh <[email protected]>
Date: Wed Aug 16 13:33:40 2023 +0200

    check valid rank for refspec only

commit deeda8a
Author: Vinh Tran <[email protected]>
Date: Wed Aug 16 13:19:31 2023 +0200

    Update CHANGELOG.md

commit eb902a6
Author: trvinh <[email protected]>
Date: Wed Aug 16 12:03:36 2023 +0200

    option to not added all taxa

commit e6ce3d1
Author: trvinh <[email protected]>
Date: Wed Aug 16 11:50:10 2023 +0200

    changed orthID NA to fdogMA for manually added taxa

commit 5cb0946
Author: trvinh <[email protected]>
Date: Wed Aug 16 09:57:26 2023 +0200

    added all searchTaxa to phyloprofile

commit 3e0859b
Author: trvinh <[email protected]>
Date: Tue Aug 15 15:39:54 2023 +0200

    added check for invalid min-/maxDist; works with fasta36.3.8g

commit b7ec6fa
Author: trvinh <[email protected]>
Date: Tue Aug 15 15:19:36 2023 +0200

    fixed worrking with old fasta36

commit 553cdaa
Author: trvinh <[email protected]>
Date: Tue Aug 15 12:44:18 2023 +0200

    added check for invalid min-/maxDist; works with fasta36.3.8g

commit c756ab0
Author: trvinh <[email protected]>
Date: Tue Aug 15 11:48:59 2023 +0200

    added check for invalid min-/maxDist; works with fasta36.3.8g

commit a746c8f
Author: trvinh <[email protected]>
Date: Tue Aug 15 11:32:45 2023 +0200

    added check for invalid min-/maxDist; works with fasta36.3.8g

commit 48372cf
Author: trvinh <[email protected]>
Date: Tue Jul 18 11:11:37 2023 +0200

    added warning for no hmm hit

commit a8541f8
Author: trvinh <[email protected]>
Date: Thu Jun 15 16:24:24 2023 +0200

    set seqlen limit for muscle5

commit b06b3a1
Author: trvinh <[email protected]>
Date: Mon Jun 12 14:09:09 2023 +0200

    fixed mapping file for addTaxon

commit c5ca07c
Author: trvinh <[email protected]>
Date: Thu May 25 14:13:45 2023 +0200

    fixed output filename for muscle5

commit b014157
Author: trvinh <[email protected]>
Date: Thu May 11 13:13:19 2023 +0200

    check empty tmp dir before deleting in addTaxon

commit 052cf70
Author: trvinh <[email protected]>
Date: Fri Mar 10 15:42:15 2023 +0100

    muscle5,checkBlast,ignoreAnnoCheck

commit 7e6bfbc
Author: trvinh <[email protected]>
Date: Wed Feb 22 09:57:23 2023 +0100

    adapt evalue

commit 56995b1
Author: trvinh <[email protected]>
Date: Wed Feb 22 09:56:40 2023 +0100

    adapt evalue

commit 7798d15
Author: trvinh <[email protected]>
Date: Mon Feb 20 13:50:31 2023 +0100

    rename log fdogs.run

commit 03c9038
Author: trvinh <[email protected]>
Date: Fri Feb 17 14:29:03 2023 +0100

    fix number of hmmhits core compilation

commit f7c4cf8
Author: trvinh <[email protected]>
Date: Tue Feb 14 12:47:09 2023 +0100

    fix check for pathconfig file

commit 5785980
Author: trvinh <[email protected]>
Date: Tue Feb 14 11:14:23 2023 +0100

    fix sorting hmm hits

commit bb185a3
Author: trvinh <[email protected]>
Date: Mon Feb 6 14:14:52 2023 +0100

    add several options to work with data paths; resolves: #28

commit 8bd2359
Author: trvinh <[email protected]>
Date: Mon Feb 6 14:04:01 2023 +0100

    add several options to work with data paths; resolves: #28

commit 7c3191a
Author: trvinh <[email protected]>
Date: Mon Feb 6 13:59:31 2023 +0100

    add several options to work with data paths

commit ab5491b
Author: trvinh <[email protected]>
Date: Fri Feb 3 15:46:46 2023 +0100

    option to use different names for data folders

commit 198393d
Author: trvinh <[email protected]>
Date: Fri Feb 3 15:07:49 2023 +0100

    add option to update json file to checkData

commit 4339bad
Author: trvinh <[email protected]>
Date: Thu Feb 2 17:17:34 2023 +0100

    accept old folder names

commit bfbc324
Author: trvinh <[email protected]>
Date: Thu Feb 2 16:39:34 2023 +0100

    added fn desc

commit ad8741d
Author: trvinh <[email protected]>
Date: Thu Feb 2 10:17:29 2023 +0100

    v0.1.5 version bump

commit ad2cbbd
Author: trvinh <[email protected]>
Date: Thu Feb 2 10:06:53 2023 +0100

    sort hmm hits using domain scores; correct tree walking; setup force not delete data; correct profile output with fasOff

commit 206def0
Author: trvinh <[email protected]>
Date: Thu Feb 2 10:04:18 2023 +0100

    sort hmm hits using domain scores; correct tree walking; setup force not delete data; correct profile output with fasOff

commit 1edad42
Author: trvinh <[email protected]>
Date: Fri Jan 27 09:33:58 2023 +0100

    version bump

commit 8855db4
Author: trvinh <[email protected]>
Date: Fri Jan 27 09:33:38 2023 +0100

    reduce evalue for identify seed ID

commit 8332690
Author: trvinh <[email protected]>
Date: Thu Jan 26 14:48:27 2023 +0100

    fix bug identify seed ID with reuseCore

commit 9edaf9e
Author: trvinh <[email protected]>
Date: Thu Jan 26 10:47:14 2023 +0100

    speed up preparation in fdogs.run

commit 2405698
Author: trvinh <[email protected]>
Date: Thu Jan 26 10:24:23 2023 +0100

    save core jobs for fdogs.run to file

commit e728252
Author: trvinh <[email protected]>
Date: Thu Jan 26 09:51:00 2023 +0100

    fix path to refspec fa; add amino to hmmbuild

commit f984b26
Author: trvinh <[email protected]>
Date: Wed Jan 25 14:41:09 2023 +0100

    added preparation runtime to fdogs.run

commit 8624f7d
Author: trvinh <[email protected]>
Date: Tue Jan 24 14:23:03 2023 +0100

    fixed kimura distance devide by 0

commit dd9a45c
Author: trvinh <[email protected]>
Date: Tue Jan 24 10:57:38 2023 +0100

    updated README

commit 055051f
Author: Vinh Tran <[email protected]>
Date: Tue Jan 24 10:21:08 2023 +0100

    0.1.0 (#26)

    * create v0.1.0
    * simplfying hamstr.pl
    * first python conversion
    * modified addTaxon and addTaxa
    * added install/check dependencies
    * rename data folders
    * added runtime for core complilation

commit d8eea1f
Author: trvinh <[email protected]>
Date: Wed Oct 12 09:10:56 2022 +0200

    fixed addTaxon replacing pipe

commit 20aa839
Author: trvinh <[email protected]>
Date: Mon Jun 13 10:19:40 2022 +0200

    removed fdog assembly files

commit a0747e0
Author: trvinh <[email protected]>
Date: Tue Mar 15 14:19:32 2022 +0100

    fdogs accepts dots in seed filename
1 parent b1a25aa commit 7f666b7


44 files changed: +3779 -8271 lines changed

.DS_Store

-6 KB
Binary file not shown.

.github/workflows/github_build.yml

+20 -5
@@ -5,9 +5,11 @@ name: build
 
 on:
   push:
-    branches: [ master ]
+    branches:
+      - master
+      - 0.1.0
     tags:
-      - '*'
+      - '*'
   pull_request:
     branches: [ master ]
 
@@ -38,10 +40,23 @@ jobs:
       run: |
         pwd
         pip install .
-        fdog.setup -o /home/runner/work/fDOG/fDOG/dt --lib
-        fdog.setup -o /home/runner/work/fDOG/fDOG/dt
+        path=$(fdog.setup -d ./ --getSourcepath); for i in $(less $path/data/dependencies.txt); do sudo apt-get install -y -qq $i; done
+        echo "TEST fdog.setup"
+        fdog.setup -d /home/runner/work/fDOG/fDOG/dt --woFAS
+        echo "TEST fdog.checkData"
+        fdog.checkData -s /home/runner/work/fDOG/fDOG/dt/searchTaxa_dir -c /home/runner/work/fDOG/fDOG/dt/coreTaxa_dir -a /home/runner/work/fDOG/fDOG/dt/annotation_dir --reblast
+        echo "TEST fdog.showTaxa"
         fdog.showTaxa
-        fdog.run --seqFile infile.fa --seqName test --refspec HUMAN@9606@3 --fasoff
+        echo "TEST fdog.run"
+        fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@3 --fasOff --group mammalia
+        mkdir seeds
+        path=$(fdog.setup -d ./ --getSourcepath); a="1 2 3"; for i in ${a[@]}; do cp $path/data/infile.fa seeds/$i.fa; done
+        echo "TEST fdogs.run"
+        fdogs.run --seqFolder seeds --jobName test_multi --refspec HUMAN@9606@3 --fasOff --searchTaxa PARTE@5888@3,THAPS@35128@3
+        echo "TEST fdog.addTaxon"
+        head /home/runner/work/fDOG/fDOG/dt/searchTaxa_dir/HUMAN@9606@3/HUMAN@9606@3.fa > hm.fa
+        fdog.addTaxon -f hm.fa -i 9606 -o ./ -c -a
+        ls
     - name: Deploy
       if: startsWith(github.event.ref, 'refs/tags')
       uses: casperdcl/deploy-pypi@v2
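
The taxon identifiers passed to `--refspec` and `--searchTaxa` in the workflow above (`HUMAN@9606@3`, `PARTE@5888@3`, `THAPS@35128@3`) follow the naming schema given in the README of this commit, `[Species acronym]@[NCBI ID]@[Proteome version]`. The README hunk in this commit also states that `fdog.addTaxon -v` defaults to the current date as `<YYMMDD>`. A minimal Python sketch of that schema is shown below. The names `TaxonName`, `parse_taxon_name` and `default_version` are hypothetical helpers written for illustration and are not part of the fdog package.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TaxonName:
    """Hypothetical container for [Species acronym]@[NCBI ID]@[Proteome version]."""
    acronym: str
    ncbi_id: int
    version: str

    def __str__(self) -> str:
        # Rebuild the directory-style name, e.g. HUMAN@9606@3
        return f"{self.acronym}@{self.ncbi_id}@{self.version}"


def parse_taxon_name(name: str) -> TaxonName:
    """Split a taxon name into its three '@'-separated fields (illustrative only)."""
    parts = name.split("@")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected ACRONYM@NCBI_ID@VERSION, got {name!r}")
    acronym, ncbi_id, version = parts
    if not ncbi_id.isdigit():
        raise ValueError(f"NCBI taxonomy ID must be numeric, got {ncbi_id!r}")
    return TaxonName(acronym, int(ncbi_id), version)


def default_version(today: Optional[date] = None) -> str:
    """Return a YYMMDD string; assumed to mirror the documented -v default."""
    return (today or date.today()).strftime("%y%m%d")


if __name__ == "__main__":
    taxon = parse_taxon_name("HUMAN@9606@3")
    print(taxon.acronym, taxon.ncbi_id, taxon.version)
```

Keeping the version as a string rather than an integer is a deliberate choice in this sketch, since a date-based version such as `231023` and a plain `3` are both valid under the schema.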

CHANGELOG.md

+38
@@ -0,0 +1,38 @@
+# Changelog
+
+## [Dev version]
+
+### Added
+-
+### Changed
+-
+### Fixed
+-
+
+## [0.1.23] - 2023.08.16
+
+### Added
+- Option to NOT adding all search taxa (`--notAddingTaxa`); OFF by default,
+i.e. all search taxa will be present in phyloprofile output
+- Check invalid min-/max rank for referece species (specified by --minDist and --maxDist).
+If the specified ranks (by default, `--minDist genus` `--maxDist kingdom`) are not available,
+the next valid ranks will be suggested (or automatically applied if default ranks are used)
+- Check if seed sequence cannot be retrieved by blast. Return with the blast command
+
+### Changed
+-
+### Fixed
+- Fixed issue with long directory path for FASTA36 v36.3.8g Dec 2017
+
+## [0.1.12] - 2023.03.10
+
+### Added
+- Option to not check annotations for fdog.checkData (option `--ignoreAnno`)
+- Check compatibility between blastp and blast DBs
+
+### Changed
+- Work with MUSCLE v5.1
+- Replace MuscleCommandline and MafftCommandline by subprocess.run
+
+### Fixed
+-

README.md

+8 -14
@@ -44,30 +44,24 @@ export PATH=$HOME/.local/bin:$PATH
 
 After installing *fdog*, you need to setup *fdog* to get its dependencies and pre-calculated data.
 
-**NOTE**: in case you haven't installed [greedyFAS](https://github.com/BIONF/FAS) before, it will be installed automatically within *fDOG* setup. However, you need to run [setupFAS](https://github.com/BIONF/FAS/wiki/setupFAS) after *fDOG* setup finished before actually using *fDOG*!
+**NOTE**: in case you haven't installed [greedyFAS](https://github.com/BIONF/FAS), it will be installed automatically within *fDOG* setup. However, you need to run [setupFAS](https://github.com/BIONF/FAS/wiki/setupFAS) after *fDOG* setup finished before actually using *fDOG*!
 
 You can setup fDOG by running this command
 ```
-fdog.setup -o /output/path/for/fdog/data
+fdog.setup -d /output/path/for/fdog/data
 ```
-or, in case you are using Anaconda
-```
-fdog.setup -o /output/path/for/fdog/data --conda
-```
-
-*You should have the sudo password ready, otherwise some missing dependencies cannot be installed. See [dependency list](#dependencies) for more info. If you do not have root privileges, ask your admin to install those dependencies using `fdog.setup --lib` command.*
 
 [Pre-calculated data set](https://github.com/BIONF/fDOG/wiki/Input-and-Output-Files#data-structure) of fdog will be saved in `/output/path/for/fdog/data`. After the setup run successfully, you can start using *fdog*. **Please make sure to check if you need to run [setupFAS](https://github.com/BIONF/FAS/wiki/setupFAS) first.**
 
 You will get a warning if any of the dependencies are not ready to use, please solve those issues and rerun `fdog.setup`.
 
-*For debugging the setup, please create a log file by running the setup as e.g. `fdog.setup | tee log.txt` for Linux/MacOS or `fdog.setup --conda | tee log.txt` for Anaconda and send us that log file, so that we can trouble shoot the issues. Most of the problems can be solved by just re-running the setup.*
+*For debugging the setup, please create a log file by running the setup as e.g. `fdog.setup | tee log.txt` and send us that log file, so that we can trouble shoot the issues. Most of the problems can be solved by just re-running the setup.*
 
 # Usage
 *fdog* will run smoothly with the provided sample input file 'infile.fa' if everything is set correctly.
 
 ```
-fdog.run --seqFile infile.fa --seqName test --refspec HUMAN@9606@3
+fdog.run --seqFile infile.fa --jobName test --refspec HUMAN@9606@3
 ```
 The output files with the prefix `test` will be saved at your current working directory.
 You can have an overview about all available options with the command
@@ -81,9 +75,9 @@ Please find more information in [our wiki](https://github.com/BIONF/fDOG/wiki) t
 
 Within the data package we provide a set of 78 reference taxa. They can be automatically downloaded during the setup. This data comes "ready to use" with the *fdog* framework. Species data must be present in the three directories listed below:
 
-* genome_dir (Contains sub-directories for proteome fasta files for each species)
-* blast_dir (Contains sub-directories for BLAST databases made with `makeblastdb` out of your proteomes)
-* weight_dir (Contains feature annotation files for each proteome)
+* searchTaxa_dir (Contains sub-directories for proteome fasta files for each species)
+* coreTaxa_dir (Contains sub-directories for BLAST databases made with `makeblastdb` out of your proteomes)
+* annotation_dir (Contains feature annotation files for each proteome)
 
 For each species/taxon there is a sub-directory named in accordance to the naming schema ([Species acronym]@[NCBI ID]@[Proteome version])
 
@@ -95,7 +89,7 @@ For adding **one gene set**, please use the `fdog.addTaxon` function:
 fdog.addTaxon -f newTaxon.fa -i tax_id [-o /output/directory] [-n abbr_tax_name] [-c] [-v protein_version] [-a]
 ```
 
-in which, the first 3 arguments are required including `newTaxon.fa` is the gene set that need to be added, `tax_id` is its NCBI taxonomy ID, `/output/directory` is where the sub-directories can be found (*genome_dir*, *blast_dir* and *weight_dir*). If not given, new taxon will be added into the same directory of pre-calculated data. Other arguments are optional, which are `-n` for specify your own taxon name (if not given, an abbriviate name will be suggested based on the NCBI taxon name of the input `tax_id`), `-c` for calculating the BLAST DB (only needed if you need to include your new taxon into the list of taxa for compilating the core set), `-v` for identifying the genome/proteome version (default will be 1), and `-a` for turning off the annotation step (*not recommended*).
+in which, the first 3 arguments are required including `newTaxon.fa` is the gene set that need to be added, `tax_id` is its NCBI taxonomy ID, `/output/directory` is where the sub-directories can be found (*genome_dir*, *blast_dir* and *weight_dir*). If not given, new taxon will be added into the same directory of pre-calculated data. Other arguments are optional, which are `-n` for specify your own taxon name (if not given, an abbriviate name will be suggested based on the NCBI taxon name of the input `tax_id`), `-c` for calculating the BLAST DB (only needed if you need to include your new taxon into the list of taxa for compilating the core set), `-v` for identifying the genome/proteome version (default will be the current date <YYMMDD>), and `-a` for turning off the annotation step (*not recommended*).
 
 ## Adding a list of gene sets into fDOG
 For adding **more than one gene set**, please use the `fdog.addTaxa` script:

fdog/.DS_Store

-8 KB
Binary file not shown.
