This repository contains several small scripts I use in various steps of my genomics analyses. Most of them are highly personalized for my own needs, and I provide no warranty for their functionality. This README gives a brief description of each script.
Filters assemblies based on GC content, the proportion of BLAST hits with a particular taxonomic identity, and mean read coverage. Helps to identify contigs that are potential contaminants. This is somewhat similar to Blobology approaches.
Used non-standard Libraries: BioPython, scipy, numpy
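As a rough illustration of this kind of filter (not the script's actual logic), the sketch below flags a contig when its GC content, mean coverage, or fraction of BLAST hits assigned to an expected taxon falls outside user-chosen bounds. The `Contig` record, its field names and all thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contig:
    # Hypothetical per-contig record; in practice these values would come from
    # the assembly, a read mapping and a taxonomically annotated BLAST search.
    name: str
    sequence: str
    mean_coverage: float
    hit_taxa: list = field(default_factory=list)  # one taxon label per BLAST hit


def gc_content(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def taxon_fraction(hit_taxa, taxon):
    """Proportion of BLAST hits assigned to the expected taxon."""
    return hit_taxa.count(taxon) / len(hit_taxa) if hit_taxa else 0.0


def is_suspicious(contig, taxon, gc_range=(0.3, 0.6), min_cov=5.0, min_taxon_frac=0.5):
    """Flag a contig if any criterion falls outside its (illustrative) bounds."""
    gc = gc_content(contig.sequence)
    return (
        not (gc_range[0] <= gc <= gc_range[1])
        or contig.mean_coverage < min_cov
        or taxon_fraction(contig.hit_taxa, taxon) < min_taxon_frac
    )


contigs = [
    Contig("ctg1", "ATGCGCATATGC", 30.0, ["Fungi", "Fungi", "Bacteria"]),
    Contig("ctg2", "GGGCGCGCCCGG", 2.0, ["Bacteria", "Bacteria"]),
]
flagged = [c.name for c in contigs if is_suspicious(c, "Fungi")]
print(flagged)  # ['ctg2']
```

Combining the three criteria with "or" is the most conservative choice; a real filter might instead require several criteria to agree before discarding a contig.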
Calculates the GC content of DNA sequences (e.g. assemblies) using a sliding-window approach and writes PDF files with graphical representations of the GC content along the sequence. Now also includes gene position information.
Used non-standard Libraries: BioPython, matplotlib
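A minimal sketch of the sliding-window calculation, assuming a window size and step given in bases (the parameter names and defaults are illustrative, not the script's). Each window's GC fraction is computed over unambiguous bases only.

```python
def sliding_gc(seq, window=1000, step=500):
    """Return (window_start, gc_fraction) pairs with 0-based window starts.

    Trailing bases that do not fill a complete window after the last step are
    ignored; a sequence shorter than `window` yields a single window.
    """
    seq = seq.upper()
    results = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(b) for b in "ACGT")
        gc = (chunk.count("G") + chunk.count("C")) / acgt if acgt else float("nan")
        results.append((start, gc))
    return results


print(sliding_gc("GGGGAAAA", window=4, step=2))  # [(0, 1.0), (2, 0.5), (4, 0.0)]
```

The resulting pairs could then be plotted (e.g. with matplotlib) and saved as a PDF.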
Identifies the longest splicing isoforms from Trinity assemblies.
Used non-standard Libraries: BioPython
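One way to pick the longest isoform is sketched below, assuming Trinity-style IDs such as `TRINITY_DN1000_c0_g1_i1`, where stripping the trailing `_iN` gives the gene. It uses a small stdlib FASTA reader instead of BioPython so it runs on its own; both function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs; the ID is the first word of the header line."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_isoforms(records):
    """Keep the longest isoform per gene (assumed: ID minus its trailing '_iN')."""
    best = {}
    for seq_id, seq in records:
        gene = seq_id.rsplit("_i", 1)[0]
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (seq_id, seq)
    return best


fasta = """>TRINITY_DN1_c0_g1_i1 len=4
ACGT
>TRINITY_DN1_c0_g1_i2 len=8
ACGTACGT
>TRINITY_DN2_c0_g1_i1 len=3
AAA
"""
best = longest_isoforms(read_fasta(fasta.splitlines()))
print(sorted(v[0] for v in best.values()))  # ['TRINITY_DN1_c0_g1_i2', 'TRINITY_DN2_c0_g1_i1']
```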
Converts GFF3 files (e.g. from MAKER) to GTF.
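The core of such a conversion is rewriting column 9: GTF needs `gene_id` and `transcript_id` on every exon/CDS line, whereas GFF3 links features through `ID`/`Parent`. The sketch below assumes a MAKER-style gene → mRNA → exon/CDS hierarchy and converts only exon and CDS lines; it is an illustration, not the script's implementation.

```python
# Assumption: gene -> mRNA -> exon/CDS hierarchy linked by ID/Parent (as written by MAKER).
def parse_attributes(column9):
    """Parse a GFF3 attribute string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in column9.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(lines, features=("exon", "CDS")):
    """Return GTF lines for the selected feature types of a GFF3 file."""
    rows = [line.rstrip("\n").split("\t") for line in lines
            if line.strip() and not line.startswith("#")]
    rows = [cols for cols in rows if len(cols) == 9]
    # First pass: map each transcript ID to the gene ID given as its Parent.
    transcript_to_gene = {}
    for cols in rows:
        if cols[2] in ("mRNA", "transcript"):
            attrs = parse_attributes(cols[8])
            if "ID" in attrs:
                transcript_to_gene[attrs["ID"]] = attrs.get("Parent", attrs["ID"])
    # Second pass: rewrite column 9 of exon/CDS lines in GTF syntax;
    # columns 1-8 (including the CDS phase) carry over unchanged.
    gtf = []
    for cols in rows:
        if cols[2] not in features:
            continue
        parents = parse_attributes(cols[8]).get("Parent")
        if not parents:
            continue
        for transcript in parents.split(","):  # GFF3 allows several parents
            gene = transcript_to_gene.get(transcript, transcript)
            gtf.append("\t".join(cols[:8] + [f'gene_id "{gene}"; transcript_id "{transcript}";']))
    return gtf


gff3 = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=g1-RA;Parent=g1",
    "chr1\tmaker\texon\t1\t100\t.\t+\t.\tID=g1-RA:exon1;Parent=g1-RA",
    "chr1\tmaker\tCDS\t10\t90\t.\t+\t0\tID=g1-RA:cds;Parent=g1-RA",
]
for line in gff3_to_gtf(gff3):
    print(line)
```

Gene and mRNA lines are only used to resolve IDs here; some downstream tools also expect `gene` or `transcript` lines in the GTF, which would need to be added separately.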
Uses codon alignments obtained from a set of DNA and protein sequences to calculate dN/dS.
Used non-standard Libraries: BioPython, numpy
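The README does not say which dN/dS method the script uses. To show what the calculation involves, here is a simplified pure-Python version of the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, applied to two aligned, in-frame coding sequences. Counting changes to stop codons as nonsynonymous when computing sites is one of several conventions and is an assumption of this sketch.

```python
import itertools
import math

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites of a codon: each synonymous single-base change adds 1/3."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(syn, nonsyn) differences averaged over mutational pathways avoiding stop codons."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(diff_pos):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        return None  # every pathway hits a stop codon; the caller skips this pair
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor distance; infinite when p >= 0.75 (saturation)."""
    return -0.75 * math.log(1 - 4 * p / 3) if p < 0.75 else math.inf


def dn_ds(seq1, seq2):
    """(dN, dS, dN/dS) for two aligned, in-frame coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) // 3 * 3, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps or ambiguity codes, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        diffs = codon_differences(c1, c2)
        if diffs is None:
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        Sd += diffs[0]
        Nd += diffs[1]
    if S == 0 or N == 0:
        raise ValueError("no comparable codons")
    dn, ds = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    return dn, ds, (dn / ds if ds > 0 else math.nan)
```

Typical use would be `dn, ds, omega = dn_ds(aligned_cds_1, aligned_cds_2)`. With very short or highly divergent sequences the Jukes-Cantor correction saturates and returns infinity, so dN/dS is only meaningful for reasonably long alignments.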
Filters BLAST hits in tabular format.
Used non-standard Libraries: pandas
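A minimal sketch of filtering BLAST tabular output, assuming the default 12 columns of `-outfmt 6`. The thresholds are illustrative, and the sketch uses the stdlib `csv` module rather than pandas; with pandas, `pandas.read_csv(path, sep="\t", names=OUTFMT6)` followed by boolean indexing would do the same.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-5, min_pident=30.0, min_length=50, best_only=False):
    """Yield hits (as dicts) passing all thresholds; thresholds here are illustrative."""
    seen = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6, row))
        if (float(hit["evalue"]) > max_evalue
                or float(hit["pident"]) < min_pident
                or int(hit["length"]) < min_length):
            continue
        if best_only:
            # Keeps the first passing hit per query, assuming BLAST's usual
            # ordering where each query's best hits come first.
            if hit["qseqid"] in seen:
                continue
            seen.add(hit["qseqid"])
        yield hit


data = (
    "q1\ts1\t95.0\t120\t6\t0\t1\t120\t1\t120\t1e-50\t200\n"
    "q1\ts2\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-10\t80\n"
    "q2\ts3\t25.0\t200\t150\t0\t1\t200\t1\t200\t1e-3\t40\n"
)
hits = list(filter_hits(io.StringIO(data)))
best = list(filter_hits(io.StringIO(data), best_only=True))
print([h["sseqid"] for h in hits])  # ['s1', 's2']
print([h["sseqid"] for h in best])  # ['s1']
```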
Filters output from codeml.py according to p-value and the number of sites under selection.
Uses OrthoMCL or all-vs-all BLAST output to select sequences belonging to orthologous groups from multiple species.
Used non-standard Libraries: pandas, BioPython
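A common selection step is keeping only single-copy groups, i.e. groups with exactly one sequence per species. The sketch below assumes an OrthoMCL-style groups file with lines such as `OG_1: taxA|seq1 taxB|seq7`; the exact format the script reads, and whether it selects single-copy groups at all, are assumptions here.

```python
from collections import Counter


def parse_groups(lines):
    """Parse 'group: taxon|seq_id taxon|seq_id ...' lines (assumed format)."""
    groups = {}
    for line in lines:
        if not line.strip():
            continue
        name, members = line.split(":", 1)
        # Every member is expected to contain a '|' separating taxon and sequence ID.
        groups[name.strip()] = [tuple(m.split("|", 1)) for m in members.split()]
    return groups


def single_copy_groups(groups, taxa):
    """Groups with exactly one sequence from each requested taxon and none from others."""
    wanted = set(taxa)
    selected = {}
    for name, members in groups.items():
        counts = Counter(taxon for taxon, _ in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected[name] = members
    return selected


groups = parse_groups([
    "OG_1: taxA|a1 taxB|b1 taxC|c1",
    "OG_2: taxA|a2 taxA|a3 taxB|b2 taxC|c2",
    "OG_3: taxA|a4 taxB|b3",
])
print(single_copy_groups(groups, ["taxA", "taxB", "taxC"]))
# {'OG_1': [('taxA', 'a1'), ('taxB', 'b1'), ('taxC', 'c1')]}
```

The selected sequence IDs could then be used to pull the corresponding records from per-species FASTA files.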
Reduces a FASTA file to the sequences whose IDs are present in another FASTA file.
Used non-standard Libraries: BioPython
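The reduction amounts to a set lookup on sequence IDs. The sketch below uses a small stdlib FASTA reader that takes the first word of each header as the ID, which matches how BioPython's `SeqIO.parse(..., "fasta")` sets `record.id`; the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs; the ID is the first word of the header line."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def reduce_fasta(target_lines, reference_lines):
    """Return target records whose IDs also occur in the reference FASTA."""
    keep = {seq_id for seq_id, _ in read_fasta(reference_lines)}
    return [(seq_id, seq) for seq_id, seq in read_fasta(target_lines) if seq_id in keep]


target = ">a desc\nAAA\n>b\nCCC\n>c\nGGG\n".splitlines()
reference = ">a\nTTT\n>c\nTTT\n".splitlines()
print(reduce_fasta(target, reference))  # [('a', 'AAA'), ('c', 'GGG')]
```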
Reduces a FASTA file to the IDs from dbCAN output that belong to a specific group of CAZymes.
Used non-standard Libraries: pandas, BioPython
This script creates a valid BLAST XML file (-outfmt 5) from a piped, concatenated XML file when BLAST is used with parallel.
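When each parallel BLAST job writes a complete XML document, simply concatenating them yields several XML declarations and root elements. A minimal sketch of the repair, assuming each chunk is a complete BLAST+ XML document with nothing after the iterations that must be kept: keep the header of the first document up to the opening `<BlastOutput_iterations>` tag, collect every `<Iteration>` block, renumber them, and close the elements once. This is not necessarily how the script does it.

```python
import re

ITERATION = re.compile(r"<Iteration>.*?</Iteration>", re.DOTALL)
ITER_NUM = re.compile(r"<Iteration_iter-num>\d+</Iteration_iter-num>")
OPEN_TAG = "<BlastOutput_iterations>"


def merge_blast_xml(text):
    """Merge concatenated BLAST XML (-outfmt 5) documents into a single document."""
    start = text.find(OPEN_TAG)
    if start == -1:
        raise ValueError("no <BlastOutput_iterations> element found")
    # Keep the header (XML declaration, program info) of the first document only.
    header = text[:start + len(OPEN_TAG)]
    blocks = []
    for n, block in enumerate(ITERATION.findall(text), start=1):
        # Iteration numbers restart in every chunk, so renumber them consecutively.
        blocks.append(ITER_NUM.sub(f"<Iteration_iter-num>{n}</Iteration_iter-num>", block, count=1))
    return header + "\n" + "\n".join(blocks) + "\n</BlastOutput_iterations>\n</BlastOutput>\n"


chunk = (
    '<?xml version="1.0"?>\n<BlastOutput>\n  <BlastOutput_program>blastp</BlastOutput_program>\n'
    "  <BlastOutput_iterations>\n    <Iteration>\n      <Iteration_iter-num>1</Iteration_iter-num>\n"
    "      <Iteration_query-def>{}</Iteration_query-def>\n    </Iteration>\n"
    "  </BlastOutput_iterations>\n</BlastOutput>\n"
)
merged = merge_blast_xml(chunk.format("q1") + chunk.format("q2"))
print(merged)
```

The merged text has a single declaration and root element, so it should parse as one document (e.g. with `xml.etree.ElementTree.fromstring`).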
This script converts a BLAST XML file (-outfmt 5) to tabular format (-outfmt 6).
Used non-standard Libraries: BioPython
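To show how the 12 tabular columns can be derived from the XML elements, here is a sketch using the stdlib `xml.etree.ElementTree` instead of BioPython. Assumptions: the query ID is the first word of `Iteration_query-def` and the subject ID is `Hit_id` (for local databases built without `-parse_seqids`, the meaningful ID may sit in `Hit_def` instead); mismatches are alignment length minus identities minus gap columns, and gap openings are counted as runs of `-` in both aligned sequences.

```python
import re
import xml.etree.ElementTree as ET


def gap_openings(aligned):
    """Number of gap runs ('-' stretches) in one aligned sequence."""
    return len(re.findall(r"-+", aligned))


def int_field(elem, tag):
    """Integer content of a child element; 0 if the element is absent."""
    return int(elem.findtext(tag, default="0"))


def blast_xml_to_tabular(xml_text):
    """Yield one outfmt-6-style row (list of 12 strings) per HSP."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query_words = iteration.findtext("Iteration_query-def", default="").split()
        query = query_words[0] if query_words else ""
        for hit in iteration.iter("Hit"):
            subject = hit.findtext("Hit_id", default="")
            for hsp in hit.iter("Hsp"):
                align_len = int_field(hsp, "Hsp_align-len")
                identity = int_field(hsp, "Hsp_identity")
                gaps = int_field(hsp, "Hsp_gaps")
                gapopen = (gap_openings(hsp.findtext("Hsp_qseq", default=""))
                           + gap_openings(hsp.findtext("Hsp_hseq", default="")))
                pident = 100 * identity / align_len if align_len else 0.0
                yield [
                    query, subject, f"{pident:.3f}", str(align_len),
                    str(align_len - identity - gaps),  # mismatches exclude gap columns
                    str(gapopen),
                    hsp.findtext("Hsp_query-from"), hsp.findtext("Hsp_query-to"),
                    hsp.findtext("Hsp_hit-from"), hsp.findtext("Hsp_hit-to"),
                    hsp.findtext("Hsp_evalue"), hsp.findtext("Hsp_bit-score"),
                ]


xml_text = """<BlastOutput><BlastOutput_iterations><Iteration>
<Iteration_query-def>q1 some description</Iteration_query-def>
<Iteration_hits><Hit><Hit_id>s1</Hit_id><Hit_hsps><Hsp>
<Hsp_bit-score>50.1</Hsp_bit-score><Hsp_evalue>1e-10</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from><Hsp_query-to>10</Hsp_query-to>
<Hsp_hit-from>5</Hsp_hit-from><Hsp_hit-to>15</Hsp_hit-to>
<Hsp_identity>9</Hsp_identity><Hsp_gaps>1</Hsp_gaps><Hsp_align-len>11</Hsp_align-len>
<Hsp_qseq>ACGTA-CGTAC</Hsp_qseq><Hsp_hseq>ACGTAGCGTTC</Hsp_hseq>
</Hsp></Hit_hsps></Hit></Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""
for row in blast_xml_to_tabular(xml_text):
    print("\t".join(row))
```

E-values and bit scores are copied from the XML as-is, so their numeric formatting may differ from what BLAST itself writes in tabular output.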
Converts the output of a Blast2GO BLAST search (several zipped XML files) to a single BLAST tabular (-outfmt 6) file.
Downloads information on characterized CAZymes from cazy.org in tab-delimited format for downstream analyses.
Downloads and parses lineage information from NCBI. This was used in early versions of phylociraptor.
Extracts shorter regions from a longer sequence in FASTA format.
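A minimal sketch of region extraction, assuming 1-based inclusive coordinates (as in GFF/BLAST) and, as a hypothetical convention, that `start > end` requests the reverse complement. Function names are illustrative.

```python
# Assumption: 1-based, inclusive coordinates; start > end means the minus strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(seq, start, end):
    """Return seq[start..end]; if start > end, return the reverse complement of seq[end..start]."""
    if start <= end:
        return seq[start - 1:end]
    return seq[end - 1:start].translate(COMPLEMENT)[::-1]


def to_fasta(name, seq, width=60):
    """Format a record as FASTA text, wrapping the sequence at `width` columns."""
    return "\n".join([f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)])


genome = "AAACCCGT"
print(to_fasta("region_2_4", extract_region(genome, 2, 4)))  # >region_2_4 / AAC
print(to_fasta("region_4_2", extract_region(genome, 4, 2)))  # >region_4_2 / GTT
```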