Releases: pachterlab/gget
v0.29.1 - mutate and cosmic overhaul
- gget mutate:
  - gget mutate has been simplified to focus on taking as input a list of mutations and the associated reference genome with corresponding annotation information, and producing as output the sequences with each mutation incorporated and a short region of surrounding context. For the full functionality of the previous version and how it integrates into a novel variant screening pipeline, visit the varseek repository being developed by members of the gget team at https://github.com/pachterlab/varseek.git.
  - Added additional information to returned data frames as described here: #169
- gget cosmic:
  - Major restructuring of the gget cosmic module to adhere to new login requirements set by COSMIC
  - New arguments email and password were added to allow the user to pass their login credentials directly, without being prompted for input during data download
  - Default changed: gget_mutate=False
  - Deprecated argument: entity
  - Argument mutation_class is now cosmic_project
- gget bgee:
  - type="orthologs" is now the default, removing the need to specify the type argument when querying orthologs
  - Allow querying multiple genes at once.
- gget diamond:
  - Now supports translated alignment of nucleotide sequences to amino acid reference sequences using the --translated flag.
- gget elm:
  - Improved server error handling.
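The gget mutate entry above describes returning each mutated sequence together with a short region of surrounding context. The following is a minimal Python sketch of that idea for a single substitution. It does not call gget and is not its implementation: the function name mutate_with_context, the 1-based position convention, and the flank width k are assumptions made purely for illustration.

```python
def mutate_with_context(seq, pos, ref, alt, k=5):
    """Apply a single-base substitution and return it with up to k flanking bases.

    Hypothetical helper for illustration only; pos is 1-based.
    """
    idx = pos - 1
    if not 0 <= idx < len(seq):
        raise ValueError("position outside the reference sequence")
    if seq[idx] != ref:
        raise ValueError(f"reference mismatch: expected {ref}, found {seq[idx]}")
    # Clamp the left flank at the start of the sequence; slicing already
    # clamps the right flank at the end.
    left = seq[max(0, idx - k):idx]
    right = seq[idx + 1:idx + 1 + k]
    return left + alt + right


reference = "ATGCCGTACGTTAGC"
# Substitute A>T at position 8 and keep 3 bases of context on each side.
print(mutate_with_context(reference, pos=8, ref="A", alt="T", k=3))  # CGTTCGT
```

Checking the reference base before applying the change is a cheap guard against a mutation list and a reference genome that do not match.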
v0.29.0 - cbio, opentargets, bgee and more
- New modules: gget cbio, gget opentargets, and gget bgee
- gget enrichr now also supports species other than human and mouse (fly, yeast, worm, and fish) via modEnrichR
- gget mutate: gget mutate will now merge identical sequences in the final file by default. Mutation creation was vectorized to decrease runtime. Improved flanking sequence check for non-substitution mutations to make sure no wildtype kmer is retained in the mutation-containing sequence. Addition of several new arguments to customize sequence generation and output.
- gget cosmic: Added support for targeted as well as gene screens. The CSV file created for gget mutate now also contains protein mutation info.
- gget ref: Added out file option.
- gget info and gget seq: Switched to Ensembl POST API to increase speed (nothing changes in front end).
- Other "behind the scenes" changes:
  - Unit tests reorganized to increase speed and decrease code
  - Requirements updated to allow newer mysql-connector versions
  - Support NumPy >= 2.0
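Merging identical sequences, as noted for gget mutate above, can be pictured as grouping records by their sequence and combining the headers. The sketch below shows that general technique in plain Python. It is not gget's code: the function name merge_identical and the ";" separator used to join headers are assumptions for illustration.

```python
def merge_identical(records, sep=";"):
    """Collapse (header, sequence) pairs that share the same sequence.

    Hypothetical helper: headers of identical sequences are joined with sep,
    and the order of first appearance is preserved.
    """
    merged = {}
    for header, seq in records:
        # dict keeps insertion order, so output order follows first occurrence.
        merged.setdefault(seq, []).append(header)
    return [(sep.join(headers), seq) for seq, headers in merged.items()]


records = [("mut1", "ACGT"), ("mut2", "TTTT"), ("mut3", "ACGT")]
for header, seq in merge_identical(records):
    print(f">{header}\n{seq}")
```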
v0.28.6 - gget mutate, download_cosmic, fixes for Ensembl v112
- New module: gget mutate
- gget cosmic: You can now download entire COSMIC databases using the download_cosmic argument
- gget ref: Can now fetch the GRCh37 genome assembly using species='human_grch37'
- gget search: Adjust access of human data to the structure of Ensembl release 112 (fixes issue 129)
v0.28.4 - Fix Windows bug in gget elm setup
Fix Windows bug in gget elm setup
v0.28.3 - cosmic, invertebrates for ref and search, elm improvements
- gget search and gget ref now also support fungi 🍄, protists 🌝, and invertebrate metazoa 🐝 🐜 🐌 🐙 (in addition to vertebrates and plants)
- New module: gget cosmic
- gget enrichr: Fix duplicate scatter dots in plot when pathway names are duplicated
- gget elm:
  - Changed ortho results column name 'Ortholog_UniProt_ID' to 'Ortholog_UniProt_Acc' to correctly reflect the column contents, which are UniProt Accessions. 'UniProt ID' was changed to 'UniProt Acc' in the documentation for all gget modules.
  - Changed ortho results column name 'motif_in_query' to 'motif_inside_subject_query_overlap'.
  - Added interaction domain information to results (new columns: "InteractionDomainId", "InteractionDomainDescription", "InteractionDomainName").
  - The regex string for regular expression matches was encapsulated as follows: "(?=(regex))" (instead of directly passing the regex string "regex") to enable capturing all occurrences of a motif when the motif length is variable and there are repeats in the sequence (https://regex101.com/r/HUWLlZ/1).
- gget setup: Use the out argument to specify a directory the ELM database will be downloaded into. Completes this feature request.
- gget diamond: The DIAMOND command is now run with the --ignore-warnings flag, allowing niche sequences such as amino acid sequences that only contain nucleotide characters and repeated sequences. This is also true for DIAMOND alignments performed within gget elm.
- gget ref and gget search back-end change: the current Ensembl release is fetched from the new release file on the Ensembl FTP site to avoid errors during uploads of new releases.
- gget search:
  - FTP link results (--ftp) are saved in txt file format instead of json.
  - Fix URL links to Ensembl gene summary for species with a subspecies name and invertebrates.
- gget ref:
  - Back-end changes to increase speed
  - New argument: list_iv_species to list all available invertebrate species (can be combined with the release argument to fetch all species available from a specific Ensembl release)
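The "(?=(regex))" wrapping mentioned in the gget elm notes above relies on standard lookahead behavior: a lookahead consumes no characters, so the regex engine can report a match at every start position, including overlapping ones. The sketch below demonstrates this with Python's re module. The helper name find_overlapping and the toy sequence are illustrative assumptions and are not taken from gget.

```python
import re


def find_overlapping(pattern, sequence):
    """Return (start, match) for every occurrence, including overlapping ones."""
    # Wrapping the pattern in a capturing group inside a lookahead means each
    # match is zero-width; the matched text is recovered from group 1.
    wrapped = f"(?=({pattern}))"
    return [(m.start(), m.group(1)) for m in re.finditer(wrapped, sequence)]


sequence = "AAAA"

# Passing the pattern directly: matches consume characters, so overlaps are skipped.
plain = [(m.start(), m.group(0)) for m in re.finditer("AA", sequence)]
print(plain)  # [(0, 'AA'), (2, 'AA')]

# Lookahead-wrapped pattern: every start position is reported.
print(find_overlapping("AA", sequence))  # [(0, 'AA'), (1, 'AA'), (2, 'AA')]
```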
v0.28.2 - NCBI server issues and gget elm expand
- gget info: Return a logging error message when the NCBI server fails for a reason other than a fetch fail (this is an error on the server side rather than an error with gget)
- Replace deprecated 'text' argument to find()-type methods whenever used with dependency BeautifulSoup
- gget elm: Remove false positive and true negative instances from returned results
- gget elm: Add expand argument
v0.28.0 - gget elm + gget diamond
- Updated documentation of gget muscle to add a tutorial on how to visualize sequences with varying sequence name lengths + slight change to returned visualization so it's a bit more robust to varying sequence names
- gget muscle now also allows a list of sequences as input (as an alternative to providing the path to a FASTA file)
- Allow missing gene filter for gget cellxgene (fixes bug)
- gget seq: Allow missing gene names (fixes #107)
- New arguments for gget enrichr: Use arguments kegg_out and kegg_rank to create an image of the KEGG pathway with the genes from the enrichment analysis highlighted (thanks to this PR by Noriaki Sato)
- New modules: gget elm and gget diamond
co-authored-by: @anhchi172
v0.27.9 - gget enrichr background genes
v0.27.8 - Fixed bug in gget pdb; add release argument to gget search
- Fixed bug in gget pdb
- Added new release argument to gget search
Also see: https://pachterlab.github.io/gget/updates.html
Co-contributor: @anhchi172
v0.27.7 - Cleaned up requirements; gget alphafold compatibility with Python>=3.10
Moved dependencies for modules gget gpt and gget cellxgene from automatically installed requirements to gget setup.
Updated gget alphafold dependencies for compatibility with Python >= 3.10.
Added census_version argument to gget cellxgene.