Closed
Description
RefSeq transcripts can align to the reference sequence with indels and mismatches. While mismatches could be argued to be non-critical (assuming the GenBank entries that the RefSeq transcript is based on are from healthy individuals), indels cannot.
For hg19, 884 transcripts in 501 genes are affected.
The following solution will be implemented:
- The `default_sources.ini` file gets two new settings, `fixIndels` and `fixIndelsUcsc`.
- When parsing the RefSeq transcript database, the `Note` attribute is analyzed. If it contains the substring `"indel"` or `"substitution"`, this is recorded in the built `TranscriptModel`.
- When `fixIndels=true` is given, the user also has to provide the path to the reference sequence.
- The file at `fixIndelsUcsc` is used for providing the UCSC transcript alignments. These alignments will be used for the exon and CDS information. The sequence will be taken from the reference.
NB: This will create an incompatibility between the databases built before and after Jannovar v0.29.
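The `Note` check described in the list above can be sketched as follows. This is a minimal illustration in Python, not Jannovar's actual (Java) implementation: the function names `parse_gff3_attributes` and `note_flags`, the returned keys, and the sample attribute string are all hypothetical, and the sketch assumes the transcript database is a GFF3 file whose ninth column holds `key=value` pairs separated by `;`. The match is a plain case-sensitive substring test, which is an assumption as well.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column: str) -> dict:
    """Split a GFF3 attribute column into a dict, decoding %-escapes."""
    attrs = {}
    for field in column.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = unquote(value)
    return attrs


def note_flags(attrs: dict) -> dict:
    """Report whether the Note attribute mentions indels or substitutions.

    Hypothetical helper: a plain substring test on the Note text.
    """
    note = attrs.get("Note", "")
    return {
        "has_indels": "indel" in note,
        "has_substitutions": "substitution" in note,
    }


# Illustrative attribute column (made up for this example).
example = (
    "ID=rna1;Note=The RefSeq transcript has 1 substitution%2C "
    "1 non-frameshifting indel compared to this genomic sequence"
)
flags = note_flags(parse_gff3_attributes(example))
```

A transcript model builder could store these two flags on each transcript and later use them to decide which transcripts need their exon and CDS information replaced.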
For each `hg*/refseq*` entry, a `_fixindel` variant is added that contains these fixed transcripts. This way, the fixed transcripts are strictly opt-in and only supplement the entries where the indel is not fixed. Variants for both can be reported.
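As a rough sketch of how the opt-in entries sit alongside the existing ones, the snippet below derives a `_fixindel` name for every data source matching `hg*/refseq*`. The function `with_fixindel_variants` and the sample source names are hypothetical, and appending the suffix to the end of the entry name is an assumption about the naming scheme.

```python
from fnmatch import fnmatchcase

SUFFIX = "_fixindel"


def with_fixindel_variants(source_names):
    """Return the sources plus a _fixindel variant per hg*/refseq* entry."""
    result = list(source_names)
    for name in source_names:
        # Skip non-RefSeq sources and entries that are already variants.
        if fnmatchcase(name, "hg*/refseq*") and not name.endswith(SUFFIX):
            variant = name + SUFFIX
            if variant not in result:
                result.append(variant)
    return result


sources = with_fixindel_variants(["hg19/refseq", "hg19/ucsc", "hg38/refseq_curated"])
```

The original entries stay untouched, so databases built from them keep their unfixed transcripts unless a user explicitly selects a `_fixindel` entry.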