Description
Inspired by @FedeGueli's comment about an insertion at S:248 having occurred in vitro, I had a look for insertions near this site, and it turns out they are reasonably frequent. The most common is a 9nt insertion, nuc:22304insCAGGGGAGA, which preserves the reading frame but is not codon-aligned (it falls inside codon 248), resulting in what I suppose would properly be called S:Y248SGEN. Nextclade and occasionally GISAID report it as S:247insSGE and S:Y248N, whereas GISAID usually reports it as S:248insGEN and S:Y248S.
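As a sanity check on the naming, here is a minimal sketch of how the 9nt insertion turns Y248 into SGEN. It assumes that S starts at nucleotide 21563 in the Wuhan-Hu-1 reference (NC_045512.2) and that `22304ins` means "inserted after reference position 22304". I haven't looked up which tyrosine codon the reference has at 248, so the sketch tries both (TAT and TAC); the helper names are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

S_START = 21563  # assumed first nucleotide of S in Wuhan-Hu-1


def translate(seq):
    """Translate a nucleotide string codon by codon, ignoring trailing bases."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def insert_in_codon(codon, codon_start, ins_after, insert):
    """Insert `insert` after reference position `ins_after` within one codon."""
    n_before = ins_after - codon_start + 1  # codon bases left of the insertion
    return codon[:n_before] + insert + codon[n_before:]


codon_248_start = S_START + (248 - 1) * 3  # first base of codon 248

# Assumption: the reference codon is one of the two tyrosine codons.
for ref_codon in ("TAT", "TAC"):
    mutated = insert_in_codon(ref_codon, codon_248_start, 22304, "CAGGGGAGA")
    print(ref_codon, "->", mutated, "->", translate(mutated))
```

Under those assumptions codon 248 starts at 22304, so the insertion sits after the first base of the codon, and either tyrosine codon translates to SGEN afterwards. That would also explain why tools disagree on how to split it into an insertion plus a substitution.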
This particular insertion seems to have arisen several times independently (see later**). The biggest cluster is a sublineage of BA.2+C22792T, and looks like it may co-occur with ORF1a:C655R = nuc:T2228C, which should help with designation. This is the one I'm proposing here.
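For anyone wanting to check the nucleotide-to-amino-acid correspondence, here is a rough sketch of the coordinate arithmetic. It assumes the usual Wuhan-Hu-1 (NC_045512.2) gene starts of 266 for ORF1a and 21563 for S; `nuc_to_codon` is just an illustrative helper, not something from any tool.

```python
# Assumed gene start coordinates (1-based) in Wuhan-Hu-1, NC_045512.2.
GENE_START = {"ORF1a": 266, "S": 21563}

# Only the codons needed for this check (standard genetic code).
CYS_CODONS = {"TGT", "TGC"}
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def nuc_to_codon(gene, pos):
    """Return (codon number, position within codon), both 1-based."""
    offset = pos - GENE_START[gene]
    return offset // 3 + 1, offset % 3 + 1


print(nuc_to_codon("ORF1a", 2228))  # (655, 1)
print(nuc_to_codon("S", 22792))     # (410, 3)

# T2228C hits the first base of a cysteine codon: T->C there gives arginine
# whichever of the two cysteine codons the reference uses.
for codon in sorted(CYS_CODONS):
    mutated = "C" + codon[1:]
    print(codon, "->", mutated, "Arg" if mutated in ARG_CODONS else "other")
```

With those starts, T2228C lands on the first base of ORF1a codon 655, which is consistent with C655R, and C22792T lands on the third base of S codon 410.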
As usual with insertions, this one often doesn't get called, so it's hard to tell how big the lineage really is. UShER splits BA.2+T2228C+C22792T into one big branch and numerous smaller ones; I took a look at a few of the small ones, and they either contain sequences with the insertion or share other mutations with some sequences on the big branch. So it looks like most of BA.2+T2228C+C22792T (cov-spectrum) is likely to be this lineage. There are a couple of BA.2.9+T2228C+C22792T sequences, which I've excluded from the query.
The cov-spectrum query gives 190 sequences from over a dozen countries, including Italy (49), Denmark (34), UK (24), France (23), USA (20) and Germany (14). The first sequence is from Italy (2022-02-16), and the lineage is most prevalent there, making up about 2% of the most recent sequences (Italy sequences far less than the other countries listed above).
Numbers are small and error bars are large, but there is an apparent growth advantage over a BA.2* baseline in Italy.
**In addition to the T2228C lineage, the exact same 9nt insertion occurs in:
- A BA.2 sublineage with T28160C+T12013A (~36 seq, mainly Northern Ireland)
- A BA.2.9 sublineage with C28435T+C7471T+T14384G (~9 seq, Germany, Italy, Denmark)