VK545 Ancient DNA Revisited

This forum is for general discussion about the Dál Cuinn.

VK545 Ancient DNA Revisited

Post by Ollam »

This is a follow-on to the Population Genomics of the Viking World thread.

As best I can determine, the VK545 sample from the Ship Street site in Dublin, Ireland was aligned and analyzed against the GRCh37/hg19 reference. YFull shows this sample as R1b-A225 under R1b-A223. However, GRCh37/hg19 is no longer the best available reference for Y-DNA.
[ https://www.yfull.com/branch-info/R-Y3646/#t4-tab ]

Given the enhanced coverage YFull is seeing with current Y-DNA sequencing using the CP086569.2 T2T reference, I thought it would be interesting to realign the VK545 sample to it as well. I downloaded the VK545.final.bam file from ENA and realigned it in three steps: Samtools to convert the BAM file to FASTQ, BWA-MEM to realign the reads to the CP086569.2 reference, and Samtools again to remove duplicates. I then used Bcftools mpileup/call to generate a VCF. Some intermediate steps in the pipeline are left out here for simplicity. I did a whole-genome realignment, not just a chrY realignment.
[ https://www.ebi.ac.uk/ena/browser/view/ ... show=reads ]
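For anyone wanting to try something similar, here is a rough sketch of how the commands for those steps could be assembled. It is not the exact pipeline used above: the file names and the `realign_commands` helper are made up for illustration, the choice of `samtools markdup -r` for duplicate removal is an assumption (the post only says Samtools was used, and paired-end data may need a `samtools fixmate -m` step first), and `--ploidy 1` is assumed because the calls in the results are haploid (AC=1;AN=1). The function only builds the command strings so they can be reviewed before running anything.

```python
import shlex


def realign_commands(bam, ref_fasta, prefix, threads=4):
    """Build shell commands for a BAM -> FASTQ -> BWA-MEM -> dedup -> VCF run.

    Nothing is executed here; the caller decides how to run each command.
    All output file names are derived from the hypothetical `prefix`.
    """
    q = shlex.quote
    fastq = f"{prefix}.fastq"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup_bam = f"{prefix}.dedup.bam"
    vcf = f"{prefix}.vcf.gz"
    return [
        # One-time BWA index of the new reference.
        f"bwa index {q(ref_fasta)}",
        # Discard the old alignment and recover the reads as FASTQ.
        f"samtools fastq {q(bam)} > {q(fastq)}",
        # Realign with BWA-MEM and coordinate-sort the result.
        f"bwa mem -t {threads} {q(ref_fasta)} {q(fastq)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        # Duplicate removal (assumed command, see the note above).
        f"samtools markdup -r {q(sorted_bam)} {q(dedup_bam)}",
        f"samtools index {q(dedup_bam)}",
        # Haploid variant calling, variants only, compressed VCF output.
        f"bcftools mpileup -f {q(ref_fasta)} {q(dedup_bam)}"
        f" | bcftools call -mv --ploidy 1 -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in realign_commands("VK545.final.bam", "t2t.fa", "VK545"):
        print(cmd)
```

Printing the commands first makes it easy to add whatever intermediate steps a particular sample needs before committing to a long whole-genome run.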

Doing this, I got a very interesting result. I had previously written a PHP script to help me analyze VCFs. It is geared towards kits that have upgraded their NGS results, not towards ancient Y-DNA, but it works well enough for ancient Y-DNA. It first looks in the sample VCF for all known variants upstream of R1b-DF104, confirms them, and then ignores them. It then looks for all known variants downstream of R1b-DF104 and reports those, along with any unknown variants. Here is a table of the results:
TIER  POS       ID        REF  ALT  QUAL     FILTER  INFO
1     13391744  DF105     G    A    46.4146  .       CLADE=R1b-DF105;DP=2
1     21254183  DF109     A    T    8.99921  .       CLADE=R1b-DF105;DP=1
2     13575770  FGC65032  C    T    8.99921  .       CLADE=R1b-FGC65031;DP=1
4     20100623  MF165555  C    T    7.30814  .       CLADE=R1b-FTB39547;DP=1
5     20588609  BY18188   C    T    7.30814  .       CLADE=R1b-BY18120;DP=1
5     20721167  FT119260  G    A    8.99921  .       CLADE=R1b-FT115566;DP=1
6     11711058  FGC8438   G    A    5.75677  .       CLADE=R1b-BY18320;DP=1
6     16714581  A11307    G    A    8.99921  .       CLADE=R1b-A11307;DP=1
7     10467844  FTB91864  G    A    8.99921  .       CLADE=R1b-BY48495;DP=1
7     20877246  FTC5774   G    A    8.13869  .       CLADE=R1b-FTC5557;DP=1
8     7304629   BY18132   G    A    8.99921  .       CLADE=R1b-BY18132;DP=1
8     7318089   M9520     C    T    8.99921  .       CLADE=R1b-B24;DP=1
8     8634006   Y129700   G    A    8.99921  .       CLADE=R1b-FT285980;DP=1
8     17468716  FTB12149  G    A    8.99921  .       CLADE=R1b-BY47745;DP=1
8     18252929  BY20817   G    A    8.99921  .       CLADE=R1b-FT82182;DP=1
8     20099847  BY132284  G    A    8.99921  .       CLADE=R1b-BY146806;DP=1
9     6894312   PH432     G    A    3.22451  .       CLADE=R1b-FT109536;DP=1
9     15071932  BY106599  G    A    8.99921  .       CLADE=R1b-BY98600;DP=1
10    3436741   FTC32491  C    T    8.99921  .       CLADE=R1b-FTA14514;DP=1
10    15154507  FGC62848  G    A    8.13869  .       CLADE=R1b-FGC62843;DP=1
10    15525920  Y52254    C    T    3.22451  .       CLADE=R1b-BY18352;DP=1
11    8341530   BY73963   G    A    5.04598  .       CLADE=R1b-BY65078;DP=1
12    16208403  Y26014    G    A    8.99921  .       CLADE=R1b-Y26014;DP=1
13    17421295  FT207643  C    T    8.99921  .       CLADE=R1b-BY137737;DP=1
13    27258802  FT178070  C    T    8.99921  .       CLADE=R1b-BY16967;DP=1
14    12603472  FGC19840  G    A    6.51248  .       CLADE=R1b-FGC19856;DP=1
100   13157239  .         G    A    78.4149  .       QD=26;DP=3;VDB=0.470313;SGB=-0.511536;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,1,2;MQ=60
100   15092006  .         A    G    104.415  .       QD=26;DP=4;VDB=0.0320192;SGB=-0.556411;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,1,3;MQ=60
100   16408342  .         G    A    74.4149  .       QD=24;DP=3;VDB=0.0900131;SGB=-0.511536;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,2,1;MQ=60
100   18136756  .         G    A    77.4149  .       QD=25;DP=3;VDB=0.71601;SGB=-0.511536;MQSBZ=0;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,2,1;MQ=60
100   18189304  .         C    T    62.4147  .       QD=20;DP=3;VDB=0.318564;SGB=-0.511536;FS=0;MQ0F=0;AC=1;AN=1;DP4=0,0,3,0;MQ=60
The TIER column represents the subclade depth under R1b-DF104, which is TIER 0. A TIER value of 100 means the variant is unknown. The remaining columns are standard VCF columns.
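The tiering idea can be sketched in a few lines. This is Python rather than the PHP the script is actually written in, and it is only a guess at the general shape of the logic: the two lookup tables, their layout, and the `UPSTREAM_EXAMPLE` entry at position 1000 are hypothetical placeholders, while the DF105, FGC65032, and unknown-variant positions are taken from the table.

```python
UNKNOWN_TIER = 100  # tier value used for variants not found in either lookup


def tier_report(calls, upstream, downstream):
    """Split called variants into confirmed upstream names and tiered rows.

    calls:      iterable of (pos, ref, alt) tuples from the sample VCF
    upstream:   {(pos, ref, alt): name} for known variants above DF104
    downstream: {(pos, ref, alt): (tier, name, clade)} for known variants below it
    """
    confirmed, rows = [], []
    for pos, ref, alt in calls:
        key = (pos, ref, alt)
        if key in upstream:
            # Upstream variants are confirmed, then left out of the report.
            confirmed.append(upstream[key])
        elif key in downstream:
            tier, name, clade = downstream[key]
            rows.append((tier, pos, name, clade))
        else:
            # Not in either lookup: report as an unknown variant.
            rows.append((UNKNOWN_TIER, pos, ".", None))
    rows.sort(key=lambda row: (row[0], row[1]))  # by tier, then position
    return confirmed, rows


if __name__ == "__main__":
    upstream = {(1000, "A", "G"): "UPSTREAM_EXAMPLE"}  # placeholder entry
    downstream = {
        (13391744, "G", "A"): (1, "DF105", "R1b-DF105"),
        (13575770, "C", "T"): (2, "FGC65032", "R1b-FGC65031"),
    }
    calls = [
        (13157239, "G", "A"),
        (13575770, "C", "T"),
        (1000, "A", "G"),
        (13391744, "G", "A"),
    ]
    confirmed, rows = tier_report(calls, upstream, downstream)
    print(confirmed)
    for row in rows:
        print(row)
```

Matching on position plus both alleles, rather than position alone, keeps a different substitution at a known position from being reported as the named variant.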

Since the read depth is mostly 1, these calls are, of course, somewhat problematic. DF105 has 2 reads and DF109 has 1 read. The other calls under R1b-DF104 are probably noise or contamination given their low QUAL scores, although DF109 has a low QUAL score too. I don't have a tool to examine whether DF108 is negative or a no-call in the realigned BAM file. The 5 variants at the bottom, however, are relatively strong, with read depths of 3 and 4, assuming they are not sequencing artifacts, and artifacts seem unlikely to me.
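One way to tell a negative from a no-call without a viewer is to read the base column of a `samtools mpileup` line for the position (for example from `samtools mpileup -r <contig>:<pos>-<pos> -f <reference.fa> <realigned.bam>`). The sketch below is only an illustration: the function names are made up, the "any derived read counts as positive" rule is a simplifying assumption, and the pileup strings in the example are invented, not taken from VK545.

```python
import re

BASES = ("A", "C", "G", "T")


def count_bases(ref_base, pileup_bases):
    """Count A/C/G/T observations in the read-base column of an mpileup line."""
    counts = {base: 0 for base in BASES}
    ref_base = ref_base.upper()
    i = 0
    while i < len(pileup_bases):
        ch = pileup_bases[i]
        if ch == "^":
            # Read-start marker; the next character is a mapping quality.
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            match = re.match(r"\d+", pileup_bases[i + 1:])
            if match:
                i += 1 + len(match.group()) + int(match.group())
                continue
        if ch in ".,":
            # Match to the reference on the forward or reverse strand.
            if ref_base in counts:
                counts[ref_base] += 1
        elif ch.upper() in counts:
            counts[ch.upper()] += 1
        # Anything else ('$', '*', 'N', ...) is not a base observation.
        i += 1
    return counts


def classify_site(counts, derived):
    """Return 'no-call', 'positive', or 'negative' for the derived allele."""
    if sum(counts.values()) == 0:
        return "no-call"   # no reads cover the position at all
    if counts[derived.upper()] > 0:
        return "positive"  # at least one read carries the derived allele
    return "negative"      # covered, but only non-derived bases seen


if __name__ == "__main__":
    print(classify_site(count_bases("G", "^].A$,"), "A"))
    print(classify_site(count_bases("G", ""), "A"))
```

With single-read coverage a "negative" from this check is weak evidence, so the read count is worth reporting alongside the label.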

At this point, it is unlikely these 5 variants are upstream of R1b-DF104. So IF DF108 is negative in the VK545 sample, the sample MAY split the R1b-DF105 phylogenetic node and sit in a parallel branch defined by those 5 variants. Since we have not seen this in current testing, perhaps this is an extinct branch. If so, then this is very interesting.

On the other hand, if DF108 is a no-call and likely positive, then I would assume that the VK545 sample is providing a potentially new direct subclade under R1b-DF105, or possibly within one of the other direct subclades, depending on negative or no-call results for those variant positions. This is still quite interesting.

Regardless, it seems clear that the VK545 sample is NOT R1b-A225+, since there are no confirming reads for it. I don't know whether anyone else has done or is doing realignment of ancient samples, but from my experiment it appears to be a worthwhile effort. Further, if anyone has IGV or a similar tool, I will be happy to provide a copy of the realigned BAM file for viewing. Please PM me if you are interested.

Re: VK545 Ancient DNA Revisited

Post by tamcevoy »

I know we've discussed this before, but just for everyone: no ZZ87 or Ui Niall markers? And did YFull ever respond about running this sample against the T2T?
User avatar
Ollam
Site Admin
Posts: 1188
Joined: Wed, 2019-Jun-26 2:47 pm

Re: VK545 Ancient DNA Revisited

Post by Ollam »

Tim,

The variants in the table above are all the known variants I could see under R1b-DF104. They seem to be scattered all over the place, which leads me to suspect they are noise, or perhaps contamination. Keep in mind these are single reads with very low QUAL scores. You can use a link of the form https://genelach.org/R1b-BY18120/ to see where the detected variants are located. I have not checked each one, but given:
  1. FGC65032 is in the R1b-FGC65031 clade (an R1b-DF105 direct subclade).
  2. MF165555 is in the R1b-FTB39547 clade (an R1b-ZS8379 subclade).
  3. BY18188 is in the R1b-BY18120 clade (an R1b-A259 subclade).
it seems these are sample noise or contamination calls. Keep in mind some of the larger TIER depths indicate clades that formed well after the 7th-9th century AD date range estimated for the VK545 sample.

But the 5 unknown variants have relatively good QUAL scores and read depths of either 3 or 4; so they seem solid. And the fact they appear to be currently unknown is extremely intriguing.
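The split between the weak single-read calls and the 5 stronger ones can be expressed as a simple filter on QUAL and the DP value in the INFO column. This is a minimal sketch: `parse_info` and `is_solid` are illustrative names, and the cutoffs (QUAL of at least 50, depth of at least 3) are assumptions picked only because they separate the two groups in this table, not established thresholds.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'CLADE=R1b-DF105;DP=2' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag entries carry no value
    return fields


def is_solid(qual, info, min_qual=50.0, min_depth=3):
    """True when both QUAL and read depth meet the (assumed) cutoffs."""
    depth = int(parse_info(info).get("DP", 0))
    return qual >= min_qual and depth >= min_depth


if __name__ == "__main__":
    rows = [
        ("DF105", 46.4146, "CLADE=R1b-DF105;DP=2"),
        ("DF109", 8.99921, "CLADE=R1b-DF105;DP=1"),
        ("unknown@13157239", 78.4149, "QD=26;DP=3;VDB=0.470313;MQ=60"),
        ("unknown@15092006", 104.415, "QD=26;DP=4;VDB=0.0320192;MQ=60"),
    ]
    for name, qual, info in rows:
        print(name, "solid" if is_solid(qual, info) else "weak")
```

Note that with these cutoffs DF105 itself (QUAL 46.4, 2 reads) lands on the weak side, so the thresholds are a sorting aid rather than a verdict on any one call.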

I tried to contact YSEQ, YFull, and Dr. Dan Bradley at TCD about this a few weeks ago, but no one has replied. This is a shame, because it could significantly impact what has been published so far about ancient Y-DNA clades.

Re: VK545 Ancient DNA Revisited

Post by tamcevoy »

One point to make is that although the ancient sample may not be positive for A223, the normal kits that have been tested here are still ZZ87 and still A223. It's important to note that FTDNA still has this sample at DF105.