CP086569.2 And The GRCh38 chrY_KI270740v1_random Contig


Post by Ollam

While scrubbing the database, I found two unusual cases. In addition to the main chrY contig, the GRCh38 reference includes a second chrY contig named chrY_KI270740v1_random. It adds several thousand new positions on chrY, but their coordinates are relative to the chrY_KI270740v1_random contig, not to the main chrY contig. The Big Tree added several of these chrY_KI270740v1_random positions from its BAM file analyses. During the scrubbing I discovered that two of these chrY_KI270740v1_random variants map to the same positions in the T2T CP086569.2 (chrY) reference as two main chrY contig variants.

R1b-A1742 has this phylogenetically equivalent variant:
BY35985 (chrY: 20083013-G-C)
and its R1b-A12551 subclade has this phylogenetically equivalent variant:
TBT10365 (chrY_KI270740v1_random: 19766-C-G)
But when both are mapped to the CP086569.2 reference using the UCSC LiftOver tool:
BY35985 (CP086569.2: 20980089-C-G)
TBT10365 (CP086569.2: 20980089-C-G)

R1b-BY69603 has these two phylogenetically equivalent variants:
BY214017 (chrY: 20082979-C-T)
TBT10388 (chrY_KI270740v1_random: 19800-G-A)
But when both are mapped to the CP086569.2 reference using the UCSC LiftOver tool:
BY214017 (CP086569.2: 20980123-G-A)
TBT10388 (CP086569.2: 20980123-G-A)
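
The four liftover results above can be checked against each other with simple arithmetic. What follows is a minimal sketch, assuming nothing beyond the coordinates and alleles quoted in this post; the dictionary layout and the helper names (`allele_relation`, `spacing`) are made up for illustration and are not part of the UCSC tool or any database.

```python
# Coordinates and alleles are copied from the post above.
# The dictionary layout and helper names are illustrative assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# name -> ((GRCh38 contig, position, ref, alt), (CP086569.2 position, ref, alt))
VARIANTS = {
    "BY35985": (("chrY", 20083013, "G", "C"), (20980089, "C", "G")),
    "TBT10365": (("chrY_KI270740v1_random", 19766, "C", "G"), (20980089, "C", "G")),
    "BY214017": (("chrY", 20082979, "C", "T"), (20980123, "G", "A")),
    "TBT10388": (("chrY_KI270740v1_random", 19800, "G", "A"), (20980123, "G", "A")),
}


def allele_relation(name):
    """Compare the alleles of one variant before and after liftover."""
    (_, _, ref, alt), (_, new_ref, new_alt) = VARIANTS[name]
    if (ref, alt) == (new_ref, new_alt):
        return "same"
    if (COMPLEMENT[ref], COMPLEMENT[alt]) == (new_ref, new_alt):
        return "complemented"
    return "inconsistent"


def spacing(name_a, name_b):
    """Return (GRCh38 distance, CP086569.2 distance) from name_a to name_b."""
    (_, pos_a, _, _), (new_a, _, _) = VARIANTS[name_a]
    (_, pos_b, _, _), (new_b, _, _) = VARIANTS[name_b]
    return pos_b - pos_a, new_b - new_a


for name in VARIANTS:
    print(name, allele_relation(name))
print("main chrY:", spacing("BY35985", "BY214017"))
print("random contig:", spacing("TBT10365", "TBT10388"))
```

With these numbers, the main chrY pair comes out complemented with the sign of the distance flipped, while the random contig pair is unchanged. That would be consistent with the two GRCh38 sequences covering the same stretch in opposite orientations, but it is an inference from four data points, not a verified alignment.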

I am not sure what to do with these two cases. For now, I am leaving the apparent duplicates in the database. But it makes me wonder how many more of the GRCh38 chrY_KI270740v1_random contig positions are in fact duplicates of main chrY contig positions.
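
One way to look for further duplicates would be to group every lifted variant by its CP086569.2 position and alleles and report any group with more than one name. This is only a sketch, assuming the lifted records are available as simple tuples; the record layout and the `find_duplicates` helper are hypothetical and do not reflect how the database is actually stored.

```python
from collections import defaultdict

# Lifted records copied from the post above: (name, contig, position, ref, alt).
# The tuple layout is an assumption made for this example.
lifted = [
    ("BY35985", "CP086569.2", 20980089, "C", "G"),
    ("TBT10365", "CP086569.2", 20980089, "C", "G"),
    ("BY214017", "CP086569.2", 20980123, "G", "A"),
    ("TBT10388", "CP086569.2", 20980123, "G", "A"),
]


def find_duplicates(records):
    """Group variant names by (contig, position, ref, alt) and keep groups of 2+."""
    groups = defaultdict(list)
    for name, contig, pos, ref, alt in records:
        groups[(contig, pos, ref, alt)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


for key, names in find_duplicates(lifted).items():
    print(key, "->", ", ".join(names))
```

Keying on the alleles as well as the position keeps two different mutations at the same site from being flagged as one variant. Dropping the alleles from the key would give a looser check.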

The main takeaway is the obvious deficiency of the GRCh38 reference and the urgent need for an R1b-based T2T reference in place of the current T2T reference, which is based on NA24385, an Ashkenazi male belonging to Y haplogroup J1-M267. Unfortunately, without specific funding, an R1b-based reference is not likely to be forthcoming in the near future.
"Since the release of the complete human genome, the priority of human genomic study has now been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level."
https://www.nature.com/articles/s41422- ... 23100cbc13
The GRCh38 reference is primarily based on R1b men and, as the above quote indicates, efforts won't be focused on R1b any time soon. To be VERY clear, this is not an ethnic rant but a lament about the scientific need for an R1b-based T2T reference: Y chromosome structure differs significantly among the major Y haplogroups, and a HUGE amount of R1b data has been collected that desperately needs the benefit of such a reference.