GenBank Frequency Information

The current GenBank frequency data in our variant tables is derived from 37,528 human mitochondrial DNA sequences longer than 15.4 kbp. The sequences were collected from GenBank on Jun 14, 2017, aligned to rCRS using BLASTn, and haplotyped using Haplogrep via the Mitomaster web service. A list of the sequence IDs in the current GenBank set may be downloaded from the Mitobank web page. A spreadsheet of all variants found in the current set of sequences, with their frequencies, is also available. These GenBank sequences have been pre-loaded into Mitomaster and represent almost all haplogroups known to date. We will continue to update this sequence set on a regular basis. A set of short control region sequences (66,676 sequences as of Jun 14, 2017) has also been collected from GenBank and is included in the variant frequency counts in Mitomap and Mitomaster where indicated.
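As an illustration only, the length cutoff and a per-variant frequency can be sketched as below. The function names are hypothetical, and the sketch assumes that a frequency is the number of sequences carrying a variant divided by the 37,528 full-length sequences; the text above does not spell out the exact calculation used in the variant tables.

```python
# Minimal sketch, not the Mitomap pipeline. Names are hypothetical.
FULL_LENGTH_TOTAL = 37_528   # full-length sequence count stated above
MIN_LENGTH_BP = 15_400       # "longer than 15.4 kbp"


def is_full_length(sequence: str) -> bool:
    """Return True if the sequence passes the 15.4 kbp length cutoff."""
    return len(sequence) > MIN_LENGTH_BP


def variant_frequency(carrier_count: int, total: int = FULL_LENGTH_TOTAL) -> float:
    """Percentage of sequences carrying a variant (assumed definition)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= carrier_count <= total:
        raise ValueError("carrier_count must be between 0 and total")
    return 100.0 * carrier_count / total
```

For example, `variant_frequency(37_528)` returns `100.0`, and a sequence of exactly 15,400 bases does not pass `is_full_length` because the cutoff is strict.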

Caveats: We do not presently tally counts or frequencies of reference alleles (those positions identical to rCRS) or those of ambiguous nucleotides (R, Y, M, K, etc.). Indel calls in repetitive regions may not always match those of the original publications due to the different manners of indel reporting over the years (e.g., positioning at the beginning or end of a polytract or repeat, forward or backward reading of inserted or deleted bases). The sequences in this GenBank set have not been individually reviewed by Mitomap. Please also be aware that (1) GenBank sequences may not be of equal quality (Yao et al., 2009); (2) some sequences might be present in GenBank more than once under different IDs; (3) some sequences might be from clones or cell lines; (4) sequence collection is not evenly distributed across the continents; and (5) some of the GenBank sequences are derived from pathology samples or from diseased patients, presenting a somewhat biased sampling of the global mitochondriome.
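The first caveat can be pictured with a small sketch: when tallying, positions that match the reference are skipped, and so are ambiguous IUPAC codes. This is a hypothetical illustration, not Mitomap's code. It assumes the sequences are already aligned to the reference at equal length and handles substitutions only (gap characters are skipped, so indels are not counted).

```python
from collections import Counter

# Only these bases are tallied; R, Y, M, K, N, gaps, etc. are ignored.
UNAMBIGUOUS = set("ACGT")


def tally_variants(reference: str, aligned_sequences: list[str]) -> Counter:
    """Count substitutions per (position, ref_base, alt_base), 1-based."""
    counts: Counter = Counter()
    for seq in aligned_sequences:
        for pos, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            ref_base, base = ref_base.upper(), base.upper()
            if base == ref_base:
                continue  # reference alleles are not tallied
            if base not in UNAMBIGUOUS:
                continue  # ambiguous nucleotides and gaps are not tallied
            counts[(pos, ref_base, base)] += 1
    return counts


# Toy data: the third sequence carries an ambiguous 'R' at position 2.
toy_counts = tally_variants("ACGT", ["ACGT", "ATGT", "ARGT", "ATGA"])
```

In this toy input, only the two `C` to `T` changes at position 2 and the single `T` to `A` change at position 4 are counted; the `R` contributes nothing.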

Lineage distribution of the 37,528 sequences in our current data set:

"African"                  "Asian"                    "Eurasian"
hg      #       %          hg      #       %          hg      #       %
L3      1,206   30%        M       3,731   49%        H       7,854   30%
L0      1,058   26%        D       1,879   25%        U       3,461   13%
L2      937     23%        C       1,138   15%        B       2,674   10%
L1      723     18%        G       291     4%         J       2,013   8%
L4      65      2%         E       266     3%         T       1,828   7%
L5      36      1%         Q       167     2%         K       1,588   6%
L6      12      0%         Z       157     2%         F       1,083   4%
Total   4,037   100%       Total   7,629   100%       A       1,019   4%
