Deprecated: Implicit conversion from float 217.6 to int loses precision in C:\Inetpub\vhosts\kidney.de\httpdocs\pget.php on line 534
Deprecated: Implicit conversion from float 217.6 to int loses precision in C:\Inetpub\vhosts\kidney.de\httpdocs\pget.php on line 534
Deprecated: Implicit conversion from float 217.6 to int loses precision in C:\Inetpub\vhosts\kidney.de\httpdocs\pget.php on line 534 Comput+Biol+Med 2021 ; 139 (ä): 104981 Nephropedia Template TP
gab.com Text
Twit Text FOAVip
Twit Text #
English Wikipedia
Analysis of 329,942 SARS-CoV-2 records retrieved from GISAID database #MMPMID34735950
Zelenova M; Ivanova A; Semyonov S; Gankin Y
Comput Biol Med 2021[Dec]; 139 (ä): 104981 PMID34735950show ga
BACKGROUND: The SARS-CoV-2 virus caused a worldwide pandemic - although none of its predecessors from the coronavirus family ever achieved such a scale. The key to understanding the global success of SARS-CoV-2 is hidden in its genome. MATERIALS AND METHODS: We retrieved data for 329,942 SARS-CoV-2 records uploaded to the GISAID database from the beginning of the pandemic until the January 8, 2021. A Python variant detection script was developed to process the data using pairwise2 from the BioPython library. Sequence alignments were performed for every gene separately (except ORF1ab, which was not studied). Genomes less than 26,000 nucleotides long were excluded from the research. Clustering was performed using HDBScan. RESULTS: Here, we addressed the genetic variability of SARS-CoV-2 using 329,942 samples. The analysis yielded 155 SNPs and deletions in more than 0.3% of the sequences. Clustering results suggested that a proportion of people (2.46%) was infected with a distinct subtype of the B.1.1.7 variant, which contained four to six additional mutations (G28881A, G28882A, G28883capital ES, Cyrillic, A23403G, A28095T, G25437T). Two clusters were formed by mutations in the samples uploaded predominantly by Denmark and Australia (1.48% and 2.51%, respectively). A correlation coefficient matrix detected 160 pairs of mutations (correlation coefficient greater than 0.7). We also addressed the completeness of the GISAID database, patient gender, and age. Finally, we found ORF6 and E to be the most conserved genes (96.15% and 94.66% of the sequences totally match the reference, respectively). Our results indicate multiple areas for further research in both SARS-CoV-2 studies and health science.