Supporting data for "Low-coverage whole genome sequencing for a highly selective cohort of severe COVID-19 patients"

De Carmen, Mendoza; Ilduara, Pintos; Manuel, Corpas; Octavio, Corral; Renato, Santos; Vicente, Soriano; Víctor, Moreno-Torres

doi:10.5524/102535

Publisher: GigaScience Database

Publication year: 2024

Type: Dataset

CC0 1.0

Open access

Abstract

Despite advances in identifying genetic markers associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores the use of imputation in low-coverage whole genome sequencing for a severe COVID-19 patient cohort. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation revealed a high imputation accuracy (squared Pearson correlation ≈0.97) across sequencing platforms, showing GLIMPSE1’s ability to confidently impute variants with minor allele frequencies as low as 2% in Spanish ancestry individuals. We conducted a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here may be leveraged in future genomic projects, providing vital insights for health challenges like COVID-19.
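The accuracy metric quoted in the abstract, the squared Pearson correlation, can be illustrated with a minimal sketch. The function name `squared_pearson` and the toy genotype values are hypothetical and are not taken from the dataset. The abstract does not describe the validation pipeline, so details such as how variants are grouped by minor allele frequency are not modelled here. The sketch only shows how r² between reference genotypes (0, 1 or 2 alternate alleles) and imputed dosages might be computed.

```python
def squared_pearson(x, y):
    """Return the squared Pearson correlation (r^2) of two numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centred cross-products and centred squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r^2 is undefined when a sequence is constant")
    return (cov * cov) / (var_x * var_y)


# Hypothetical toy data: reference genotypes (alternate-allele counts)
# and imputed dosages for the same eight variant sites.
truth = [0, 1, 2, 0, 1, 2, 0, 0]
imputed = [0.05, 0.9, 1.95, 0.1, 1.2, 1.8, 0.0, 0.3]

r2 = squared_pearson(truth, imputed)
print(f"r^2 = {r2:.3f}")
```

Values close to 1 indicate that the imputed dosages track the reference genotypes closely. In practice such a metric is often reported separately for bins of minor allele frequency, since rarer variants tend to be harder to impute.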