• Large-scale, full-length viral sequencing to determine viral evolution and spread across the community – Because the virus is currently estimated to acquire approximately one new RNA mutation every 2 weeks, or every 2-3 transmissions, the pattern of sequence variation can be used to estimate how far back any two infections trace to a single host and, with deep enough sampling, to estimate where that host was and how the virus has moved through human populations. Such data could also be used to estimate important parameters for epidemiological models, such as the number of introductions of the virus into a specific region and the number of distinct transmission clusters. These data will also provide insight into the number of undetected cases currently transmitting in the population.
  • Whole-genome germline sequencing and immune repertoire sequencing of affected individuals, focused on extreme phenotypes to examine host factors and immune responses – Although COVID-19 mortality is highest among older individuals and those with pre-existing immune, cardiovascular or lung diseases, a striking observation is respiratory failure in some otherwise healthy young individuals. One apparent risk factor is exposure to a high viral load, especially among frontline healthcare workers. It is likely that germline variation will explain some aspects of extreme responses to SARS-CoV-2, which may in turn guide risk mitigation, vaccine development and implementation, and other strategies. Possibilities include DNA sequence polymorphisms that influence protein function and/or cause outlier gene expression in ACE2, TMPRSS2, and other proteins involved in viral entry; variation in genes controlling surfactant alveolar cell development and function; polymorphisms in HLA and in genes governing innate or cellular immune responses; and polymorphisms affecting cardiovascular function, intracellular RNA processing and others. Interaction between host factors and the genotype of the infecting virus may also be important, so knowledge of both the viral sequences present and the host factors will be crucial. The information obtained could not only have an impact on the current pandemic but could also inform responses to future pandemics.
  • Single-cell sequencing to examine tissue responses and provide genome variation and gene expression context for responses to the virus.
  • Establishment of a data commons for the research network so that data are fully available to investigators.
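
The clock reasoning in the viral sequencing item can be illustrated with a rough sketch. It assumes a strict clock at the rate quoted there (about one mutation every 2 weeks per lineage), two already-aligned genomes sampled at about the same time, and that both lineages accumulate mutations independently after diverging, so the observed differences are split across two branches. The function names and the toy sequences are hypothetical; a real analysis would use a phylogenetic method with sampling dates rather than this pairwise arithmetic.

```python
# Assumed rate, taken from the text: about one new mutation every 14 days per lineage.
DAYS_PER_MUTATION = 14.0


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two aligned sequences.

    Positions where either sequence has a gap ("-") or an unknown base ("N")
    are skipped, since they carry no information about a mutation.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


def days_to_common_ancestor(
    differences: int, days_per_mutation: float = DAYS_PER_MUTATION
) -> float:
    """Estimate days back to the shared ancestor under a strict clock.

    Each of the two lineages accumulates mutations at the same rate, so the
    expected number of differences is 2 * t / days_per_mutation.
    """
    return differences * days_per_mutation / 2.0


# Toy aligned fragments (hypothetical data, not real viral sequence).
sample_a = "ACGTACGTAC"
sample_b = "ACGAACGTTC"

d = count_differences(sample_a, sample_b)
print(d, days_to_common_ancestor(d))  # prints: 2 14.0
```

Under these assumptions, two genomes differing at 4 positions would share an ancestor roughly 28 days before sampling. The estimate is coarse, because mutation counts are small integers and the true process is stochastic.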