Research Overview

We are a computational genetics and genomics lab. Our main research is on genetic variation, including mechanisms of spontaneous mutagenesis, functional effects of mutations and allelic variants, population genetics, and the relationship between genotype and phenotype. As part of our research, we develop new computational and statistical methods to assist DNA sequencing studies.

Understanding mutations from sequencing data

Mutations are the source of population genetic variation; they fuel evolution and cause disease. Data on de novo germ-line mutations are now available from whole-genome sequencing of parent-child trios. Cancer genomics provides data on somatic cancer mutations. We analyze statistical properties of germ-line and somatic cancer mutations alongside epigenomic datasets. We believe that this analysis has the potential to generate biologically relevant hypotheses about the leading mechanisms of spontaneous mutation in humans. From an evolutionary viewpoint, it can be informative about the evolution of the mutation rate. On the practical side, accurate models of mutation rate will enhance the statistical methods of cancer genomics and neuropsychiatric genetics that map genes using recurrent de novo mutations.
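
To illustrate why the mutation-rate model matters for gene mapping, recurrence of de novo mutations in a gene can be judged against the count expected from that gene's mutation rate. The sketch below assumes a simple Poisson model; the function names and the numeric rate are hypothetical, and this is not a description of the lab's own methods.

```python
import math

def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def recurrence_p_value(observed, per_gene_rate, n_trios):
    """P-value for seeing at least `observed` de novo mutations in one gene.

    per_gene_rate is an assumed per-generation probability of a de novo
    mutation in one copy of the gene; each child carries two copies,
    hence the factor of 2 in the expected count.
    """
    expected = 2 * n_trios * per_gene_rate
    return poisson_sf(observed, expected)

# Hypothetical numbers: 3 de novo hits in a gene across 1000 trios.
p = recurrence_p_value(observed=3, per_gene_rate=1e-5, n_trios=1000)
```

Because the expected count scales directly with the assumed rate, an error in the per-gene mutation rate translates into a miscalibrated p-value, which is why accurate rate models help here.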

Our findings include an association between mutation rate and replication timing; an elevated mutation rate in functional regions due to maintenance of hypermutable sites by natural selection; and a unique spectrum of clustered mutations, which suggests a specific mechanism that generates them. For somatic cancer mutations, we demonstrated that the relationship of mutation rate to chromatin accessibility and modification is highly cell-type specific. We also showed that the somatic mutation rate is decreased in regulatory regions marked by accessible chromatin, and linked this observation to the action of nucleotide excision repair.

Functional effect of allelic variants

It is essential to identify, among a myriad of allelic variants, those that affect molecular function. To predict the functional effect of sequence variants in protein-coding regions, we rely on comparative sequence analysis and analysis of protein structure. We continue to develop and maintain PolyPhen-2, a computational method for predicting the effect of missense mutations and SNPs. We are interested in the dependence of the functional effect of coding variants on genetic background, and are using comparative genomics to identify suppressors of coding mutations. In non-coding regions of the genome, the effects of regulatory variants can also be analyzed using a combination of functional and comparative genomics data. Here, we are interested in using whole-genome sequencing to identify regulatory variants of larger effect in humans and animals.

Population genetics

We are interested in population genetics as a lens through which we can study microevolution. The dynamics of allele propagation in populations depend on a number of evolutionary forces. The development of theoretical models is now enhanced by the availability of massive sequencing datasets.
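
As a minimal illustration of how two such forces, selection and genetic drift, shape an allele's trajectory, here is a sketch of a standard Wright-Fisher simulation. The function name, parameters, and default values are assumptions chosen for illustration and are not taken from the lab's models.

```python
import random

def wright_fisher(n_diploid, p0, s=0.0, h=0.5, generations=100, seed=0):
    """Simulate the frequency of allele A in a Wright-Fisher population.

    Relative fitnesses are AA = 1 + s, Aa = 1 + h*s, aa = 1 (requires s > -1).
    Returns the list of frequencies, starting with p0; the simulation stops
    early once the allele is lost or fixed.
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        q = 1.0 - p
        # Selection: expected frequency of A among gametes after selection.
        mean_w = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
        p_sel = (p * p * (1 + s) + p * q * (1 + h * s)) / mean_w
        # Drift: binomial sampling of 2N gene copies for the next generation.
        count = sum(rng.random() < p_sel for _ in range(copies))
        p = count / copies
        trajectory.append(p)
        if p == 0.0 or p == 1.0:
            break
    return trajectory

# A new deleterious mutation (s < 0) and a neutral one in the same population.
deleterious = wright_fisher(n_diploid=500, p0=0.001, s=-0.01, seed=1)
neutral = wright_fisher(n_diploid=500, p0=0.001, s=0.0, seed=1)
```

Varying `s` and `h` changes the strength of selection and the dominance of the allele, while `n_diploid` controls the strength of drift.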

Our recent results include the demonstration that deleterious alleles are younger than neutral alleles at the same population frequency. We studied the effect of population bottlenecks and expansions on the burden of deleterious mutations under an arbitrary dominance coefficient. We are currently interested in inferring complex forms of natural selection, such as balancing selection, genetic dominance, epistasis, and pleiotropy, from population sequencing data.

Evolution, maintenance and allelic architecture of complex traits

Despite widespread interest in the genetics of complex traits (including common human diseases), the basic principles of complex trait genetics are still poorly understood. We attack the problem from three directions. First, we develop theoretical models of the evolution and maintenance of complex trait variation under various allelic architectures. Second, we are involved in a large zebrafish screen aimed at identifying key parameters of the allelic architecture of complex traits. Third, we work on statistical methods for the analysis of available genomic data in phenotyped human populations. This includes methods for predicting complex phenotypes from genotypes.

Computational and statistical methods for sequencing studies

We develop computational and statistical methods for sequencing studies. VT-test is designed to detect the combined association of rare variants with a complex phenotype. SNPTrack has been developed for gene mapping in model organisms. We continue to develop new methods, including methods that benefit from pedigree collections and functional genomic data. We actively participate in collaborative projects devoted to sequencing populations with common diseases.
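
The general idea behind testing rare variants jointly can be sketched as a burden statistic maximized over allele-count thresholds, with significance assessed by permuting phenotypes. The code below is a simplified illustration only: the statistic, function names, and permutation scheme are assumptions and do not reproduce the lab's VT-test implementation.

```python
import math
import random

def burden_z(genotypes, phenotypes, include):
    """Score the association between a per-individual burden and phenotype.

    genotypes[i][j] is the minor-allele count of individual i at variant j;
    `include` lists the variant indices that contribute to the burden.
    """
    mean_y = sum(phenotypes) / len(phenotypes)
    burdens = [sum(g[j] for j in include) for g in genotypes]
    num = sum(b * (y - mean_y) for b, y in zip(burdens, phenotypes))
    den = math.sqrt(sum(b * b for b in burdens))
    return num / den if den > 0 else 0.0

def max_threshold_statistic(genotypes, phenotypes):
    """Maximize the burden score over all allele-count thresholds.

    One-sided: only burdens associated with higher phenotype values score
    above the floor of 0.
    """
    m = len(genotypes[0])
    counts = [sum(g[j] for g in genotypes) for j in range(m)]
    best = 0.0
    for t in sorted(set(c for c in counts if c > 0)):
        include = [j for j in range(m) if 0 < counts[j] <= t]
        best = max(best, burden_z(genotypes, phenotypes, include))
    return best

def permutation_p_value(genotypes, phenotypes, n_perm=1000, seed=0):
    """Estimate significance by shuffling phenotypes across individuals."""
    rng = random.Random(seed)
    observed = max_threshold_statistic(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_threshold_statistic(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Permuting phenotypes accounts for the fact that the maximum is taken over several thresholds, so no separate multiple-testing correction across thresholds is needed in this sketch.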

Brigham Genomic Medicine (BGM)

The lab is closely tied to the computational component of the Brigham Genomic Medicine (BGM) program. This service aims to discover genes underlying previously uncharacterized human Mendelian diseases or rare diseases with unknown genetic etiology. We use genomic data from individual pedigrees to identify mutations potentially causing the phenotypes.