Although the human genome has been almost entirely sequenced, much
remains to be understood about how its genes are regulated. Even
general themes regarding the locations of DNA regulatory elements,
such as where transcription factor (TF) binding sites are typically
found, are still unknown. The
interactions between TFs and their DNA binding sites are an integral
part of the regulatory networks within cells. These interactions
control critical steps in development and responses to environmental
stresses, and their dysfunction can contribute to the progression of
various diseases. However, the sequence specificities and regulatory
functions of most of the ~1850 described and predicted human TFs are
currently unknown.
Using a DNA microarray technology that we developed, we have analyzed
the DNA binding specificities of not only individual proteins, but
also selected pools of proteins, as well as entire libraries of
proteins. We are currently using a version of this technology that
permits measurement of the direct binding of TF DNA binding domains
to microarrays spotted with coding and intergenic region DNAs. This
allows us to identify the genomic locations of TFs' DNA binding
sites, and thus which genes they might regulate. These data can also
be used predictively to search related genomes for the locations of
these regulatory sequence elements. Such data provide a highly
informative connection between mRNA expression analysis, proteomics,
and structural genomics.
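
As a rough illustration of how binding specificity data might be used
to search a sequence for candidate sites, the sketch below converts a
table of base counts into a log-odds position weight matrix and slides
it along a sequence. The counts, the uniform background, the
pseudocount, the score threshold, and the function names are all
assumptions made for this example; they are not measured data, and the
scoring scheme is a common generic choice rather than a description of
our analysis. Only the forward strand is scanned.

```python
import math

# Hypothetical base counts for a 4-bp motif, one dict per position.
# These numbers are invented for illustration only.
COUNTS = [
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 8, "G": 1, "T": 1},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append(
            {
                base: math.log2((column[base] + pseudocount) / total / background)
                for base in "ACGT"
            }
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    for start, window, score in scan("AATGACGGTTTGACA", pwm, threshold=4.0):
        print(start, window, round(score, 2))
```

A fuller search would also scan the reverse complement and would
calibrate the threshold against a background sequence set.
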
A significant challenge imposed by the genomes of higher eukaryotes
is that regulatory DNA elements can be found far upstream of
promoters, as well as in introns or downstream of the genes they
regulate. To this end, we have developed strategies that employ
comparative genomics methods for computational analyses and
predictions of TF binding sites in the mouse and human genomes.
Predictions regarding the interactions between TFs and
their DNA binding sites will be validated using protein binding
microarray experiments, in which TFs are bound directly to DNA
microarrays, as well as with chromatin immunoprecipitation microarray
experiments, in which a microarray readout indicates which regions of
the genome are bound in vivo by a given TF under particular culture
conditions. In addition, we are analyzing existing chromatin
immunoprecipitation microarray data to identify which sequence
context features might influence which TF binding sites are bound
under particular conditions. The validation of predicted sites will
serve to annotate the respective TFs more accurately and to provide
data on the combinatorial interactions of various TFs. The results of these
experiments and analyses will be important for a better understanding
of the locations and organization of regulatory DNA elements in
mammalian genomes, and will permit the development of more accurate
algorithms for the prediction of such elements in the human genome.
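
One simple form a comparative genomics filter could take is to keep
only those candidate sites in the human sequence that are also present
at the aligned position in mouse. The sketch below assumes a
precomputed pairwise alignment supplied as two equal-length gapped
strings and an exact consensus motif; the function name, the inputs,
and the exact-match criterion are hypothetical simplifications for
illustration and do not describe the specific strategies mentioned
above.

```python
def conserved_sites(human_aln, mouse_aln, motif):
    """Return ungapped human coordinates where motif occurs in both aligned sequences.

    human_aln and mouse_aln are rows of a pairwise alignment ('-' marks a gap).
    A site counts as conserved only if the motif spans the same ungapped
    alignment columns in both species (a deliberately strict assumption).
    """
    if len(human_aln) != len(mouse_aln):
        raise ValueError("aligned sequences must have equal length")
    human_aln = human_aln.upper()
    mouse_aln = mouse_aln.upper()
    motif = motif.upper()
    width = len(motif)

    sites = []
    human_pos = 0  # coordinate in the ungapped human sequence
    for col in range(len(human_aln)):
        if human_aln[col] == "-":
            continue  # gap columns do not advance the human coordinate
        human_window = human_aln[col:col + width]
        mouse_window = mouse_aln[col:col + width]
        if human_window == motif and mouse_window == motif:
            sites.append(human_pos)
        human_pos += 1
    return sites


if __name__ == "__main__":
    human = "ACTGAC--GTTGACA"
    mouse = "ACTGACTTGTTCACA"
    # The first TGAC is shared; the second is TCAC in mouse and is dropped.
    print(conserved_sites(human, mouse, "TGAC"))
```

In practice the exact-match test would likely be replaced by a
matrix-based score in each species, and sites interrupted by alignment
gaps would need separate handling.
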
Future projects may involve combining DNA-protein interaction data
with data from other genomic and proteomic approaches to decipher
the regulatory networks within cells. Such projects might examine
other types of regulation, such as DNA methylation, RNA secondary
structure, chromatin structure, and the roles of small-molecule
cofactors and protein modifications.