The Biological Questions We Address

In the broadest terms, we seek to understand how information is maintained and dynamically utilized in living eukaryotic genomes. To this end, the projects in the laboratory seek to understand relationships between DNA packaging (chromatin), the targeting of regulatory proteins such as transcription factors to specific genomic targets, and the mechanism by which these regulatory proteins exert their influence on DNA-dependent enzymes. We also aim to identify and characterize areas of the genome that serve to regulate chromosomal functions, including transcription, DNA replication and repair, recombination, and chromosome segregation.

DNA is an elegant molecule, but carries its information in a language that consists of only four letters. The molecular simplicity of DNA imposes practical limits on the complexity and types of information it can encode. How do complex organisms overcome these limitations? Conceptually, information in living genomes can be visualized as existing in layers, with the information being more diffusely coded in each ascending layer. The primary layer is best represented by protein-coding DNA, which operates according to the relatively inflexible universal genetic code. A second layer encodes regulatory information through the occurrence of millions of degenerate sequence motifs potentially recognized by "sequence specific" DNA-binding proteins such as transcription factors. A third layer of sequence information is very diffusely encoded over hundreds of bases and guides the positioning and occupancy of nucleosomes, the basic units of DNA packaging. The final layer is composed of the nucleosomes themselves. Nucleosomes greatly extend the information-coding capacity of the genome by allowing overlapping, redundant, and even illegitimate information to be safely encoded in DNA sequences. Nucleosomes accomplish this by blocking regulatory protein access to most of the genome, and by dynamically allowing access to relatively small portions of the genome that are utilized specifically in a given cellular environment. We seek to characterize quantitatively how the regulation of genome accessibility occurs and how it is coordinated with the underlying layers of information encoded in DNA.

Yeast, worms, and humans: A strategy for linking basic biology and medicine

To address these questions, we use three biological systems: (1) S. cerevisiae (hereafter "yeast") to address basic molecular mechanisms; (2) C. elegans to test the importance of those mechanisms in a simple multicellular organism; and (3) cell lines and clinical samples to directly interrogate chromatin function in human development and disease. The genomes of these organisms span more than two orders of magnitude in size (12 Mb, 100 Mb, and 3000 Mb, respectively) and a wide range of genome complexity (~50% coding, ~25% coding, and ~1.5% coding, respectively). Use of these systems, with C. elegans serving as a "stepping stone" to bridge yeast and human studies, permits us to quickly bring concepts discovered in model systems to medical relevance.
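As a rough illustration of the figures quoted above, the short sketch below multiplies each genome size by its approximate coding fraction and reports the fold range in genome size. The numbers are taken directly from the paragraph; the function and variable names are only illustrative.

```python
import math

# Genome sizes (Mb) and approximate coding fractions, as quoted in the text.
GENOMES = {
    "S. cerevisiae": (12, 0.50),
    "C. elegans": (100, 0.25),
    "human": (3000, 0.015),
}


def coding_megabases(size_mb, coding_fraction):
    """Approximate amount of protein-coding sequence, in Mb."""
    return size_mb * coding_fraction


def summarize(genomes):
    """Return coding Mb per genome, the fold range in size, and its log10."""
    coding = {
        name: coding_megabases(size, frac)
        for name, (size, frac) in genomes.items()
    }
    sizes = [size for size, _ in genomes.values()]
    fold = max(sizes) / min(sizes)
    return coding, fold, math.log10(fold)


coding, fold, orders = summarize(GENOMES)
for name, mb in coding.items():
    print(f"{name}: ~{mb:g} Mb coding")
print(f"genome size range: {fold:g}-fold (~{orders:.1f} orders of magnitude)")
```

By these rounded figures, the amount of coding sequence varies far less (roughly 6 to 45 Mb) than total genome size (12 to 3000 Mb), which is one way to see why the non-coding, regulatory layers matter more as genomes grow.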

Major Projects in the Lab

1. Using yeast transcription factors to investigate the dynamics of protein-genome interactions

We use the localization of yeast proteins as model systems to investigate in vivo DNA-binding specificity, and how it is regulated under different environmental and developmental conditions. Genome-wide localization of proteins is determined by a method commonly called "ChIP-seq", which stands for Chromatin Immunoprecipitation followed by high-throughput sequencing.

Genomic processes, including transcription and the binding of transcriptional regulators, are inherently dynamic. However, most of what we know about the mechanisms underlying transcriptional regulation is derived from static assays like footprinting or Chromatin Immunoprecipitation (ChIP), in which information about dynamics is lost. To overcome this limitation we combine the comprehensiveness and high spatial resolution of genomic, population-based methods like ChIP-seq with the high temporal resolution of single-cell, microscopy-based methods like fluorescence recovery after photobleaching (FRAP).


2. Highly parallel functional characterization of human regulatory elements

One of the most effective means of discovering human transcriptional regulatory elements is through the identification of open chromatin using methods like DNaseI hypersensitivity or FAIRE. In collaboration with the Segal lab in Israel, we are developing a high-throughput assay to simultaneously detect and test the function of tens of thousands of human regulatory elements in a single experiment, and to test the effect of natural human sequence variation within functional elements.


3. Combining biochemical and genomic methods to study genome and chromatin organization

Our group works to characterize how DNA is packaged, focusing in particular on the regulation of nucleosome dynamics. We have published results that provide evidence that the basic repeating units of eukaryotic chromatin, nucleosomes, are depleted from active regulatory elements throughout the Saccharomyces cerevisiae genome in vivo. Alterations in the global transcriptional program resulted in increased nucleosome occupancy at repressed promoters, and decreased nucleosome occupancy at promoters that became active. Given the conservation of sequence and function among components of both chromatin and the transcriptional machinery, nucleosome depletion at promoters may be a fundamental feature of eukaryotic transcriptional regulation. We are continuing to study the regulation of nucleosome occupancy genome-wide in yeast and C. elegans.

Jeremy Simon, Paul Giresi. Nature Protocols 2012

We are interested in bringing technologies and concepts we develop in model systems to the study of human biology and health. One example is FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements), a simple, low-cost method for the isolation and identification of nucleosome-depleted regions of chromatin genome-wide. FAIRE was initially discovered in yeast, where we observed that if formaldehyde-crosslinked chromatin was subjected to phenol-chloroform extraction, nucleosome-depleted sequences were recovered in the aqueous phase with much greater efficiency than coding sequences. FAIRE presumably works because covalently crosslinked protein-DNA complexes are retained at the interface of the organic and aqueous solvents, whereas DNA that is not crosslinked (or trapped by crosslinks) escapes into the aqueous phase. Higher-resolution comparison of FAIRE signal to nucleosome mapping data revealed that nearly all yeast genomic regions depleted in histone H3 and H4-Myc ChIPs were enriched by FAIRE. Histone proteins are likely to dominate the crosslinking profile because of their abundant primary amines and close proximity to DNA, both required for crosslinking. We have developed FAIRE as an alternative method for identification of open chromatin sites in human chromatin. FAIRE isolates regulatory regions in human cells that overlap to a large degree with DNaseI hypersensitive regions, but also detects a unique set of loci. Our discovery of FAIRE in yeast, and its continued development in human cells, provides the foundation of projects designed to create a human open chromatin atlas, and our proposal to profile chromatin in human cancer.
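The comparison between FAIRE-enriched regions and DNaseI hypersensitive regions can be pictured as an interval-overlap problem. The sketch below is a minimal illustration under stated assumptions, not the lab's analysis pipeline: regions are assumed to be half-open (chromosome, start, end) tuples, the coordinates are hypothetical toy values, and the function names are invented for this example. Real analyses typically rely on dedicated interval tools rather than a quadratic loop like this one.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) regions share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def partition_by_overlap(query, reference):
    """Split query regions into those overlapping any reference region and the rest."""
    shared, unique = [], []
    for region in query:
        if any(overlaps(region, ref) for ref in reference):
            shared.append(region)
        else:
            unique.append(region)
    return shared, unique


# Hypothetical toy coordinates, for illustration only (not real data).
faire_regions = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
dnase_regions = [("chr1", 250, 400), ("chr2", 500, 600)]

shared, faire_only = partition_by_overlap(faire_regions, dnase_regions)
print("FAIRE regions overlapping DNaseI sites:", shared)
print("FAIRE-only regions:", faire_only)
```

Swapping the two arguments gives the complementary view, namely which DNaseI hypersensitive regions are not recovered by FAIRE.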


4. Establishment of Caenorhabditis elegans as a model metazoan for the study of protein-DNA interactions during development

Yeast is a fabulous system, but we are also interested in studying aspects of chromatin regulation that are required for development. For this purpose, we initiated studies of C. elegans. C. elegans is at the forefront of both large-scale genomic research and gene function discovery. It was the first animal to have a fully mapped and sequenced genome. Genomic approaches including EST projects, SAGE sequencing, an ORFeome library, extensive yeast two-hybrid datasets, microarray profiling, and genome-wide RNAi screens have provided a wealth of information regarding gene structure and function (www.wormbase.org). The versatility of C. elegans for experimental manipulation has led to a large collection of mutant alleles and many well-known discoveries of basic biology (www.wormbook.org). Also unique to worms are the advantages they hold for the study of chromatin factors regulating meiosis and germline development, which are notoriously difficult to study in mammalian systems. As such, we participated in the large effort funded by NHGRI's modENCODE project to identify elements encoded in DNA that control chromatin behavior in C. elegans.

The invariant, asymmetric cell divisions in early C. elegans development provide an ideal model system to study how cell fate determinants are inherited and segregated. We study the chromatin and RNA components in sperm and oocytes, and then combine blastomere-specific isolation with ultra-low-input RNA-seq protocols to map RNA inheritance and transcriptional activity after the first cell division. These data allow us to understand how molecular components are inherited, and to define the mechanisms required for asymmetric RNA segregation so that we can directly test its biological significance.


Other areas of active research in our lab are focused on the following problems:

5. the link between genetic variation in human regulatory elements and alterations in gene expression;
6. the role of the nuclear envelope and nuclear pores in genome organization and transcriptional regulation; and
7. the characterization and functional implications of somatic nucleotide variation in normal human tissues.