Thursday 18 April 2013

Placing the HIRA Histone Chaperone Complex in the Chromatin Landscape

Our lab is particularly interested in the HIRA chaperone complex made up of the HIRA, ASF1a, UBN1 and CABIN1 proteins. This post will summarise some of our latest work on the complex and some of the results we've found.

Background

DNA in cells is wrapped around a family of proteins known as histones, which helps to compact the DNA so that it takes up less space. There are four core histones (H2A, H2B, H3 and H4) as well as a linker histone (H1). Within those histones there are also a number of variants. One we are interested in is the variant of histone H3 known as H3.3. This variant differs from other members of the H3 family (H3.1 and H3.2) in that it is produced all the time, not only in S-phase when the cells replicate. Our lab has an interest in cellular senescence, which has strong links to both ageing and cancer, and the mechanisms that regulate histone incorporation in senescent (non-replicating) cells are crucial to understanding how those cells behave. Histone H3.3 has been implicated in the dynamics of nucleosome rearrangement and, in turn, in gene activation and ongoing transcription.

HIRA has previously been shown to be necessary for incorporating H3.3 into chromatin outside of the replication cycle [1,2,3]. A chaperone complex consisting of HIRA, UBN1 and, as shown by our lab [4], CABIN1 cooperates with ASF1a (a histone-binding protein) to deposit H3.3 into chromatin. HIRA is required for embryo development as well as gene activation.

However, the distribution of the HIRA complex across the genome was unknown.

Genome Wide ChIP-Seq

To address this we performed genome-wide ChIP-sequencing (ChIP-seq). Briefly, cells are crosslinked with a chemical that "sticks" proteins and DNA to each other. The resulting material is then sheared by sonication and incubated with antibodies that specifically purify the protein of interest. Pieces of DNA that were in contact with this protein at the time of crosslinking are co-purified in the process. We then sequence the resulting DNA fragments and align them back to a reference genome. From the aligned reads we can estimate how often the protein was found at each location in the genome.



We sequenced ChIP libraries for HIRA, UBN1 and ASF1a. Compared with a control input lane, we found 8,296 regions with strong HIRA signal, 62,712 UBN1 regions and 64,550 ASF1a regions.



As you can see, there are many more ASF1a and UBN1 regions than there are HIRA regions. This is not unexpected for two reasons: (1) HIRA is generally a harder protein to ChIP than ASF1a or UBN1, so the complexity of the HIRA library was reduced compared to the UBN1 and ASF1a libraries; and (2) both ASF1a and UBN1 are likely to be involved in other protein complexes that lack HIRA. At a minimum we've found the strongest HIRA peaks. We were interested in the 1,008 HIRA regions which have neither ASF1a nor UBN1. Do they exist only because our criteria for ASF1a and UBN1 regions are too strict?



We took our ASF1a dataset and steadily relaxed the criteria. Even so, we were unable to categorise much more than 93% of the HIRA regions as containing ASF1a (even when the FDR was essentially 1.0). Inspection reveals that the remaining regions have few or no reads for ASF1a or UBN1. This at least rules out overly stringent criteria as the reason for the existence of the HIRA-only regions.
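The "fraction of HIRA regions containing ASF1a" can be illustrated with a small interval-overlap sketch. This is not the pipeline used in the paper; the function names, the half-open `(chrom, start, end)` representation and the toy coordinates are all assumptions made for illustration.

```python
def build_index(regions):
    """Group (chrom, start, end) regions by chromosome, sorted by start."""
    index = {}
    for chrom, start, end in regions:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def overlaps_any(region, index):
    """True if a half-open region shares at least one base with any indexed region."""
    chrom, start, end = region
    for other_start, other_end in index.get(chrom, []):
        if other_start >= end:
            break  # sorted by start, so nothing later can overlap
        if other_end > start:
            return True
    return False


def fraction_overlapping(query, target):
    """Fraction of query regions that overlap at least one target region."""
    index = build_index(target)
    hits = sum(overlaps_any(region, index) for region in query)
    return hits / len(query)


# Toy data (hypothetical coordinates, not taken from the study)
hira = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
asf1a = [("chr1", 150, 250), ("chr2", 80, 120)]
print(fraction_overlapping(hira, asf1a))
```

Relaxing the peak-calling criteria corresponds to growing the `asf1a` list and watching whether the fraction keeps rising. For real datasets a dedicated tool such as bedtools would be the more usual choice.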

Since we are mostly interested in the HIRA complex (rather than the other functions of ASF1a and UBN1) we concentrated on the HIRA regions. We compared our ChIP-seq results against publicly available ChIP-seq results in the ENCODE database and from Robertson et al.[5] and Vermeulen et al.[6] in the same cell line (HeLa) and hierarchically clustered the result (red = overlap, white = no overlap).



From this clustering we were able to deduce the existence of four clusters (annotated above). The clustering is robust to different linkage methods (Ward, single, complete, etc.) as well as to the addition or removal of factors.

Cluster 1 is enriched in H3K4me1, H3K4me3, H3K27ac, p300 and c-MYC. These regions are FAIRE positive and DNase hypersensitive but are not within CpG islands or gene promoters and show little RNA polII. This suggests they are active enhancers.

Cluster 2 is enriched in RNA polII, CpG islands, gene promoters, H3K4me3 and H3K27ac. This is consistent with these regions being promoters of actively transcribed genes.

Cluster 3 is enriched in H3K4me1 but does not have the same overlap with H3K27ac, p300 or c-MYC. It also lacks overlap with promoters, CpG islands or RNA polII. This would likely classify these regions as inactive, weak or poised enhancers.

Cluster 4 is mostly HIRA-only regions (as compared to clusters 1-3, which have high overlap with both ASF1a and UBN1). It is also FAIRE and DNaseHS negative, and is not at gene promoters or CpG islands. Another important distinction of cluster 4 (HIRA-only) from clusters 1, 2 and 3 (HIRA-complex) is the almost complete absence of H3.3.

As part of the same project we also sequenced HA-H3.3 (measuring newly incorporated H3.3) and performed FAIRE-sequencing, both of which are shown in the heatmap above. Looking at these in more detail, we can take the same clusters as before and plot the normalised read density around the HIRA peak. Also shown is ENCODE's H2Az dataset.
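For readers unfamiliar with this kind of plot, here is a minimal sketch of how a normalised read density profile around peak centres might be computed. It assumes reads-per-million scaling and fixed-width bins, which may differ from the normalisation used in the paper; the function name, parameters and toy data are hypothetical.

```python
def density_profile(peak_centres, read_positions, total_reads,
                    flank=1000, bin_size=100):
    """Mean reads-per-million per bin in windows centred on each peak.

    peak_centres   -- list of (chrom, centre) tuples
    read_positions -- dict mapping chrom to a list of read positions
    total_reads    -- library size used for the per-million scaling
    """
    n_bins = (2 * flank) // bin_size
    totals = [0.0] * n_bins
    for chrom, centre in peak_centres:
        window_start = centre - flank
        for pos in read_positions.get(chrom, []):
            offset = pos - window_start
            if 0 <= offset < 2 * flank:
                totals[offset // bin_size] += 1
    # Scale by library size (per million reads) and by the number of peaks
    scale = (total_reads / 1_000_000) * len(peak_centres)
    return [total / scale for total in totals]


# Toy example (hypothetical data): one peak, four reads, three inside the window
peaks = [("chr1", 1000)]
reads = {"chr1": [950, 1000, 1050, 5000]}
profile = density_profile(peaks, reads, total_reads=1_000_000,
                          flank=100, bin_size=50)
```

Computing one profile per cluster and per dataset gives curves that can be compared directly because each is scaled by its own library size.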



Here we see results echoing the heatmap above but also showing the relative lack of reads in the 4th cluster (HIRA-only). You can also clearly see the nucleosome-free region at the transcription start site of genes in cluster 2 (gene promoters).

We looked at the gene promoters in more detail and showed that the HIRA complex was present at these nucleosome-free regions just upstream of the transcription start site.



We also showed that the HIRA complex as well as H3.3 and H2Az are correlated with gene expression.



Interactions

We then started looking at the proteins that were overlapping with HIRA. 76% of HIRA complex peaks (all three proteins) overlapped with at least one protein from four families:

* human SWI/SNF (BRG1, INI1, BAF155 and BAF170)
* AP-1 (c-FOS, c-JUN and JUND)
* c-MYC/MAX
* TFAP2 (TFAP2A and TFAP2C)

The strongest overlap (compared to chance) was with the SWI/SNF members, and it was particularly marked at active enhancers and promoters. This made them candidate HIRA complex binding partners.

To investigate this we tested interactions by immunoprecipitation-western blotting. We were able to show that IP of endogenous HIRA also coprecipitates other members of the complex (UBN1, ASF1a) as well as transcription factors (c-JUN, c-MYC, GTF2i), SWI/SNF members (BRG1, BRM, INI1) and CTCF, but not the negative controls: MCM2, an abundant chromatin-binding protein, and the transcription factor TCF4. We also showed that the reverse was true of BRG1 and INI1: IP of either protein also coprecipitates members of the HIRA complex.



We also performed a proximity ligation assay that scores close proximity of target proteins at the molecular level and showed that HIRA is located in close proximity to BRG1 and INI1 but not with proteins not known to interact with HIRA (DNMT1, MCM2, UACA, ATRX, XRN1, MBD2, LSH and EDC4).



Discussion

We've carried out the first genome-wide study of the human proteins HIRA, ASF1a and UBN1 on chromatin and identified a number of likely binding partners. The SWI/SNF complex seems an especially promising candidate for further study. In fact, just after this paper was accepted, a former post-doc in the lab published that BRG1 is required for formation of Senescence-Associated Heterochromatin Foci (SAHF) [7]; he had also shown, while in the lab, that HIRA is required for SAHF formation [8]. We were working in HeLa cells, so we were unable to observe this effect in our current paper (HeLa is an immortal cell line and so doesn't undergo replicative senescence in the usual way), but it seems likely that HIRA and BRG1 are both involved in the same process regulating SAHF formation.

The full paper can be freely accessed from Cell Reports[9] and the data can be downloaded from GEO GSE45025.

References

[1] Ray-Gallet, D., Quivy, J.P., Scamps, C., Martini, E.M., Lipinski, M., and Almouzni, G. (2002). HIRA is critical for a nucleosome assembly pathway independent of DNA synthesis. Mol. Cell 9, 1091–1100.

[2] Loppin, B., Bonnefoy, E., Anselme, C., Laurençon, A., Karr, T.L., and Couble, P. (2005). The histone H3.3 chaperone HIRA is essential for chromatin assembly in the male pronucleus. Nature 437, 1386–1390.

[3] Tagami, H., Ray-Gallet, D., Almouzni, G., and Nakatani, Y. (2004). Histone H3.1 and H3.3 complexes mediate nucleosome assembly pathways dependent or independent of DNA synthesis. Cell 116, 51–61.

[4] Rai, T.S., Puri, A., McBryan, T., Hoffman, J., Tang, Y., Pchelintsev, N.A., van Tuyn, J., Marmorstein, R., Schultz, D.C. and Adams, P.D. (2011) Human CABIN1 is a functional member of the human HIRA/UBN1/ASF1a histone H3.3 chaperone complex. Molecular and Cellular Biology.

[5] Robertson, A.G., Bilenky, M., Tam, A., Zhao, Y., Zeng, T., Thiessen, N., Cezard, T., Fejes, A.P., Wederell, E.D., Cullum, R., et al. (2008). Genome-wide relationship between histone H3 lysine 4 mono- and tri-methylation and transcription factor binding. Genome Res. 18, 1906–1917.

[6] Vermeulen, M., Eberl, H.C., Matarese, F., Marks, H., Denissov, S., Butter, F., Lee, K.K., Olsen, J.V., Hyman, A.A., Stunnenberg, H.G., et al. (2010). Quantitative interaction proteomics and genome-wide profiling of epigenetic histone marks and their readers. Cell 142, 967–980.

[7] Tu, Z., Zhuang, X., Yao, Y.G., and Zhang, R. (2013). BRG1 is required for formation of senescence-associated heterochromatin foci induced by oncogenic RAS or BRCA1 loss. Mol. Cell. Biol. 33, 1819–1829.

[8] Zhang, R., Poustovoitov, M.V., Ye, X., Santos, H.A., Chen, W., Daganzo, S.M., Erzberger, J.P., Serebriiskii, I.G., Canutescu, A.A., Dunbrack, R.L., Pehrson, J.R., Berger, J.M., Kaufman, P.D., and Adams, P.D. (2005). Formation of MacroH2A-containing senescence-associated heterochromatin foci and senescence driven by ASF1a and HIRA. Dev. Cell 8, 19–30.

[9] Pchelintsev, N.A., McBryan, T., Rai, T.S., van Tuyn, J., Ray-Gallet, D., Almouzni, G., and Adams, P.D. (2013). Placing the HIRA Histone Chaperone Complex in the Chromatin Landscape. Cell Reports.

Friday 5 April 2013

Minimal UCSC Track Hubs

I frequently display results within the lab using the UCSC genome browser and I'm often asked how I do the "overlapping tracks" feature. Specifically, people usually ask for a minimal configuration that they can then extend.



It's something UCSC has been doing on their own tracks for some time (e.g. the promoter/enhancer histone mark tracks) but it was made available to third parties reasonably recently through their track hub feature.

The first requirement is somewhere to upload bigWig files that UCSC can access via HTTP. You can generate bigWig files using the UCSC utilities bedGraphToBigWig or wigToBigWig.

Note that if you intend to display related tracks together they should be appropriately normalised so that comparisons make sense. If your data is already normalised (for example % methylation from bisulfite sequencing) then this is fine. Otherwise, if you have a bedGraph file representing a pileup (number of reads covering each basepair) then something like the following should work to normalise by library size:

awk -v NUM_FRAGS=x 'BEGIN{OFS="\t"}{print $1,$2,$3,$4/(NUM_FRAGS/1000000)}' file.bedGraph > file.normalised.bedGraph
Here NUM_FRAGS=x should be changed to the number of fragments in your dataset, so that each value is expressed per million fragments. At this point you can follow the instructions from UCSC to set up your first track hub; I will briefly summarise them here.
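If you would rather do the normalisation in Python, the sketch below mirrors the awk command above. It is only an illustration: the function name is hypothetical, and passing `track`, `browser` and comment lines through untouched is an addition that the awk one-liner does not make.

```python
def normalise_bedgraph(lines, num_frags):
    """Yield bedGraph lines with column 4 scaled to a per-million-fragments value."""
    scale = num_frags / 1_000_000
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: header and comment lines are passed through unchanged
        if not line or line.startswith(("track", "browser", "#")):
            yield line
            continue
        chrom, start, end, value = line.split()[:4]
        # ":g" gives six significant digits, like awk's default number format
        yield "\t".join([chrom, start, end, f"{float(value) / scale:g}"])


# Toy example: a library of 2 million fragments halves each pileup value
example = list(normalise_bedgraph(["chr1\t0\t100\t20\n"], 2_000_000))
```

To process a file, open it and write each yielded line (plus a newline) to the output file before running bedGraphToBigWig on the result.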

A number of small files are required to make it work. The file structure I use is something like this:
/hub.txt
/genomes.txt
/hg18/trackDb.txt
/hg18/project1.txt
Here hub.txt is the "root" of the hub and is the URL that you give to UCSC to initialise the hub and load it into the browser. The format is minimal: it holds the human-readable description of your hub and the location of a "genomes" file.

/hub.txt
hub HubNameHere
shortLabel Label
longLabel Longer Label Here
genomesFile genomes.txt
email tony@mcbryan.co.uk
The genomes file is simply a list of each genome you want to be able to display data against. Valid options are any of the genomes that UCSC supports on the browser (or your mirror).

At a minimum this must include one genome and link to the associated trackDb file, which describes the tracks to be loaded.

/genomes.txt
genome hg18
trackDb hg18/trackDb.txt
UCSC uses trackDb files internally to format the display of tracks onto the genome. Track Hubs allow us to use a subset of the trackDb functionality to load our own tracks. UCSC has long had a Custom Tracks feature but this only ever allowed a very small fraction of the features of the trackDb while the Track Hubs allow us much more.
You can, if desired, simply keep all your track definitions in the single trackDb file but this will quickly get messy. Luckily we can import additional files.

/hg18/trackDb.txt
include project1.txt
In project1.txt we can then include only the track definitions that are relevant to that project.

/hg18/project1.txt
track SuperTrack
shortLabel Label
longLabel Long Label
superTrack on none
priority 1

track CompositeTrack
container multiWig
configurable on
shortLabel Label
longLabel Long Label
visibility hide
type bigWig 0 100
autoScale off
aggregate overlay
viewLimits 0:100
windowingFunction mean
superTrack SuperTrack full
showSubtrackColorOnUi on

track SubTrack1
type bigWig
shortLabel Label
longLabel Long Label
parent CompositeTrack
visibility full
bigDataUrl http://a.b.com/path/to/wigs/SubTrack1.bigWig
color 51,102,153

track SubTrack2
type bigWig
shortLabel Label
longLabel Long Label
parent CompositeTrack
visibility full
bigDataUrl http://user:pass@a.b.com/path/to/wigs/SubTrack2.bigWig
color 255,102,0
So project1.txt is where the magic happens. We have a super track (named SuperTrack) which contains all other tracks. This gives us a single drop-down box on the UCSC browser which we can use to turn a whole project's tracks on or off at once.

This super track contains a single composite track which is of type multiWig. The multiWig is capable of overlaying multiple wig tracks on top of each other (using the aggregate option). To make this work we create regular bigWig tracks with their parent set to the composite track.

The bigDataUrl setting is the important one, as it tells UCSC how to actually get to your data (hosted on a web server somewhere). Note that UCSC supports providing the URL along with a username and password (as shown above in SubTrack2). This is compatible with usernames and passwords enforced by htaccess on an Apache server.

You can fix the viewLimits on wig files (here fixed at 0 to 100 within the composite track). Note that settings on a parent track are inherited by its child tracks unless the child has a conflicting setting.

If you prefer you can have a dynamic range (may be appropriate for ChIP-Seq data etc) which will resize the view to the bounds of your data. If you use dynamic scaling you probably also want to include the alwaysZero attribute on the composite track and set it to "on" otherwise UCSC will scale from your lowest value to your highest value within the window.
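Once you have more than a couple of subtracks, writing the stanzas by hand gets tedious. The sketch below generates a multiWig container and its children, and switches between fixed view limits and dynamic scaling. The function name and arguments are hypothetical, the labels are simply reused from the track names, and the `aggregate` default of `transparentOverlay` is my assumption of a documented value (the listing above uses `overlay`), so check which value your browser version accepts.

```python
def multiwig_stanzas(name, subtracks, view_limits=(0, 100), auto_scale=False,
                     aggregate="transparentOverlay"):
    """Return trackDb text for a multiWig container and its bigWig children.

    subtracks -- list of (track_name, bigDataUrl, "r,g,b") tuples
    """
    parent = [
        f"track {name}",
        "container multiWig",
        f"shortLabel {name}",
        f"longLabel {name}",
        "type bigWig",
        f"aggregate {aggregate}",  # assumption: see the note above
        "showSubtrackColorOnUi on",
    ]
    if auto_scale:
        # Dynamic range, but always anchored at zero
        parent += ["autoScale on", "alwaysZero on"]
    else:
        low, high = view_limits
        parent += ["autoScale off", f"viewLimits {low}:{high}"]

    stanzas = ["\n".join(parent)]
    for track, url, colour in subtracks:
        stanzas.append("\n".join([
            f"track {track}",
            "type bigWig",
            f"shortLabel {track}",
            f"longLabel {track}",
            f"parent {name}",
            f"bigDataUrl {url}",
            f"color {colour}",
        ]))
    # Stanzas are separated by a blank line
    return "\n\n".join(stanzas) + "\n"
```

Calling it with `auto_scale=True` emits `autoScale on` and `alwaysZero on` on the container, which the children then inherit.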

You can then view the track hub by entering it on the UCSC Hub Connect page or by creating a link such as:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hubUrl=http://user:pass@a.b.com/hub/hub.txt
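If you generate such links in a script, a small helper can build them. This is a sketch: the function name is hypothetical, and it percent-encodes the hub URL as a query parameter, which is standard for web forms and which I am assuming the browser decodes as usual.

```python
from urllib.parse import urlencode


def hub_link(hub_url, db="hg18",
             browser="http://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build a genome browser link that loads the given track hub."""
    return f"{browser}?{urlencode({'db': db, 'hubUrl': hub_url})}"


link = hub_link("http://user:pass@a.b.com/hub/hub.txt")
```

Bear in mind that a link containing a username and password exposes those credentials to anyone you share it with.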

You should be able to use the above as a reference point for building your own Track Hubs. For more information, and details on the other features you can use in Track Hubs, see UCSC's help pages.
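As a starting point, the three small files described earlier (hub.txt, genomes.txt and trackDb.txt) can be written out by a short script. This is only a sketch of the layout used in this post; the function names and the example email address are hypothetical.

```python
from pathlib import Path


def hub_files(hub_name, short_label, long_label, email,
              genome="hg18", includes=("project1.txt",)):
    """Return a dict mapping relative paths to file contents for a minimal hub."""
    hub = "\n".join([
        f"hub {hub_name}",
        f"shortLabel {short_label}",
        f"longLabel {long_label}",
        "genomesFile genomes.txt",
        f"email {email}",
    ]) + "\n"
    genomes = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    # One include line per project file, as in the trackDb.txt shown earlier
    track_db = "".join(f"include {name}\n" for name in includes)
    return {
        "hub.txt": hub,
        "genomes.txt": genomes,
        f"{genome}/trackDb.txt": track_db,
    }


def write_hub(root, files):
    """Write the files under root, creating the genome directory as needed."""
    for relative_path, text in files.items():
        path = Path(root) / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


files = hub_files("HubNameHere", "Label", "Longer Label Here", "you@example.org")
```

Calling `write_hub("hub", files)` creates the skeleton; you then add project1.txt with your track definitions and upload the directory alongside your bigWig files.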