Friday, 5 April 2013

Minimal UCSC Track Hubs

I frequently display results within the Lab using the UCSC genome browser and I'm often asked how I do the "overlapping tracks" feature - specifically I'm usually asked for a minimal configuration that they can then extend.



It's something UCSC has been doing on their own tracks for some time (e.g. the Promotor/Enhancer Histones mark track) but was made available to 3rd parties reasonably recently through their track hub feature.

The first requirement is that you will have to have somewhere you can upload bigWig files to that UCSC can access via HTTP. You can generate bigWig files using UCSC apps bedGraphToBigWig or wigToBigWig.

Note that if you intend on displaying tracks related together they should be appropriately normalised to make comparisons make sense. If your data is already normalised (for example % methylation from Bisulfite sequencing) then this is fine. Otherwise, if you have a bedGraph file representing a pileup (number of reads covering each basepair) then something like the following should work to normalise by library size:

awk -v NUM_FRAGS=x 'BEGIN{OFS="\t"}{print $1,$2,$3,$4/(NUM_FRAGS/1000000)}' file.bedGraph > file.normalised.bedGraph
Where NUM_FRAGS=x is changed to the correct number of fragments in your dataset. At this point you can follow the instructions from UCSC to setup your first track hub. I will briefly summarise them here though.

There's a number of little files required to make it work right.. The file structure I use is something like this:
/hub.txt
/genomes.txt
/hg18/trackDb.txt
/hg18/project1.txt
Here hub.txt is the "root" of the hub and is the URL that you give to UCSC to initialise the hub and load it into the browser. The format is minimal and just describes the human readable description of your hub and the location of a "genomes" file.

/hub.txt
hub HubNameHere
shortLabel Label
longLabel Longer Label Here
genomesFile genomes.txt
email tony@mcbryan.co.uk
The genomes file is simply a list of each genome you want to be able to display data against. Valid options are any of the genomes that UCSC supports on the browser (or your mirror).

Minimally this must include one genome and link to the associated trackDB file which describes the tracks to be loaded.

/genomes.txt
genome hg18
trackDb hg18/trackDb.txt
UCSC uses trackDb files internally to format the display of tracks onto the genome. Track Hubs allow us to use a subset of the trackDb functionality to load our own tracks. UCSC has long had a Custom Tracks feature but this only ever allowed a very small fraction of the features of the trackDb while the Track Hubs allow us much more.
You can, if desired, simply keep all your track definitions in the single trackDb file but this will quickly get messy. Luckily we can import additional files.

/hg18/trackDb.txt
include project1.txt
While in project1.txt we can include only the track definitions that are relevant to that project.

/hg18/project1.txt
track SuperTrack
shortLabel Label
longLabel Long Label
superTrack on none
priority 1

track CompositeTrack
container multiWig
configurable on
shortLabel Label
longLabel Long Label
visibility hide
type bigWig 0 100
autoScale off
aggregate overlay
viewLimits 0:100
windowingFunction mean
superTrack SuperTrack full
showSubtrackColorOnUi on

track SubTrack1
type bigWig
shortLabel Label
longLabel Long Label
parent CompositeTrack
visibility full
bigDataUrl http://a.b.com/path/to/wigs/SubTrack1.bigWig
color 51,102,153

track SubTrack2
type bigWig
shortLabel Label
longLabel Long Label
parent CompositeTrack
visibility full
bigDataUrl http://user:pass@a.b.com/path/to/wigs/SubTrack2.bigWig
color 255,102,0
Therefore project1.txt is where the magic happens. We have a super track (named SuperTrack) which contains all other tracks. This gives us a single drop-down box on the UCSC browser which we can use to turn a whole projects tracks on or off at once.

This super track contains a single composite track which is of type multiWig. The multiWig is capable of overlaying multiple wig tracks on top of each other (using the aggregate option). To make this work we create regular bigWig tracks with their parent set to the composite track.

The bigDataUrl is the important one as that tells UCSC how to actually get to your data (hosted on a web server somewhere). Note that UCSC supports providing the URL along with username and password (as shown above in SubTrack2). This is compatible with usernames and passwords enforced by htaccess on Apache server.

You can fix the viewlimits on wig files (here fixed at 0 to 100 within the composite track. Note that settings on a parent track are inherited by their child tracks unless the child has a conflicting setting.

If you prefer you can have a dynamic range (may be appropriate for ChIP-Seq data etc) which will resize the view to the bounds of your data. If you use dynamic scaling you probably also want to include the alwaysZero attribute on the composite track and set it to "on" otherwise UCSC will scale from your lowest value to your highest value within the window.

You can then view the track hub by entering it on the UCSC Hub Connect page or by creating a link such as:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18&hubUrl=http://user:pass@a.b.com/hub/hub.txt

You should be able to use the above as a reference point to building your own Track Hubs. For more information, and details on the other features you can use in Track Hubs, see UCSC's help pages.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.