Explore cancer splicing

Last update: 30-Jan-2017
WARNING: The app is currently being updated, so it might not work optimally.

Background: Large-scale consortia such as TCGA have generated a wealth of data on the molecular profiles of cancer patients, and several excellent tools are available for exploring, for example, DNA mutations or gene expression levels. However, changes in RNA splicing can be more difficult to explore and navigate, and gaining insight into the potential impact of RNA mis-splicing on disease mechanisms remains challenging. I am therefore developing a point-and-click interface that gives access to pre-processed RNA splicing data without requiring any bioinformatics knowledge. The aim is to create a stand-alone web app. For now, however, this is a beta version that must be run on the user’s local machine. For instructions, see the ‘prerequisites’ and steps 1-3 below.

Purpose: Explore (mis)splicing of RNA across multiple cancer types directly within a web browser. This can be done at the pan-cancer level, genome-wide within a particular cancer type, with a gene-centric approach, or by comparing samples that are wild-type vs mutated in one or more genes. Figures are generated on-the-fly based on inputs selected by the user. Results are available either in summarised form or for individual patients. The aim is to answer questions such as:

  • Which genes are most frequently aberrantly spliced in lung cancer? Or pan-cancer?
  • In which type of cancer is PTEN most frequently mis-spliced?
  • Which exons are skipped within CDK10 in colorectal cancer?
  • Is there more or less mis-splicing in kidney cancer compared to cancers from other tissues of origin?
  • Which categories of alternative splicing are most affected pan-cancer?
  • How much does splicing vary across patients within a given cancer type?
  • How do tumour and normal samples cluster according to splicing?
  • How does splicing differ between samples with and without mutations in p53?

All the alternative splicing results presented within the program are based on analysis of RNA-sequencing data from patient-matched tumour/normal samples from The Cancer Genome Atlas, as initially published here.

Prerequisites: A version of R new enough for all the packages mentioned below to be installed, e.g. version 3.3.1 or higher, running in 64-bit mode. This has been tested on both Mac and PC (desktops and laptops) using packages from Bioconductor release 3.4. Older versions of R are highly unlikely to work, and slower machines may stall for a long time or fail to load the data. Running the commands shown below from within a standard R window will open the app in your default web browser. Running them from within RStudio will open a separate RStudio window with the same functionality as the web app. On a Mac I find the two roughly equal, although on a PC the figures and aesthetics in the RStudio window appear slightly less clear (in my admittedly very limited PC experience).
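To check the version and 64-bit requirements before going further, something like the following can be pasted into the R window. This is only a minimal sketch: the variable names are illustrative, and the 3.3.1 threshold is simply the example version mentioned above.

```r
# Minimum R version suggested in the prerequisites (an example threshold).
min_version <- "3.3.1"

# Compare the running R version against the threshold.
version_ok <- getRversion() >= min_version

# A pointer is 8 bytes wide in a 64-bit R session and 4 bytes in a 32-bit one.
is_64bit <- .Machine$sizeof.pointer == 8

message("R version ", as.character(getRversion()),
        if (version_ok) " is new enough" else " is older than the suggested minimum")
message("Running in 64-bit mode: ", is_64bit)
```

If either check fails, it is probably worth updating R before installing the packages.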

Step 1: If you don’t already have it, install R version 3.3.1 or higher. You can use either the standard R interface or RStudio, which adds an expanded user interface on top of R. I prefer the standard version, but both are acceptable.

Step 2: Before running the program for the first time, install the required R packages by copy-pasting the following into the R window:

source("https://bioconductor.org/biocLite.R")
biocLite(c("dplyr", "tidyr", "ggvis", "ggplot2", "shiny", "S4Vectors", "cluster", "d3heatmap", "GenomicRanges"))
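After the installation finishes, it can be useful to confirm that every package from Step 2 is actually available. The snippet below is a small sketch of one way to do that using base R only; the variable names are illustrative.

```r
# Packages listed in Step 2.
required <- c("dplyr", "tidyr", "ggvis", "ggplot2", "shiny",
              "S4Vectors", "cluster", "d3heatmap", "GenomicRanges")

# Names of all packages installed in the current R library paths.
installed <- rownames(installed.packages())

# Anything in 'required' that is not installed yet.
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any packages reported as missing can be installed by repeating Step 2.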

Step 3: To run the program, copy-paste the two lines below to open up the app in your default web browser (or a separate window within RStudio). To exit the program, press ‘Escape’ or Control-C from within your R terminal.


Note that depending on your computer and internet connection, it may take several minutes for the program to download all the data, load it into R and start running. A message will appear in the R window once the program is ready to run. So be patient! Or just go get a cup of coffee while you wait – you know you want to.

To run the program offline, specify ‘destdir’ in runUrl (see the ?runUrl documentation in R for more info). Pro: the app can then be run without an internet connection, and it will load a bit faster the next time it is opened from destdir instead of the http location. Con: the local copy won’t include any updates or fixes implemented since it was downloaded.
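As a rough sketch of the ‘destdir’ approach, the two helper functions below wrap the online and offline cases. Both function names, the example URL and the folder names are hypothetical placeholders, not the real location of the app: substitute the URL from Step 3. The only shiny calls used are runUrl (with its destdir argument) and runApp.

```r
# Hypothetical helper: download the app once, keep a local copy, and launch it.
# 'app_url' must be the app URL from Step 3; 'local_copy' is any writable folder.
run_first_time <- function(app_url, local_copy) {
  dir.create(local_copy, showWarnings = FALSE, recursive = TRUE)
  # With destdir set, the downloaded archive is unpacked into local_copy
  # and kept there after the app exits.
  shiny::runUrl(app_url, destdir = local_copy)
}

# Hypothetical helper: launch the previously downloaded copy without internet access.
# 'app_dir' is the unpacked app folder inside local_copy; its name depends on
# how the archive was packed, so check list.dirs(local_copy) to find it.
run_offline <- function(app_dir) {
  shiny::runApp(app_dir)
}

# Example usage (placeholder values, not run here):
# run_first_time("https://example.org/path/to/app.zip", "~/cancer_splicing_app")
# run_offline("~/cancer_splicing_app/<unpacked-folder>")
```

To pick up later updates or fixes, run the online version again so that a fresh copy is downloaded.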

Disclaimer: This is currently a pre-pre-beta version of a program that may be made available for running online via a web server at some point. In the meantime, it can be run via R on your local desktop, as outlined above. It may be somewhat clunky though. There are most likely bugs nesting happily in there, so the code may be modified without notice, or expanded to include new functionality.