Recent SBI Projects
Genomics: DNA Data Analysis Workflows
We have access to high performance computers, including the 256-processor cluster 'McLaren' (NSW supercomputer) at the AC3 data centre and the 12,000-processor Vayu national supercomputer.
We have built data analysis workflows for the processing of:
- Short-read sequences from the Illumina Genome Analyser, with a focus on bacterial genome assembly. Installed packages include Edena, Velvet, RAST, BLAST, Blast2GO, WEGO, Integrative Genomics Viewer (IGV) and Artemis.
- Transcriptomic data, with a focus on mammalian transcripts. Installed packages include Bowtie, TopHat, Cufflinks, Cuffcompare, Cuffdiff, IGV, UCSC Genome Browser and the DAVID tools.
Computationally intensive steps of these workflows have been installed on both McLaren and Vayu. A full listing of packages and tools installed on these clusters can be found here.
These are available for use by any group at a NSW-based university (for McLaren) or by groups in Australia who are partners of the National Computational Infrastructure (NCI). Other parts of the workflow (e.g. viewers) are executed on local workstations.
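As a rough illustration of how the transcriptomic workflow chains its tools, the sketch below builds (but does not run) command lines for TopHat, Cufflinks and Cuffdiff. The sample names, file names and index name are hypothetical placeholders, and the options actually used on the clusters are not stated here; only the basic output-directory option of each tool is shown.

```python
from typing import List


def rnaseq_commands(index: str, reads_1: str, reads_2: str, sample: str) -> List[List[str]]:
    """Build TopHat and Cufflinks command lines for one paired-end sample."""
    align_dir = f"{sample}_tophat"
    assembly_dir = f"{sample}_cufflinks"
    return [
        # Spliced alignment of the read pairs against a Bowtie index.
        ["tophat", "-o", align_dir, index, reads_1, reads_2],
        # Transcript assembly and abundance estimation from the alignments.
        ["cufflinks", "-o", assembly_dir, f"{align_dir}/accepted_hits.bam"],
    ]


def cuffdiff_command(annotation: str, bam_a: str, bam_b: str,
                     out_dir: str = "cuffdiff_out") -> List[str]:
    """Build a Cuffdiff command line comparing two aligned samples."""
    return ["cuffdiff", "-o", out_dir, annotation, bam_a, bam_b]


# Hypothetical inputs, for illustration only.
for command in rnaseq_commands("genome_index", "control_1.fastq", "control_2.fastq", "control"):
    print(" ".join(command))
print(" ".join(cuffdiff_command("transcripts.gtf",
                                "control_tophat/accepted_hits.bam",
                                "disease_tophat/accepted_hits.bam")))
```

Keeping each command as a list of arguments means it could later be handed to a job scheduler or to `subprocess.run` without any shell quoting.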
Genomics: Sequencing and proteomic validation of the genome of C. concisus
Deshpande NP, Kaakoush NO, Mitchell H, Janitz K, Raftery MJ, Li SS, Wilkins MR (2011). Sequencing and validation of the Campylobacter concisus genome reveals extensive intra-species diversity. PLoS ONE, 6: e22170. [abstract]
Campylobacter concisus has been reported to be an emerging pathogen of the human gastrointestinal tract. However, contrasting reports regarding the role of C. concisus in different disease states have raised doubts as to its pathogenic potential. One reason for these observations may relate to the previously reported genetic variations between strains of C. concisus.
We have sequenced the genome of a C. concisus strain isolated from an intestinal biopsy of a child with Crohn's disease (UNSWCD). 56 million paired-end reads (102 bp) were generated by an Illumina/Solexa GII sequencer and assembled into a genome of size 1.8 Mb (with 123 contigs) using the de novo assembler Velvet 1.0.09.
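The figures above allow a back-of-the-envelope check of sequencing depth. The sketch below computes fold coverage as (number of reads × read length) / genome size, assuming that "56 million" counts individual reads rather than read pairs, and also shows how an N50 contig length is computed. The contig lengths in the example are invented toy values, not the lengths of the 123 UNSWCD contigs.

```python
from typing import Sequence


def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base of the genome is sequenced."""
    return n_reads * read_length / genome_size


def n50(contig_lengths: Sequence[int]) -> int:
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# 56 million reads of 102 bp over a 1.8 Mb genome (assumption: reads, not pairs).
print(round(fold_coverage(56_000_000, 102, 1_800_000)))  # 3173

# Toy contig lengths, for illustration only.
print(n50([100, 80, 60, 40, 20]))  # 80
```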
A significant difference in gene content was observed between the UNSWCD genome and the larger C. concisus reference genome BAA-1457 (size 2.1 Mb). However, all essential genes as defined in Mycoplasma genitalium and Helicobacter pylori were found to be present, thus validating the assembled genome.
While 1576 genes were conserved across UNSWCD and BAA-1457, 138 genes from UNSWCD and 255 from BAA-1457 were found to be unique when compared against the other. To further validate the genome, shotgun proteomics performed on an Orbitrap tandem mass spectrometer identified 1,369 (217 hypothetical) proteins and 1321 (220 hypothetical) proteins in the UNSWCD and BAA-1457 proteomes, respectively. Synteny maps for the 255 genes specific to C. concisus BAA-1457 generated using neighborhood associations from STRING and the GEOMI visualisation platform identified 7 functionally associated gene clusters that are absent in the UNSWCD genome. This systems level analysis of the UNSWCD genome will facilitate further probing of the biological aspects of the strain specific features of C. concisus and allow future clarification of their role in specific disease outcomes.
Transcriptomics: Next-generation RNA-seq analysis of the normal and Alzheimer's brain
Twine NA, Janitz K, Wilkins MR, Janitz M (2011). Whole transcriptome sequencing reveals gene expression and splicing differences in brain regions affected by Alzheimer's disease. PLoS ONE, 6: e16266. [abstract]
Recent studies strongly indicate that aberrations in the control of gene expression might contribute to the initiation and progression of Alzheimer's disease. In particular, alternative splicing has been suggested to play a role in spontaneous cases of Alzheimer's disease.
Previous transcriptome profiling of Alzheimer's disease models and patient samples using microarrays delivered conflicting results. Next-generation sequencing of the whole transcriptome (RNA-Seq) provides a powerful alternative to microarray-based methods, both in terms of quantitative and qualitative analysis of gene expression.
We used Illumina RNA-Seq analysis to survey transcriptome profiles from total brain, frontal lobe and temporal lobe of healthy and Alzheimer's disease (AD) post-mortem tissue. We quantified gene expression levels, splicing isoforms and alternative transcript start sites. Gene Ontology term enrichment analysis revealed an overrepresentation of genes associated with a neuron's cytological structure and synapse function in Alzheimer's disease brain samples.
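Gene Ontology term enrichment is commonly assessed with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes in a selected gene list by chance. The sketch below shows that calculation; it is an assumption that a test of this kind applies here, since the enrichment tool and statistics used in the study are not named in this summary, and the numbers in the example are toy values.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, selected: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: genes in the background set
    annotated:  background genes carrying the GO term
    selected:   genes in the list of interest (e.g. differentially expressed)
    hits:       selected genes carrying the GO term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 10 genes in total, 4 carry the term, 3 are selected, 2 of those carry the term.
print(enrichment_p_value(10, 4, 3, 2))
```

In practice one p-value is computed per GO term, so a multiple-testing correction would normally follow.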
Analysis of the temporal lobe with the Cufflinks tool revealed that transcriptional isoforms of the apolipoprotein E gene, APOE-001, -002 and -005, are under the control of different promoters in normal and Alzheimer's disease brain tissue. We also observed differing expression levels of APOE-001 and -002 splice variants in the Alzheimer's disease temporal lobe.
For the first time, this study provides transcriptome analysis from distinct regions of the AD brain using RNA-Seq next-generation sequencing technology. Furthermore, our results indicate that alternative splicing and promoter usage of the APOE gene in Alzheimer's disease brain tissue might reflect the progression of neurodegeneration.
Proteomics / Systems Biology: Proteomic classification of Stage C Breast Cancer patients
A cohort of stage C breast cancer patients, of known outcome after chemotherapeutics, had tumour tissue analysed by proteomics. Multiplex iTRAQ analysis of tumours was undertaken, and protein expression in each tumour compared to a pooled control.
Comprehensive data filtering and analysis were undertaken using univariate, multivariate and artificial intelligence techniques. Functional analysis was undertaken with Reactome, NetMap analytics and self-organising maps coupled to the Gene Ontology. Unique protein signatures were detected in association with each patient group.
Systems Biology: The dynamics of protein interaction networks - 4D visualisation of the interactome
Protein-protein interaction networks are typically built with interactions collated from many experiments. These networks are thus composite and show all interactions that are currently known to occur in a cell. However, these representations are static and ignore the constant changes in protein-protein interactions.
We developed software for the generation and analysis of dynamic, four-dimensional (4D) protein interaction networks. Time course-derived abundance data was mapped onto three-dimensional networks to generate network movies. These networks can be navigated, manipulated and queried in real time. Two types of dynamic networks can be generated: a 4D network which maps expression data onto protein nodes, and one that employs 'real-time rendering' by which protein nodes and their interactions appear and disappear in association with temporal changes in expression data.
We illustrate the utility of this software by the analysis of date hub interactions during the yeast cell cycle and the cross-comparison of dynamic networks.
Software and movies of the 4D networks are available here.
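The second type of dynamic network described above, in which nodes and interactions appear and disappear with expression changes, can be pictured as a series of filtered snapshots of a static network. The following is a minimal sketch of that idea only, not the software referred to above; the protein names, abundance values and the simple threshold rule are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]
Frame = Tuple[Set[str], List[Edge]]


def network_frames(edges: List[Edge],
                   abundance: Dict[str, List[float]],
                   threshold: float) -> List[Frame]:
    """Return one (proteins, interactions) snapshot per time point.

    A protein is shown at a time point if its abundance is at or above the
    threshold; an interaction is shown only if both partners are shown.
    """
    if not abundance:
        return []
    n_points = len(next(iter(abundance.values())))
    frames: List[Frame] = []
    for t in range(n_points):
        present = {p for p, series in abundance.items() if series[t] >= threshold}
        visible = [(a, b) for a, b in edges if a in present and b in present]
        frames.append((present, visible))
    return frames


# Toy static network and three-point time course (hypothetical values).
edges = [("A", "B"), ("B", "C"), ("A", "C")]
abundance = {"A": [5.0, 5.0, 0.0], "B": [0.0, 5.0, 5.0], "C": [5.0, 5.0, 5.0]}
for t, (proteins, interactions) in enumerate(network_frames(edges, abundance, 1.0)):
    print(t, sorted(proteins), interactions)
```

Rendering the frames in sequence gives the "movie" effect; mapping abundance onto node size or colour instead of filtering would correspond to the first type of network.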
Wet Lab: The methylproteome and cellular methylation network
Protein methylation occurs predominantly on arginine and lysine residues. It exists in multiple forms: mono-, di- and tri-methylation.
We recently completed the first proteome-scale analysis for protein methylation [abstract]. This showed that protein methylation is more widespread than previously thought.
We are now constructing the cellular methylation network, through a combination of single, double and triple gene knockouts of methyltransferases and the screening of substrate proteins with antibody-based and mass spectrometry-based analysis. The generation of SRM and/or MRM assays for methylation will allow us to monitor the dynamics of protein methylation in the cell.
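Targeted assays such as SRM distinguish methylation states by mass: each added methyl group increases the monoisotopic mass of a peptide by about 14.01565 Da (a net gain of CH2). The sketch below computes the expected precursor m/z for a peptide in its unmodified, mono-, di- and tri-methylated forms. The peptide mass and charge state are hypothetical, and fragment-ion transitions, which a real assay would also need, are not covered.

```python
METHYL_SHIFT = 14.01565  # monoisotopic mass added per methyl group (net CH2), in Da
PROTON = 1.007276        # mass of a proton, in Da


def precursor_mz(neutral_mass: float, n_methyl: int, charge: int) -> float:
    """Precursor m/z of a peptide carrying n_methyl methyl groups."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    if n_methyl < 0:
        raise ValueError("n_methyl cannot be negative")
    return (neutral_mass + n_methyl * METHYL_SHIFT + charge * PROTON) / charge


# Hypothetical peptide with a neutral monoisotopic mass of 1000 Da, charge 2+.
for n, label in enumerate(["unmodified", "mono", "di", "tri"]):
    print(f"{label:>10}: {precursor_mz(1000.0, n, 2):.4f}")
```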
Wet Lab: Protein methylation as a regulator of protein-protein interactions
The methylation of proteins affects their hydrophobicity. This can affect protein structure, and is known in some cases to affect or control protein-protein interactions.
We are investigating the manner and degree to which methylation regulates protein-protein interactions in the cell. A combination of genetic and proteomic techniques, including knockout analysis, two-hybrid approaches and the analysis of protein complexes with native gel electrophoresis, is being used in this fundamental investigation.
Click here for more information about this project.
Professor Marc Wilkins
Having defined the proteome in 1994 and coined the term, my research has continued to ask large-scale questions about protein function.
We are currently examining the role of protein-protein interactions in the proteome, which can be called interactome research, and are exploring the following:
- What does the interactome look like? How can it best be visualised?
- What protein complexes exist? How can they be separated?
- Are protein complexes made of core, attachment and module proteins?
- What is the role of post-translational modifications in the interactome?
- How, when and why do proteins interact with each other? What are the dynamics of this process?
- How common is a loss or gain of protein-protein interaction in association with human disease?
We are exploring these questions using a combination of proteomic and bioinformatic techniques, analysing samples from numerous biological systems. We have access to FT-ICR, Q-star, Q-TOF and MALDI-TOF mass spectrometry at the on-site Bioanalytical Mass Spectrometry Facility.
Dr Melissa Erce
Since its development over 20 years ago, the yeast two-hybrid system has been used successfully to detect thousands of protein-protein interactions. However, its use is limited to proteins which are not post-translationally modified.
It is now known that many proteins need post-translational modifications to interact with their target proteins. To fill this gap, the tethered catalysis two-hybrid system was developed, which is based on the classical yeast two-hybrid system. However, there are clear limitations to this system, especially where endogenous modifying enzymes can bring about false positive interactions.
Melissa’s research is focused on the development of a novel bacterial two-hybrid system for the detection of modification-dependent protein-protein interactions. This system has several attractive features. It can support many types of interacting proteins and many types of post-translational modifications. The use of plasmids makes the system genetically tractable and a number of controls such as mutational analyses of the enzyme active sites or modification sites can be incorporated with ease. The post-translationally modified proteins can also be overproduced in Escherichia coli. These can then be directly used to measure the degree of post-translational modifications using mass spectrometry and to map precise modification sites.
Dr David Fung
David moved from molecular genetics to bioinformatics in 2000 and, after finishing his PhD in 2010, shifted to systems biology. David believes that the underpinning of systems biology is complex systems theory. Since all complex systems can be modelled as a network, he specialises in modelling expression data as a network. Between 2005 and 2008, he constructed a network model of hepatocellular carcinoma (HCC) which integrated gene co-expression, protein-protein interaction, microRNA-gene interaction, gene-disease relationships, gene-Gene Ontology (GO) relationships, as well as gene-cell quiescent condition datasets. The model is novel in two ways. The first is the integration of both gene-GO and gene-disease relationships, which gives the model a richer context. The second is the integration of gene-cell quiescent condition relationships. His rationale was to use the gene expression profile of quiescent human fibroblasts as a reference cell state. The first paper from this work was published in Cancer Informatics [abstract]. Currently, he is looking at the systems biology of myocardial infarction in collaboration with Dr Ruby Lin.
- How do different network constructs derived from expression, interaction, and ontology data assist hypothesis deduction? Which network models are good fits for different biological problems?
- How many molecular interaction types and ontology types can we integrate into a network before it loses inference power?
- What is the biological meaning of network topology (including network statistics and topology measures) in normal and disease cells?
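One simple way to picture an integrated network of this kind is as a graph whose edges are labelled with the relationship types that support them, from which topology measures such as node degree can then be read off. The sketch below is only an illustration under that assumption and is not the HCC model itself; the gene identifiers, the GO placeholder and the layer names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Adjacency = Dict[str, Dict[str, Set[str]]]


def build_network(layers: Dict[str, List[Tuple[str, str]]]) -> Adjacency:
    """Merge several relationship types into one undirected network.

    Each edge remembers which relationship types (layers) support it.
    """
    adjacency: Adjacency = defaultdict(dict)
    for edge_type, pairs in layers.items():
        for a, b in pairs:
            adjacency[a].setdefault(b, set()).add(edge_type)
            adjacency[b].setdefault(a, set()).add(edge_type)
    return adjacency


def degree(adjacency: Adjacency) -> Dict[str, int]:
    """Number of distinct neighbours of each node, a basic topology measure."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# Toy layers with hypothetical identifiers.
layers = {
    "co-expression": [("G1", "G2"), ("G2", "G3")],
    "protein-protein interaction": [("G1", "G2")],
    "gene-GO": [("G3", "GO_term_X")],
}
network = build_network(layers)
print(degree(network))
print(sorted(network["G1"]["G2"]))  # relationship types supporting the G1-G2 edge
```

Tracking the supporting layers per edge makes it easy to ask how the topology changes as relationship types are added or removed.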


