In metagenomics, the aim is to understand the composition and operation of complex microbial consortia in environmental samples by sequencing and analyzing their DNA. Similarly, metatranscriptomics and metaproteomics target the RNA and proteins obtained from such samples. Advances in next-generation sequencing are driving a rapid increase in the number and scope of environmental sequencing projects and, consequently, a dramatic increase in the volume of sequence data to be analyzed. The three basic computational tasks for such data are taxonomic analysis, functional analysis and comparative analysis, also known as the “who is out there?”, “what are they doing?” and “how do they compare?” questions. These tasks pose immense conceptual and computational challenges, and new bioinformatics tools and methods are urgently needed to address them. MEtaGenome Analyzer (MEGAN), a stand-alone tool for analyzing such ‘omics’ data, addresses these questions. The newly released MEGAN5 includes significant updates over previous versions that make these analyses easier.
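For taxonomic analysis (“who is out there?”), MEGAN places each read on a node of the NCBI taxonomy using a lowest common ancestor (LCA) approach: a read whose significant hits span several taxa is assigned to their lowest shared ancestor. The sketch below illustrates that idea only and is not MEGAN's implementation. The toy taxonomy, the `min_score` and `top_percent` filters, and all function names are hypothetical, loosely modeled on the kind of hit filters MEGAN applies.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy (child -> parent); MEGAN itself uses the NCBI taxonomy.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia": "Gammaproteobacteria",
    "Pseudomonas": "Gammaproteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}


def path_to_root(taxon: str) -> List[str]:
    """Return the lineage [taxon, parent, ..., root]."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(taxa: List[str]) -> str:
    """Lowest common ancestor of one or more taxa."""
    shared = set(path_to_root(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(path_to_root(taxon))
    # Walking up from the first taxon, the first shared node is the lowest one.
    for node in path_to_root(taxa[0]):
        if node in shared:
            return node
    raise ValueError("taxa share no common ancestor")


def assign_read(hits: List[Tuple[str, float]], min_score: float, top_percent: float) -> Optional[str]:
    """Assign a read from its (taxon, bit score) hits; None means unassigned."""
    good = [(taxon, score) for taxon, score in hits if score >= min_score]
    if not good:
        return None
    best = max(score for _, score in good)
    # Keep only hits scoring within top_percent of the best hit.
    kept = [taxon for taxon, score in good if score >= best * (1 - top_percent / 100)]
    return lca(kept)


print(assign_read([("Escherichia", 120.0), ("Pseudomonas", 115.0), ("Bacillus", 60.0)], 50.0, 10.0))
# Gammaproteobacteria
```

Here the weak Bacillus hit falls outside the top 10% and is discarded, so the read lands on the genus-level hits' shared class rather than being pushed up to Bacteria. Assigning ambiguous reads to a higher rank in this way trades resolution for fewer false species-level calls.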
As an example of this bioinformatics approach, analyses of metagenomic data from Singapore’s urban waterways will be described.
To determine the effects of land use (residential vs. industrial) and rain perturbation (pre- vs. post-rain) on the sediments of Singapore’s urban waterways, two industrial and two residential sites were selected. Sediment samples were collected in triplicate from each site during two independent rain events, with each site sampled about 1–2 hours before rain (pre) and 3–4 hours after rain (post).
Total community DNA was extracted from all 48 samples (4 sites × 2 rain events × 2 time points × 3 replicates) and sequenced on the Illumina HiSeq 2000 platform.
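The sampling design can be written out as the full crossing of its four factors, which both confirms the sample count and gives the metadata table that downstream comparisons group by. This is a minimal sketch; the site labels and sample-ID format are hypothetical, since the actual site names are not given.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the design described above (real site names are not given).
SITES = ["industrial_1", "industrial_2", "residential_1", "residential_2"]
RAIN_EVENTS = [1, 2]
TIMEPOINTS = ["pre", "post"]
REPLICATES = [1, 2, 3]

# One record per sediment sample: every combination of the four factors.
samples = [
    {
        "id": f"{site}_E{event}_{time}_R{rep}",
        "site": site,
        "land_use": site.split("_")[0],
        "event": event,
        "time": time,
        "rep": rep,
    }
    for site, event, time, rep in product(SITES, RAIN_EVENTS, TIMEPOINTS, REPLICATES)
]

print(len(samples))  # 48
# Samples per level of the two factors of interest (land use, rain timing).
print(Counter(s["land_use"] for s in samples))
print(Counter(s["time"] for s in samples))
```

Each level of land use and of rain timing is represented by 24 samples, so the two main comparisons are balanced.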
Analysis of this large dataset using a combination of split networks and multivariate analysis yielded important insights into how land use, its associated environmental drivers and, to a lesser extent, rain perturbations shift the abundance of specific microbial populations. Bioinformatics approaches for analyzing these data to understand sedimentary ecosystem functions will also be discussed.
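Comparative analysis (“how do they compare?”) typically starts from a sample-by-sample distance matrix. The abstract does not say which distance measure was used; Bray–Curtis dissimilarity on relative abundances is one common choice and is assumed here. The sketch below computes such a matrix from per-sample taxon counts; the function names and the counts in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, int]  # taxon -> read count for one sample


def relative_abundance(counts: Profile) -> Dict[str, float]:
    """Scale counts to proportions so samples of different depth are comparable."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def distance_matrix(profiles: Dict[str, Profile]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise Bray-Curtis matrix over all samples, rows/columns in sorted name order."""
    names = sorted(profiles)
    rel = {n: relative_abundance(profiles[n]) for n in names}
    matrix = [[bray_curtis(rel[i], rel[j]) for j in names] for i in names]
    return names, matrix


# Hypothetical counts for two samples, for illustration only.
names, m = distance_matrix({
    "ind_pre": {"taxon_A": 50, "taxon_B": 50},
    "res_pre": {"taxon_A": 100},
})
print(names, m[0][1])  # ['ind_pre', 'res_pre'] 0.5
```

A matrix of this kind is the usual input both to ordination methods and to distance-based split-network methods such as NeighborNet.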