Defining transcription modules using large-scale gene expression data

Ihmels, J.; Bergmann, S.; Barkai, N.

doi:10.1093/bioinformatics/bth166

Details

Serval ID

serval:BIB_DB3A0DD163FB

Type

Article: article from journal or magazine.

Collection

Publications

Institution

UNIL/CHUV

Title

Defining transcription modules using large-scale gene expression data

Journal

Bioinformatics

Author(s)

Ihmels J., Bergmann S., Barkai N.

ISSN

1367-4803 (Print)

Publication state

Published

Issued date

09/2004

Volume

Number

Pages

1993-2003

Notes

Comparative Study
Evaluation Studies
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.
Validation Studies

Abstract

MOTIVATION: Large-scale gene expression data covering a variety of cellular conditions hold the promise of a global view of the transcription program. While conventional clustering algorithms have been applied successfully to smaller datasets, the usefulness of many algorithms for analyzing large-scale data is limited by their inability to capture combinatorial and condition-specific co-regulation. In addition, there is a growing need to integrate the rapidly accumulating body of other high-throughput biological data into the expression analysis. In previous work, we introduced the signature algorithm, which overcomes the problems of conventional clustering and allows intuitive integration of additional biological data. However, this approach is constrained by the comprehensiveness of the relevant external data and by its inability to capture hierarchical modularity.

METHODS: We present a novel method for the analysis of large-scale expression data that assigns genes to context-dependent and potentially overlapping regulatory units. We introduce the notion of a transcription module: a self-consistent regulatory unit consisting of a set of co-regulated genes together with the experimental conditions that induce their co-regulation. Self-consistency is defined by a rigorous mathematical criterion. We propose an efficient algorithm to identify such modules, based on the iterative application of the signature algorithm, and introduce a threshold parameter that determines the resolution of the modular decomposition.

RESULTS: The method is applied systematically to more than 1000 expression profiles of the yeast Saccharomyces cerevisiae, and the results are presented using two complementary visualization schemes that we developed. The average biological coherence, as measured by the conservation of putative cis-regulatory motifs across four related yeast species, is higher for transcription modules than for clusters identified by other methods applied to the same dataset. Our method is related to singular value decomposition (SVD) and to the pairwise average-linkage clustering algorithm. It extends SVD by filtering out noise in the expression data and by offering variable resolution to reveal hierarchical organization. Furthermore, it has the advantage over both methods of capturing overlapping modules in the presence of combinatorial regulation.

SUPPLEMENTARY INFORMATION: http://www.weizmann.ac.il/~barkai/modules
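The METHODS paragraph describes a transcription module as a self-consistent set of genes and conditions found by iterating the signature algorithm under a threshold. The following is a minimal NumPy sketch of that kind of thresholded fixed-point iteration, not the authors' implementation. The z-score normalizations, the rule of keeping scores more than t standard deviations from the mean, which normalized matrix scores which side, the names `iterate_module`, `t_gene` and `t_cond`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def zscore(m, axis):
    """Standardize m to zero mean and unit variance along `axis`.

    axis=1 standardizes each row (gene) across conditions;
    axis=0 standardizes each column (condition) across genes.
    """
    mu = m.mean(axis=axis, keepdims=True)
    sd = m.std(axis=axis, keepdims=True)
    sd = np.where(sd == 0, 1.0, sd)  # constant rows/columns become all zeros
    return (m - mu) / sd


def threshold(x, t):
    """Keep scores more than t standard deviations from the mean; zero the rest."""
    sd = x.std()
    if sd == 0:
        return np.zeros_like(x)
    return np.where(np.abs(x - x.mean()) > t * sd, x, 0.0)


def iterate_module(expr, seed, t_gene=2.0, t_cond=1.0, max_iter=100, tol=1e-6):
    """Alternate gene -> condition -> gene scoring until the gene vector is a fixed point.

    expr: genes x conditions matrix. seed: initial gene weights (not all zero).
    Returns (gene_scores, condition_scores, converged). Hypothetical helper.
    """
    e_g = zscore(expr, axis=1)  # assumption: gene-normalized copy scores conditions
    e_c = zscore(expr, axis=0)  # assumption: condition-normalized copy scores genes
    g = np.asarray(seed, dtype=float)
    norm = np.linalg.norm(g)
    if norm == 0:
        raise ValueError("seed must contain at least one nonzero weight")
    g = g / norm
    c = np.zeros(expr.shape[1])
    for _ in range(max_iter):
        c = threshold(e_g.T @ g, t_cond)    # condition scores from the current genes
        g_new = threshold(e_c @ c, t_gene)  # gene scores from the selected conditions
        norm = np.linalg.norm(g_new)
        if norm == 0:
            return g_new, c, False          # module collapsed to the empty set
        g_new = g_new / norm
        if np.linalg.norm(g_new - g) < tol:  # self-consistent: genes reproduce themselves
            return g_new, c, True
        g = g_new
    return g, c, False


# Synthetic example (assumed, not from the paper): 200 genes x 50 conditions of
# Gaussian noise, with genes 0-19 up-regulated by 5 units in conditions 0-9.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 50))
expr[:20, :10] += 5.0
seed = np.zeros(200)
seed[:20] = 1.0
genes, conds, converged = iterate_module(expr, seed)
print(converged, np.flatnonzero(genes), np.flatnonzero(conds))
```

In this sketch, raising `t_gene` (assumed here to play the role of the resolution threshold) tends to yield smaller, tighter gene sets, and lowering it yields larger, coarser ones. Running the iteration from many seeds and collecting the distinct fixed points is one way such a sketch could produce overlapping modules, since nothing prevents a gene from appearing in several fixed points.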

Keywords

*Algorithms; Cluster Analysis; Gene Expression Profiling/*methods; Gene Expression Regulation, Fungal/genetics; Genes, Regulator/*genetics; Proteome/genetics; Saccharomyces cerevisiae Proteins/*genetics; Sequence Analysis, Protein/*methods; Transcription, Genetic/*genetics

OAI-PMH

oai:serval.unil.ch:BIB_DB3A0DD163FB

DOI

10.1093/bioinformatics/bth166

Pubmed

15044247

Web of science

000223827000001

Open Access

Yes

Create date

24/01/2008 14:10