CLUSTER ANALYSIS OF DNA CHIP DATA

Eytan Domany

Dept. of Physics of Complex Systems
Weizmann Institute of Science

Abstract
DNA chips allow simultaneous measurement of the expression levels of thousands
of genes. A typical experiment uses up to about 100 chips, each devoted to one
sample, such as material extracted from a particular tumor, or cells from a
culture probed at a particular time after some manipulation. Hence the results
of such an experiment contain several hundred thousand numbers, arranged in a
table of several thousand rows (one for each gene) and 50 - 100 columns (one
for each sample). We developed a clustering algorithm, Coupled Two-Way
Clustering (CTWC) [1], to mine such data, and applied it to data obtained from
colon cancer, leukemia, glioblastoma and breast cancer patients, as well as to
several other problem areas.
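The table described above can be read in two ways, which is what "two-way"
clustering refers to: genes can be clustered as points whose coordinates are
their levels across the samples, and samples can be clustered as points whose
coordinates are the levels of all the genes. The following is a minimal
sketch; the expression values, the helper names gene_points and
sample_points, and the 5000 x 60 table size are all made-up illustrations,
not data from the talk.

```python
# Hypothetical toy table: rows are genes, columns are samples (chips).
expression = [
    [2.1, 2.0, 0.3, 0.4],  # gene 0
    [1.9, 2.2, 0.2, 0.5],  # gene 1
    [0.1, 0.2, 1.8, 2.0],  # gene 2
]


def gene_points(table):
    """One point per gene; its coordinates are the gene's levels in every sample."""
    return [list(row) for row in table]


def sample_points(table):
    """One point per sample; its coordinates are the levels of every gene in it."""
    return [list(col) for col in zip(*table)]


# Illustrative size: 5000 genes x 60 samples gives 300000 numbers,
# i.e. "several hundred thousand" entries.
n_values = 5000 * 60
```

Clustering gene_points(expression) groups genes 0 and 1 together, while
clustering sample_points(expression) groups samples by their gene profiles.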

I will introduce clustering and briefly review a few clustering methods. In
particular, I will describe SuperParamagnetic Clustering (SPC), an algorithm
that we invented several years ago, and demonstrate its effectiveness in two
gene expression experiments: following genes through the yeast cell cycle,
and identifying the primary targets of p53.
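SPC is only named here. Below is a simplified sketch of how superparamagnetic
clustering is usually described: data points become Potts spins, neighbouring
points are coupled ferromagnetically with a strength that decays with their
distance, Swendsen-Wang Monte Carlo is run at a temperature T, and neighbours
whose spins stay strongly correlated are linked into the same cluster. Several
choices are assumptions of this sketch, not the authors' implementation: the
neighbour rule ("k-nearest in either direction" rather than mutual
neighbours), q = 20 Potts states, a single temperature instead of a
temperature scan, no extra step attaching boundary points, and the
hypothetical function names spc_clusters and neighbour_edges.

```python
import math
import random


def neighbour_edges(points, k):
    """Pairs (i, j), i < j, where j is among the k nearest points of i or vice versa.
    Assumption: a simplified "or" neighbour rule instead of mutual neighbours."""
    edges = set()
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, x), j) for j, x in enumerate(points) if j != i)
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb


def spc_clusters(points, temperature, k=4, q=20, sweeps=200, burn_in=20, seed=0):
    """Simplified superparamagnetic clustering at a single temperature (sketch)."""
    rng = random.Random(seed)
    n = len(points)
    edges = neighbour_edges(points, k)
    dist = {e: math.dist(points[e[0]], points[e[1]]) for e in edges}
    a = sum(dist.values()) / len(edges)   # mean neighbour distance
    k_hat = 2 * len(edges) / n            # mean number of neighbours
    # Ferromagnetic couplings that decay with the distance between neighbours.
    coupling = {e: math.exp(-d * d / (2 * a * a)) / k_hat for e, d in dist.items()}

    spins = [0] * n                       # ordered (all-aligned) start
    same_cluster = {e: 0 for e in edges}
    for sweep in range(burn_in + sweeps):
        # Swendsen-Wang step: freeze bonds between aligned neighbours with
        # probability 1 - exp(-J/T), then give each frozen cluster a random state.
        parent = list(range(n))
        for (i, j), J in coupling.items():
            if spins[i] == spins[j] and rng.random() < 1 - math.exp(-J / temperature):
                union(parent, i, j)
        new_state = {}
        for i in range(n):
            root = find(parent, i)
            if root not in new_state:
                new_state[root] = rng.randrange(q)
            spins[i] = new_state[root]
        if sweep >= burn_in:
            # Count how often each neighbour pair shared a frozen cluster.
            for i, j in edges:
                if find(parent, i) == find(parent, j):
                    same_cluster[(i, j)] += 1

    # G = ((q-1)*C + 1)/q is the probability that i and j carry the same Potts
    # state, where C is the measured fraction of sweeps they shared a cluster.
    parent = list(range(n))
    for (i, j), count in same_cluster.items():
        c = count / sweeps
        if ((q - 1) * c + 1) / q > 0.5:
            union(parent, i, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())
```

Scanning temperature shows the behaviour the name refers to: at low T every
connected group of neighbouring points forms a single cluster, at high T every
point ends up alone, and the clusters of interest are those that persist over
an intermediate (superparamagnetic) range of temperatures.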

Next I will demonstrate the need for CTWC and explain how it is implemented,
with SPC serving as its "clustering engine".
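As we read the description in [1], CTWC searches iteratively: every gene
cluster found so far is used as the set of coordinates for clustering every
sample cluster found so far, and vice versa, until no new clusters appear. The
sketch below follows that loop under stated assumptions: the clustering engine
is a hypothetical single-linkage threshold rule (threshold_clusters, with
parameters eps and min_size) standing in for SPC, and "any new group of at
least min_size members" replaces SPC's cluster-stability criterion. The
function names are illustrative only.

```python
import math
from itertools import product


def threshold_clusters(vectors, eps, min_size):
    """Hypothetical stand-in engine (the talk uses SPC): single-linkage groups of
    vectors within Euclidean distance eps, keeping groups of >= min_size members."""
    parent = list(range(len(vectors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) <= eps:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_size]


def ctwc(matrix, eps=1.5, min_size=2):
    """Cluster every (gene set, sample set) pair both ways until nothing new appears."""
    gene_sets = [frozenset(range(len(matrix)))]
    sample_sets = [frozenset(range(len(matrix[0])))]
    done = set()
    changed = True
    while changed:
        changed = False
        for genes_fs, samples_fs in product(list(gene_sets), list(sample_sets)):
            if (genes_fs, samples_fs) in done:
                continue
            done.add((genes_fs, samples_fs))
            genes, samples = sorted(genes_fs), sorted(samples_fs)
            # Cluster the genes, using only the chosen samples as coordinates.
            rows = [[matrix[g][s] for s in samples] for g in genes]
            for group in threshold_clusters(rows, eps, min_size):
                found = frozenset(genes[k] for k in group)
                if found not in gene_sets:
                    gene_sets.append(found)
                    changed = True
            # Cluster the samples, using only the chosen genes as coordinates.
            cols = [[matrix[g][s] for g in genes] for s in samples]
            for group in threshold_clusters(cols, eps, min_size):
                found = frozenset(samples[k] for k in group)
                if found not in sample_sets:
                    sample_sets.append(found)
                    changed = True
    return gene_sets, sample_sets
```

For a toy 4 x 4 matrix with rows [0, 0, 3, 3], [0, 0, 3, 3], [0, 9, 0, 9] and
[0, 9, 0, 9], the full gene set separates no samples at eps = 1.5, but each of
the two gene clusters found in the first pass splits the samples differently
({0, 1} vs {2, 3}, and {0, 2} vs {1, 3}). Structure of this kind, visible
only in a subset of genes, is what the coupled search is meant to expose.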

Finally, I will demonstrate the effectiveness of CTWC on several data sets:
colon cancer, leukemia, glioblastoma, breast cancer and "antigen chips".
 

[1] G. Getz, E. Levine, and E. Domany, PNAS 97, 12079 (2000).