Abstract
DNA chips allow simultaneous measurement of the expression levels of
thousands of genes. A typical experiment uses 10 - 100 chips, each devoted
to one sample, such as material extracted from a particular tumor, or cells
from a culture probed at one particular time after some manipulation. Hence
the results of such an experiment comprise several hundred thousand numbers,
arranged in a table with several thousand rows (one for each gene) and
50 - 100 columns (one for each sample). We developed a clustering algorithm,
Coupled Two-Way Clustering (CTWC) [1], to mine such data, and applied it to
data obtained from colon cancer, leukemia, glioblastoma and breast cancer
patients, and to several other problem areas.
I will introduce clustering and briefly review a few clustering methodologies.
In particular, I will describe SuperParamagnetic Clustering (SPC), an
algorithm that we invented several years ago, and demonstrate its
effectiveness on two gene expression experiments: looking at genes during
the yeast cell cycle and identifying the primary targets of P53.
Next I will demonstrate the need for CTWC and explain how it is implemented,
with SPC serving as its "clustering engine".
Finally, I will demonstrate the effectiveness of CTWC on several data sets:
colon cancer, leukemia, glioblastoma, breast cancer and "antigen chips".
[1] G. Getz, E. Levine, and E. Domany, PNAS 97, 12079 (2000)