Algorithm Performance Comparisons

This section presents comparisons between CLICK and other clustering algorithms. As the summary in Table 11.8 shows, CLICK produced a better solution than every compared algorithm, although its improvement over ProtoMap was only minor. CLICK is also fast. On a regular workstation it clusters thousands of elements in minutes and over 100,000 elements in about two hours. Figure 11.17 shows the result of a comparison in which the authors of each algorithm ran the test themselves. The graph shows a tradeoff between the homogeneity and separation scores: the further an algorithm lies from the origin, the better its overall performance. In this test, CLICK's results even appear better than the "true" clusters. Consultation with the source of the original data [22] indicated that the data may contain errors.

Table: A comparison between CLICK and GENECLUSTER [23] on the yeast cell-cycle dataset [2]. Expression levels of 6,218 S. cerevisiae genes, measured at 17 time points over two cell cycles.

Program	#Clusters	Homogeneity		Separation
		H_Ave	H_Min	S_Ave	S_Max
CLICK	30	0.8	-0.19	-0.07	0.65
GENECLUSTER	30	0.74	-0.88	-0.02	0.97
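
The H and S columns are not defined in this section. The sketch below assumes the definitions commonly used with CLICK:

- Similarity is the Pearson correlation between fingerprints.
- H_Ave and H_Min are the average and the minimum similarity of an element to its own cluster's centroid.
- S_Ave is the average similarity between centroids of different clusters, weighted by the product of the cluster sizes.
- S_Max is the maximum similarity between centroids of different clusters.

Under these assumed definitions, a better solution has higher H values and lower S values, which is how the rows above compare. The function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def centroid(profiles):
    """Mean fingerprint of a cluster, coordinate by coordinate."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def quality_scores(profiles, labels):
    """Return (H_ave, H_min, S_ave, S_max) under the assumed definitions."""
    clusters = {}
    for p, lab in zip(profiles, labels):
        clusters.setdefault(lab, []).append(p)
    cents = {lab: centroid(members) for lab, members in clusters.items()}

    # Homogeneity: similarity of each element to its own cluster's centroid.
    sims = [pearson(p, cents[lab]) for p, lab in zip(profiles, labels)]
    h_ave = sum(sims) / len(sims)
    h_min = min(sims)

    # Separation: similarity between centroids of distinct clusters,
    # weighted by the product of the two cluster sizes.
    weighted, total_w, s_max = 0.0, 0, -math.inf
    for a, b in combinations(clusters, 2):
        w = len(clusters[a]) * len(clusters[b])
        s = pearson(cents[a], cents[b])
        weighted += w * s
        total_w += w
        s_max = max(s_max, s)
    s_ave = weighted / total_w
    return h_ave, h_min, s_ave, s_max
```

The sketch needs at least two clusters and non-constant profiles; otherwise the divisions fail.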

**Figure 11.14:** CLICK's clustering of the yeast cell-cycle data [2]. x-axis: time points 0-80 and 100-160 min, at 10-minute intervals. y-axis: normalized expression levels. The solid line in each panel plots the average expression pattern of the cluster, and the error bars show the standard deviation. The cluster size is printed above each plot.
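
The solid lines and error bars in the cluster plots are per-time-point statistics over a cluster's members. The sketch below rests on two assumptions. First, each gene's profile is standardized to mean 0 and standard deviation 1 (the caption says only "normalized"). Second, the error bars use the population standard deviation. Both function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(profile):
    """Assumed normalization: shift to mean 0, scale to standard deviation 1."""
    m, s = mean(profile), pstdev(profile)
    return [(v - m) / s for v in profile]


def cluster_pattern(profiles):
    """Per-time-point mean (solid line) and standard deviation (error bars)."""
    columns = list(zip(*profiles))  # one tuple of values per time point
    return [mean(c) for c in columns], [pstdev(c) for c in columns]
```

Under these assumptions, each member's profile would pass through `standardize` before its cluster's profiles go to `cluster_pattern`.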

**Figure 11.15:** Yeast cell cycle: the late G1 cluster (cluster 3 in Figure 11.14). The cluster found by CLICK contains 91% of the late G1-peaking genes. In contrast, GeneCluster needs three clusters to cover 87% of these genes.
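
The 91% and 87% figures measure how strongly a known class of genes is concentrated in a few clusters. Below is a small sketch of that computation. All names are hypothetical:

- `labels` maps each gene to its cluster.
- `class_members` lists the genes of the class; for the late G1-peaking genes this would come from an external annotation (an assumption).
- The result is the fraction of the class covered by the `k` clusters that hold most of its members.

CLICK's 91% would correspond to `k = 1` and GeneCluster's 87% to `k = 3`.

```python
from collections import Counter


def coverage_in_top_clusters(labels, class_members, k=1):
    """Fraction of class_members lying in the k clusters containing most of them."""
    counts = Counter(labels[g] for g in class_members)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(class_members)
```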

Table: A comparison between CLICK and HCS on the blood monocytes cDNA dataset [7]. 2,329 cDNAs purified from peripheral blood monocytes, fingerprinted with 139 oligos. The correct clustering is known from back-hybridization with long oligos. Lower Minkowski and higher Jaccard scores indicate better agreement with the correct clustering.

Program	#Clusters	#Singletons	Minkowski	Jaccard	Time(min)
CLICK	31	46	0.57	0.7	0.8
HCS	16	206	0.71	0.55	43
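
The Minkowski and Jaccard columns compare a solution with the known correct clustering. The sketch below assumes the usual pair-based definitions. Count the element pairs that are mates (in the same cluster) in both clusterings (n11), only in the correct one (n10), or only in the computed one (n01). Then:

- Minkowski = sqrt((n10 + n01) / (n11 + n10))
- Jaccard = n11 / (n11 + n10 + n01)

This section does not state how CLICK's singletons enter these scores. The sketch simply treats every distinct label as a cluster, so each singleton would get its own label. For the sea urchin data, where the correct clustering is known for only 1,811 cDNAs, one would presumably pass only those elements' labels. Function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_counts(true_labels, pred_labels):
    """Count pairs that are mates in both, only the true, or only the computed clustering."""
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        t = true_labels[i] == true_labels[j]
        c = pred_labels[i] == pred_labels[j]
        if t and c:
            n11 += 1
        elif t:
            n10 += 1
        elif c:
            n01 += 1
    return n11, n10, n01


def minkowski(true_labels, pred_labels):
    """Lower is better; 0 means the two clusterings agree on every pair."""
    n11, n10, n01 = pair_counts(true_labels, pred_labels)
    return sqrt((n10 + n01) / (n11 + n10))


def jaccard(true_labels, pred_labels):
    """Higher is better; 1 means the two clusterings agree on every pair."""
    n11, n10, n01 = pair_counts(true_labels, pred_labels)
    return n11 / (n11 + n10 + n01)
```

The pair loop is quadratic, which is fine for illustration but slow for tens of thousands of elements.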

Table: A comparison between CLICK and K-Means [8] on the sea urchin cDNA dataset. 20,275 cDNAs purified from sea urchin eggs and fingerprinted with 217 oligos. The correct clustering is known for 1,811 of these cDNAs from back-hybridizations.

Program	#Clusters	#Singletons	Minkowski	Jaccard	Time(min)
CLICK	2,952	1,295	0.59	0.69	32.5
K-Means	3,486	2,473	0.79	0.4	-

Table: A comparison between CLICK and hierarchical clustering [5] on the dataset of the response of human fibroblasts to serum [9]. Human fibroblast cells were starved for 48 hours and then stimulated with serum. Expression levels of 8,613 genes were measured at 13 time points.

Program	#Clusters	Homogeneity		Separation
		H_Ave	H_Min	S_Ave	S_Max
CLICK	10	0.88	0.13	-0.34	0.65
Hierarchical	10	0.87	-0.75	-0.13	0.9

**Figure 11.16:** CLICK's clustering of the fibroblast serum-response data [9]. x-axis: points 1-12 are synchronized time points; point 13 is the unsynchronized point. y-axis: normalized expression levels. The solid line in each panel plots the average expression pattern of the cluster, and the error bars show the standard deviation. The cluster size is printed above each plot.

Table: A comparison between CLICK and SYSTERS on a dataset of 117,835 proteins [11]. Since no correct solution is known, the measures are based on similarity. For a fixed threshold t, homogeneity is the fraction of mates (pairs of elements in the same cluster) whose similarity exceeds t. Separation is the fraction of non-mates whose similarity exceeds t.

Program	#Clusters	#Singletons	Homogeneity	Separation	Time(min)
CLICK	9,429	17,119	0.24	0.03	126.3
SYSTERS	10,891	28,300	0.14	0.03	-
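
The threshold-based homogeneity and separation used for the protein data need only pairwise similarities and a threshold t. Below is a direct sketch with hypothetical names:

- `sim` is a symmetric similarity matrix.
- `labels` gives each element's cluster; singletons get unique labels, so they have no mates.

A dense matrix is only practical for small examples. A dataset of 117,835 proteins would need a sparse representation, which the sketch does not attempt.

```python
from itertools import combinations


def threshold_scores(sim, labels, t):
    """Return (homogeneity, separation): the fractions of mates and of
    non-mates whose similarity exceeds the threshold t."""
    mates = mates_above = non_mates = non_mates_above = 0
    for i, j in combinations(range(len(labels)), 2):
        above = sim[i][j] > t
        if labels[i] == labels[j]:
            mates += 1
            mates_above += above  # bool counts as 0 or 1
        else:
            non_mates += 1
            non_mates_above += above
    return mates_above / mates, non_mates_above / non_mates
```

Higher homogeneity and lower separation are better, which matches how the CLICK and SYSTERS rows above compare.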

Table 11.8: Summary of CLICK's solution quality and running time on several datasets, including those above. CLICK was run on an SGI ORIGIN200 machine using one IP27 processor, and the times exclude preprocessing. The "Improvement" column indicates whether CLICK's solution was better than that of the compared algorithm.

Elements	Problem	Compared to	Improvement	Time(min)
517	Gene Expression Fibroblasts	Cluster [5]	Yes	0.5
826	Gene Expression Yeast cell cycle	GeneCluster [23]	Yes	0.2
2,329	cDNA OFP Blood Monocytes	HCS [7]	Yes	0.8
20,275	cDNA OFP Sea urchin eggs	K-Means [8]	Yes	32.5
72,623	Protein similarity	ProtoMap [24]	Minor	53
117,835	Protein similarity	SYSTERS [11]	Yes	126.3

**Figure 11.17:** Comparison of clustering algorithms by homogeneity and separation. The data consisted of 698 genes and 71 conditions [22]. Each algorithm was run by its own authors in a "blind" test.

Peer Itsik
2001-01-31