
Algorithm Performance Comparisons

This section presents comparisons between CLICK and other clustering algorithms. The comparison summary (Table 11.8) shows that CLICK outperforms all the compared algorithms in terms of quality, although the improvement over ProtoMap is only minor. CLICK is also fast: it clusters thousands of elements in minutes, and over 100,000 elements in a couple of hours on a regular workstation.

Figure 11.17 shows the result of a comparison in which each algorithm was run by its own authors. The graph shows the tradeoff between the homogeneity and separation scores: the further an algorithm lies from the origin, the ``better'' its overall performance. In this comparison CLICK's results appear to be better than the ``true'' clusters. Clarification from the source of the original data [22] indicated that the original data may have contained errors.
 
Table: A comparison between CLICK and GENECLUSTER [23] on the yeast cell-cycle dataset [2]. Expression levels of 6,218 S. cerevisiae genes, measured at 17 time points over two cell cycles.
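\begin{tabular}{|l|r|rr|rr|}
\hline
Program & \#Clusters & \multicolumn{2}{c|}{Homogeneity} & \multicolumn{2}{c|}{Separation} \\
 & & $H_{Ave}$ & $H_{Min}$ & $S_{Ave}$ & $S_{Max}$ \\
\hline
CLICK & 30 & 0.8 & $-0.19$ & $-0.07$ & 0.65 \\
GENECLUSTER & 30 & 0.74 & $-0.88$ & $-0.02$ & 0.97 \\
\hline
\end{tabular}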
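The four scores in this table can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this section: similarity is taken to be the Pearson correlation, homogeneity compares each element with the centroid of its own cluster, and separation compares centroids of different clusters, with the average weighted by the product of the cluster sizes. The function names are hypothetical and the code is not CLICK's implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumed similarity measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # undefined for constant vectors


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def homogeneity_separation(clusters):
    """Return (h_ave, h_min, s_ave, s_max) for a list of at least two clusters.

    Each cluster is a non-empty list of expression vectors.
    """
    centroids = [centroid(c) for c in clusters]

    # Homogeneity: similarity of every element to the centroid of its own cluster.
    element_sims = [pearson(v, centroids[k])
                    for k, c in enumerate(clusters) for v in c]
    h_ave = sum(element_sims) / len(element_sims)
    h_min = min(element_sims)

    # Separation: similarity between centroids of different clusters,
    # averaged with weights |C_a| * |C_b| (an assumption of this sketch).
    pair_sims, weights = [], []
    for a, b in combinations(range(len(clusters)), 2):
        pair_sims.append(pearson(centroids[a], centroids[b]))
        weights.append(len(clusters[a]) * len(clusters[b]))
    s_ave = sum(w * s for w, s in zip(weights, pair_sims)) / sum(weights)
    s_max = max(pair_sims)
    return h_ave, h_min, s_ave, s_max
```

Under these assumptions a better solution has higher homogeneity scores and lower separation scores, which is the direction in which the two programs are compared in the table.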
 


  
Figure 11.14: CLICK's clustering of the yeast cell-cycle data [2]. x-axis: time points 0-80, 100-160 at 10-minute intervals. y-axis: normalized expression levels. The solid line in each sub-figure plots the average pattern for that cluster. Error bars display the measured standard deviation. The cluster size is printed above each plot.
\framebox{
\includegraphics{lec11_fig/CLICK_yeast.eps}
}


  
Figure 11.15: Yeast cell cycle: the late G1 cluster (cluster 3 from Figure 11.14). The cluster found by CLICK contains 91% of the late G1-peaking genes. In contrast, GeneCluster places 87% of these genes in three separate clusters.
\framebox{
\includegraphics{lec11_fig/CLICK_yeast_G1.eps}
}


 
Table: A comparison between CLICK and HCS on the blood monocytes cDNA dataset [7]. 2,329 cDNAs purified from peripheral blood monocytes, fingerprinted with 139 oligos. Correct clustering known from back hybridization with long oligos.
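\begin{tabular}{|l|r|r|r|r|r|}
\hline
Program & \#Clusters & \#Singletons & Minkowski & Jaccard & Time (min) \\
\hline
CLICK & 31 & 46 & 0.57 & 0.7 & 0.8 \\
HCS & 16 & 206 & 0.71 & 0.55 & 43 \\
\hline
\end{tabular}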
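When the correct clustering is known, as it is here, the Minkowski and Jaccard columns can be computed by counting pairs of elements. The sketch below assumes the usual pair-counting definitions, which are not restated in this section: $n_{11}$ counts pairs that are mates in both the true and the computed solution, $n_{10}$ pairs that are mates only in the true solution, and $n_{01}$ pairs that are mates only in the computed solution. The function names and the label encoding are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_counts(true_labels, found_labels):
    """Count element pairs by mate status in the true and the computed solution.

    Labels are cluster identifiers, one per element; a singleton is an
    element whose label is shared with no other element.
    """
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        true_mates = true_labels[i] == true_labels[j]
        found_mates = found_labels[i] == found_labels[j]
        if true_mates and found_mates:
            n11 += 1
        elif true_mates:
            n10 += 1
        elif found_mates:
            n01 += 1
    return n11, n10, n01


def jaccard(true_labels, found_labels):
    """Jaccard coefficient: n11 / (n11 + n10 + n01); higher is better."""
    n11, n10, n01 = pair_counts(true_labels, found_labels)
    return n11 / (n11 + n10 + n01)


def minkowski(true_labels, found_labels):
    """Minkowski measure: sqrt((n10 + n01) / (n11 + n10)); lower is better."""
    n11, n10, n01 = pair_counts(true_labels, found_labels)
    return sqrt((n10 + n01) / (n11 + n10))
```

For example, `jaccard([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])` counts two pairs of each kind. A perfect solution has a Jaccard coefficient of 1 and a Minkowski measure of 0, so in the table a higher Jaccard value and a lower Minkowski value both indicate the better solution.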
 


 
Table: A comparison between CLICK and K-Means [8] on the sea urchin cDNA dataset. 20,275 cDNAs purified from sea urchin eggs, and fingerprinted with 217 oligos. Correct clustering of 1,811 cDNAs known from back hybridizations.
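\begin{tabular}{|l|r|r|r|r|r|}
\hline
Program & \#Clusters & \#Singletons & Minkowski & Jaccard & Time (min) \\
\hline
CLICK & 2,952 & 1,295 & 0.59 & 0.69 & 32.5 \\
K-Means & 3,486 & 2,473 & 0.79 & 0.4 & - \\
\hline
\end{tabular}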
 


 
Table: A comparison between CLICK and Hierarchical [5] clustering on the dataset of the response of human fibroblasts to serum [9]. Human fibroblast cells were starved for 48 hours, then stimulated by serum. Expression levels of 8,613 genes were measured at 13 time points.
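\begin{tabular}{|l|r|rr|rr|}
\hline
Program & \#Clusters & \multicolumn{2}{c|}{Homogeneity} & \multicolumn{2}{c|}{Separation} \\
 & & $H_{Ave}$ & $H_{Min}$ & $S_{Ave}$ & $S_{Max}$ \\
\hline
CLICK & 10 & 0.88 & 0.13 & $-0.34$ & 0.65 \\
Hierarchical & 10 & 0.87 & $-0.75$ & $-0.13$ & 0.9 \\
\hline
\end{tabular}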
 


  
Figure 11.16: CLICK's clustering of the fibroblasts serum response data [9]. x-axis: 1-12: synchronized time points. 13: unsynchronized point. y-axis: normalized expression levels. The solid line in each sub-figure plots the average pattern for that cluster. Error bars display the measured standard deviation. The cluster size is printed above each plot.
\framebox{
\includegraphics{lec11_fig/CLICK_fibro.eps}
}


 
Table: A comparison between CLICK and SYSTERS on a dataset of 117,835 proteins [11]. No correct solution is known, so the measures are based on similarity: for a fixed threshold $t$, homogeneity is the fraction of mates with similarity above $t$, and separation is the fraction of non-mates with similarity above $t$.
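\begin{tabular}{|l|r|r|r|r|r|}
\hline
Program & \#Clusters & \#Singletons & Homogeneity & Separation & Time (min) \\
\hline
CLICK & 9,429 & 17,119 & 0.24 & 0.03 & 126.3 \\
SYSTERS & 10,891 & 28,300 & 0.14 & 0.03 & - \\
\hline
\end{tabular}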
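The threshold-based measures defined in the caption above can be sketched in a few lines of Python. The function name, the label encoding and the `similarity(i, j)` callback are hypothetical; the sketch enumerates all pairs, which is only practical for small inputs, and it does not claim to reproduce how the reported figures were computed (for instance, how singletons were treated).

```python
from itertools import combinations


def threshold_scores(labels, similarity, t):
    """Return (homogeneity, separation) for a fixed similarity threshold t.

    labels[i] is the cluster of element i; similarity(i, j) returns the
    similarity of elements i and j (called with i < j).
    homogeneity = fraction of mate pairs with similarity above t
    separation  = fraction of non-mate pairs with similarity above t
    """
    mates = mates_above = non_mates = non_mates_above = 0
    for i, j in combinations(range(len(labels)), 2):
        above = similarity(i, j) > t
        if labels[i] == labels[j]:
            mates += 1
            mates_above += above
        else:
            non_mates += 1
            non_mates_above += above
    # Report 0.0 when a category is empty (a choice made for this sketch).
    homogeneity = mates_above / mates if mates else 0.0
    separation = non_mates_above / non_mates if non_mates else 0.0
    return homogeneity, separation
```

With these definitions a good solution has a high homogeneity value and a low separation value: most mates are similar, and few non-mates are.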
 


 
Table 11.8: A summary of the time performance of CLICK on the above-mentioned datasets. CLICK was executed on an SGI ORIGIN200 machine utilizing one IP27 processor. The time does not include preprocessing time. The ``Improvement'' column indicates whether the solution of the CLICK algorithm was better than that of the compared algorithm.
\begin{tabular}{|r|l|l|l|r|}
\hline
Elements & Problem & Compared to & Improvement & Time (min) \\
\hline
517 & Gene Expression Fibroblasts & Cluster [5] & Yes & 0.5 \\
826 & Gene Expression Yeast cell cycle & GeneCluster [23] & Yes & 0.2 \\
2,329 & cDNA OFP Blood Monocytes & HCS [7] & Yes & 0.8 \\
20,275 & cDNA OFP Sea urchin eggs & K-Means [8] & Yes & 32.5 \\
72,623 & Protein similarity & ProtoMap [24] & Minor & 53 \\
117,835 & Protein similarity & SYSTERS [11] & Yes & 126.3 \\
\hline
\end{tabular}
 


  
Figure 11.17: Comparison of clustering algorithms using homogeneity and separation criteria. The data consisted of 698 genes and 71 conditions [22]. Each algorithm was run by its authors in a ``blind'' test.
\framebox{
\includegraphics{lec11_fig/alg_compare.eps}
}


Peer Itsik
2001-01-31