Next: Algorithm Performance Comparisons Up: The CLICK Algorithm Previous: Refinements

Quality Assessment

When the ``correct'' solution of the clustering problem is known, we can use the same methods as for the HCS algorithm (sec 11.3.5). In most cases, unfortunately, the ``correct'' solution is unknown. In that case we evaluate the quality of a solution by computing two figures of merit, which measure the homogeneity and the separation of the produced clusters. For fingerprint data, homogeneity is evaluated by the average and the minimum correlation coefficient between the fingerprint of an element and the fingerprint of its cluster. Precisely, if $N$ is the set of elements, $cl(u)$ is the cluster of element $u$, $F(X)$ and $F(u)$ are the fingerprints of a cluster $X$ and an element $u$ respectively, and $S(x,y)$ is the correlation coefficient (or any other similarity measure) of fingerprints $x$ and $y$, then

\begin{displaymath}H_{Ave}=\frac{1}{\vert N\vert}\sum_{u\in N}S(F(u),F(cl(u)))
\end{displaymath}


\begin{displaymath}H_{Min}=\min_{u\in N}S(F(u),F(cl(u)))
\end{displaymath}
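The two homogeneity measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the cluster fingerprint $F(X)$ is computed, so the sketch takes it to be the mean of the member fingerprints, and the names `pearson` and `homogeneity` are hypothetical. The similarity $S$ defaults to the Pearson correlation coefficient, and any other similarity function can be passed instead.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def homogeneity(fingerprints, labels, similarity=pearson):
    """Return (H_Ave, H_Min) for a clustering.

    fingerprints: array of shape (n_elements, n_features), one row per element u.
    labels: length-n sequence; labels[u] identifies the cluster cl(u).
    Assumption (not from the text): F(X) is the mean fingerprint of X's members.
    """
    fingerprints = np.asarray(fingerprints, dtype=float)
    labels = np.asarray(labels)
    # Cluster fingerprint F(X) for every cluster X (assumed: mean of members).
    cluster_fp = {c: fingerprints[labels == c].mean(axis=0)
                  for c in np.unique(labels)}
    # S(F(u), F(cl(u))) for every element u.
    sims = [similarity(f, cluster_fp[c]) for f, c in zip(fingerprints, labels)]
    return float(np.mean(sims)), float(np.min(sims))
```

For example, `homogeneity([[1, 2, 3], [2, 4, 6], [3, 2, 1], [6, 4, 2]], [0, 0, 1, 1])` scores a clustering that groups the two rising and the two falling fingerprints. Note that the correlation coefficient is undefined for a constant fingerprint, so such inputs would need separate handling.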

Separation is evaluated by the weighted average and the maximum correlation coefficient between cluster fingerprints. That is, if the clusters are $X_1,\ldots,X_t$ then

\begin{displaymath}S_{Ave}=\frac{1}{\sum_{i\neq j}\vert X_i\vert\vert X_j\vert}\sum_{i\neq j}\vert X_i\vert\vert X_j\vert S(F(X_i),F(X_j))
\end{displaymath}


\begin{displaymath}S_{Max}=\max_{i\neq j}S(F(X_i),F(X_j))
\end{displaymath}
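The separation measures can be sketched similarly. As before, this is a minimal illustration, not the authors' implementation: the cluster fingerprint $F(X_i)$ is assumed to be the mean of the member fingerprints, the names `pearson` and `separation` are hypothetical, and $S$ is assumed to be symmetric, so each unordered pair of clusters is visited once (this leaves the weighted average unchanged, since numerator and denominator are both halved).

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def separation(fingerprints, labels, similarity=pearson):
    """Return (S_Ave, S_Max) for a clustering with at least two clusters.

    Assumption (not from the text): F(X_i) is the mean fingerprint of X_i.
    """
    fingerprints = np.asarray(fingerprints, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("separation needs at least two clusters")
    sizes = [int(np.sum(labels == c)) for c in ids]           # |X_i|
    cluster_fp = [fingerprints[labels == c].mean(axis=0) for c in ids]

    weighted_sum = 0.0
    total_weight = 0.0
    s_max = -np.inf
    # Visit each unordered pair i < j once (valid because S is symmetric).
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            weight = sizes[i] * sizes[j]                      # |X_i||X_j|
            s = similarity(cluster_fp[i], cluster_fp[j])
            weighted_sum += weight * s
            total_weight += weight
            s_max = max(s_max, s)
    return weighted_sum / total_weight, float(s_max)
```

Weighting each pair by $\vert X_i\vert\vert X_j\vert$ means that the similarity between two large clusters influences $S_{Ave}$ more than the similarity between two small ones, whereas $S_{Max}$ reports the single worst-separated pair regardless of size.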

Hence, a solution improves if $H_{Ave}$ and $H_{Min}$ increase (elements resemble their own clusters more closely) and if $S_{Ave}$ and $S_{Max}$ decrease (distinct clusters resemble each other less).
Peer Itsik
2001-01-31