
Properties of HCS clustering


\begin{theorem}The diameter of each cluster is at most 2. That is, the
distance between any two vertices of a cluster is at most 2.
\end{theorem}

\begin{proof}Consider the graph $G(V,E)$\space in the HCS iteration that found t...
...he total number of their neighbors cannot exceed $\vert V\vert-2$ .
\end{proof}
While we have proven that each highly connected cluster has a small diameter, the converse does not necessarily hold. That is, $G$ may contain a subgraph of diameter 2 that is not a highly connected component.
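A star gives a concrete instance of such a subgraph. The following is a sketch that assumes the criterion used in this section, namely that a graph on $n$ vertices counts as highly connected when its minimum cut contains more than $\frac{n}{2}$ edges; the symbols $K_{1,n-1}$, $c$, $u_i$ and $d(\cdot,\cdot)$ (graph distance) are introduced here only for illustration. Let $K_{1,n-1}$ be the star with center $c$ and leaves $u_1,\dots,u_{n-1}$, where $n \geq 3$. Any two leaves are joined through the center, so
\[
d(c,u_i) = 1 , \qquad d(u_i,u_j) = 2 \quad (i \neq j) ,
\]
and the diameter is exactly 2. Removing the single edge $(c,u_i)$ disconnects $u_i$, so the minimum cut has value
\[
1 \leq \frac{n}{2} ,
\]
and the star is not highly connected.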
\begin{lemma}Let $S$\space be a set of edges forming a minimum cut in the graph ...
...\bar{H})\vert$ , with equality only if $\bar{H}$\space is a clique.
\end{lemma}
The lemma implies that if a minimum cut $S$ in $G=(V,E)$ satisfies $\vert S\vert>\frac{\vert V\vert}{2}$, then $S$ splits the graph into a single vertex $\{v\}$ and $G \setminus \{v\}$. This shows that a more stringent stopping criterion for the algorithm, i.e., $\vert S\vert>\alpha$ for some $\alpha>\frac{\vert V\vert}{2}$, would be detrimental to clustering: any minimum cut of value $x > \frac{\vert V\vert}{2}$ separates only a singleton from the current graph.
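The complete graph offers a small worked check of this behaviour; it is an illustration only, and the symbols $K_n$ and $k$ are not part of the original argument. In $K_n$ with $n \geq 3$, a cut separating $k$ vertices from the remaining $n-k$ contains
\[
k(n-k) , \qquad 1 \leq k \leq n-1 ,
\]
edges. This quantity is smallest at $k=1$ (equivalently $k=n-1$), where it equals
\[
n-1 > \frac{n}{2} .
\]
Every minimum cut of $K_n$ therefore has value larger than $\frac{\vert V\vert}{2}$, and each such cut separates exactly one vertex from the rest of the graph.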
\begin{theorem}Let $S$\space be a minimum cut in the graph $G=(V,E)$\space where...
...{H}$\space is incident on $S$ ,
(2) $\bar{H}$\space is a clique.
\end{theorem}
It can be shown, using this theorem, that the union of two vertex sets separated by a step of HCS is unlikely to induce a graph with diameter $\leq 2$, provided that the noise is random and the vertex sets are not too small. Another property of the solution is given by:
\begin{theorem}1. The number of edges in a highly connected subgraph is quadrati...
...emoved by each iteration of the HCS algorithm is at
most linear.
\end{theorem}

\begin{proof}Let $n$\space ($m$ ) be the number of vertices (edges) in the graph...
...}{2}$ . Therefore, obviously the number of removed edges is linear.
\end{proof}
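A numeric instance may make the two bounds easier to compare. It is a sketch under two assumptions: that a graph on $n$ vertices is highly connected when its minimum cut exceeds $\frac{n}{2}$, and that HCS removes a minimum cut only from a graph that fails this test. Take $n = 10$. In a highly connected subgraph every vertex has degree at least the minimum cut value, hence at least 6, so
\[
m \geq \frac{10 \cdot 6}{2} = 30 ,
\]
in line with the general bound $m > \frac{n^2}{4} = 25$. A graph on 10 vertices that is split by an iteration, on the other hand, has a minimum cut of at most
\[
\frac{n}{2} = 5
\]
edges, so that iteration removes at most 5 edges.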


Peer Itsik
2001-01-31