The current standard edge detection scheme widely used around the world is the Canny edge detector. This is the work John Canny did for his Masters degree at MIT in 1983. He treated edge detection as a signal processing problem and aimed to design the `optimal' edge detector. He formally specified an objective function to be optimised and used this to design the operator.

The objective function was designed to achieve the following optimisation constraints:

1.: Maximize the signal to noise ratio to give good detection. This favours the marking of true positives.
2.: Achieve good localization to accurately mark edges.
3.: Minimize the number of responses to a single edge. This favours the identification of true negatives, that is, non-edges are not marked.

Note a difference-of-boxes operator will maximize the signal to noise ratio, but will give several responses to a single edge.

After some complicated analysis Canny came up with a function that was the sum of 4 exponential terms. However, this function looked very much like a first derivative of a Gaussian! So this is what ended up being used.

The overall edge detection procedure he developed was as follows:

1.: We want to find the maxima of the partial derivative of the image function I in the direction orthogonal to the edge direction, and to smooth the signal along the edge direction. Thus Canny's operator looks for the maxima of
$\begin{displaymath} \frac{\partial^2}{\partial n^2}(G \otimes I), \end{displaymath}$
where $n = \frac{\nabla G \otimes I}{ \mid \nabla G \otimes I \mid }.$ However, many implementations of the Canny edge detector actually approximate this process by first convolving the image with a Gaussian to smooth the signal, and then looking for maxima in the first partial derivatives of the resulting signal (using masks similar to the Sobel masks). Thus we can convolve the image with 4 masks, looking for horizontal, vertical and diagonal edges. The direction producing the largest result at each pixel point is marked. Record the convolution result and the direction of the edge at each pixel.
2.: Perform non-maximal suppression. Any gradient value that is not a local peak is set to zero. The edge direction is used in this process.
3.: Find connected sets of edge points and form into lists.
4.: Threshold these edges to eliminate `insignificant' edges. Canny introduced the idea of thresholding hysteresis. This involves having two different threshold values, usually the higher threshold being 3 times the lower. Any pixel in an edge list that has a gradient greater than the higher threshold value are classed as a valid edge point. Any pixels connected to these valid edge points that have a gradient value above the lower threshold value are also classed as edge points. That is, once you have started an edge you don't stop it until the gradient on the edge has dropped considerably.

**Figure 8:** The output of the Canny edge detector on the mandrill image. Note that double edges are marked for line features.
$\begin{figure} \par \centerline{ \psfig {figure=figure65.ps} } \par\end{figure}$

Problems with gradient-based edge detectors

Gradient-based edge detection schemes suffer from a number of problems, but they are still the most commonly used by the computer vision community. Some of these problems include the following:

1.: You have to choose threshold values and the width of your masks. This problem is common to all gradient based methods. Note that if you double the size of an image leaving its grey values unchanged all the gradients will be halved. This complicates the setting of any threshold. An additional problem is that the width of the mask (and hence the degree of smoothing) affects the positions of zero crossings and maximum intensity gradients in the image. The estimated position of any edge should not be affected by the size of the convolution mask.
2.: Corners are often missed because the 1D gradient at corners is usually small. This can cause considerable difficulties for line labeling schemes because they rely on having corners and junctions being marked properly.
3.: First derivative operators will only find step-like features. If you want to find lines you need a different operator. For example, to find bar features you could look for peaks (not zero crossings) from a second derivative operator. Canny did design an operator for finding lines.

Thus, differential edge detection schemes suffer both from false positives and false negatives.

Surface fitting

An alternative approach to edge detection is to develop an edge model and then to compute its degree of match to the image data. In general, though, surface fitting is computationally more expensive than differential approaches to edge detection.

An edge is modeled by specifying its four degrees of freedom: its position, its orientation, and the constant intensities on either side of the step. We could simply match the data by seeking the least squares error fit of the parametric model to the image window, but such an approach is generally computationally very expensive. Normally what is done is that both the image data and the model are represented over small windows by their first 8 coefficients in a particular two-dimensional orthonormal series expansion. In this case the optimisation reduces to just one variable: the orientation of the edge. An edge is declared present when the weighted least squares error is smaller than some pre-set threshold.

Edge detector performance

A number of researchers have considered the problem of measuring edge detector performance. In fact, it is difficult since we don't really know what the underlying features are that we wish to detect. However, if we assume that they are step edges corrupted by Gaussian noise, then some criteria can be set for evaluating performance. Such criteria are usually the following:

the probability of false edges;
the probability of missing edges;
the error in estimating the edge angle;
the mean square distance of the edge estimate from the true edge; and
the algorithm's tolerance to distorted edges and other features such as corners and junctions.

The first two criteria relate to edge detection, the second two to edge localisation, and the last to tolerance to departures from the ideal edge model. Pratt [8] introduced a function FM for measuring quantitatively the performance of various edge detectors. His measure is

FM =

max

(I_A,I_I)

I_A
å
i = 1

1 + d_i a²

where I_A, I_I, d, and a are respectively the detected edges, the ideal edges, the distance between the actual and the ideal edges, and a design constant used to penalise displaced edges.