Last updated on Monday, October 23, 1995 at 9:00 AM.
Automatedf segmentation is probably the most difficult problem in computer vision (and the one that I find most interesting). There are, by my experience, three major reasons why automated segmentation is so hard.
Still, there have been many successful applications of segmentation techniques in limited application domains. We will study some of these here.
One set of segmentation techniques focuses on finding specific features in an image: points, lines, edges, etc.
These detectors usually look like what they're finding.
To understand why this is, consider the nine weights of a
operator as a vector in a nine-dimensional space. Likewise,
consider the nine pixels in any
neighborhood as another
vector in this space. The convolution operation (pointwise
multiplication and addting up the results) is simply the dot product
of these two vectors. In this sense, convolution (this dot
product) measures the projection of the neighborhood onto the
template. For unit length vectors, the idea response is, of course,
the template itself.
However, longer vectors in slightly different directions may give a better result. For this reason, you'll sometimes see the results of one linear operator divided by another.


These line detectors must, however, be tuned to the width of the line they are trying to detect. Consider what happens when the above detectors are applied to a 3-pixel wide line.
Most edge detectors are based in some way on measuring the intensity gradient at a point in the image.
As we discussed earlier, the gradient operator
is

When we apply this vector operator to a function, we get

As with any vector, we can computes its magnitude
and orientation
.
The gradient orientation gives us the direction of steepest ascent
for intensity in the image, and the magnitude gives us the steepness
of that ascent.
We discussed earlier what some edge-detection operators look like: typically some sort of differencing operator. Examples of these include the Roberts, Kirsch, Prewitt, and Sobel operators.
Most edges are not, however, sharp dropoffs. They're often gradual transitions from one intensity to another. What usually happens in this case is that you get a rising gradient magnitude, a peak of the gradient magnitude, and then a falling gradient magnitude. Extracting the ideal edge is thus a matter of finding this curve with optimal gradient magnitude. Next lecture, we'll discuss ways of finding these optimal curves of gradient magnitude.
Let's first consider the one-dimensional case. Finding the ideal edge is equivalent to finding the point where the derivative is a maximum (for a rising edge) or a minimum (for a falling edge). Figure 7.4 in your reading shows you what this looks like.
How do we find maxima or minima of one-dimensional functions? We differentiate them and find places where the derivative is zero. Differentiating the first derivative (gradient magnitude) gives us the second derivative. Finding optimal edges (maxima of gradient magnitude) is thus equivalent to finding places where the second derivative is zero.
When we apply differential operators to images, the zeroes rarely fall exactly on a pixel. Typically, they fall between pixels. We can isolate these zeroes by finding zero crossings: places where one pixel is positive and a neighbor is negative (or vica versa).
One nice property of zero crossings is that they provide closed paths (except, of course, where the path extends outside the image border). There are, however, two typical problems with zero-crossing methods:
For two dimensions, there is a single measure, similar to the gradient magnitude that measures second derivatives.
Consider the dot product of
with itself:

We usually write
as
.
It has special name and is called the Laplacian operator.
When we apply it to a function, we get

One interesting property of the Laplacian is that it is rotationally invariant--it doesn't depend on what directions you choose, so long as they are orthogonal. The sum of the second derivatives in any two orthogonal directions is the same.
We can find edges by looking for zero-crossings in the Laplacian of the image. Let's turn our attention then to neighborhood operators for measuring the Laplacian.

We can thus use a simple
neighborhood operator to approximate the second derivative in one
dimension.
The Laplacian is one of these in the x direction added to one of
these in the y direction:

As mentioned earlier, zero crossings of this operator can be used to find edges, but it is susceptible to noise. This sensitivity comes not just from the sensitivity of the zero-crossings, but also of the differentiation. In general, the higher the derivative, the more sensitive the operator.
If we first blur an image with a Gaussian smoothing operator and then apply the Laplacian operator, we can get a more robust edge-finder.
But why go to all the trouble? Convolution is associative and commutative, so we can combine the Gaussian smoothing operator with the Laplacian operator (by convolving them one with another) to form a single edge-finding operator.
But this is still too much work. Instead of approximating the
Laplacian operator with forward differencing and then applying it to
a Gaussian, we can simply differentiate the Gaussian

(where
) to get a
closed-form solution for the Laplacian of Gaussian:

We can then approximate this over a discrete spatial neighborhood
to give us a convolution kernel.
But wait! There's an even better way.
We can not only calculate the closed-form solution for a Laplacian of
Gaussian, we can compute its Fourier Transorm. We know that the
Fourier Transform of a Gaussian is another Gaussian, and we also know
that we can differentiate using a ramp function
(
or
)
in the frequency domain.
Multiply together the spectrum of the image, the Fourier Transform of
a Gaussian, and two differentiating ramps in the one direction and
you have a second-derivative of Gaussian in one direction. Do the
same thing in the other direction, add them together, and you have
the Laplacian of Gaussian of the image.
(See, that stuff from Ch. 3 comes in handy after all.)
of a Gaussian. That is, it is the limit
of one Gaussian minus a just every so smaller Gaussian.
For this reason, we can approximate it as the difference of two
Gaussians.
This is called the Difference-of-Gaussians or DOG operator.
Also appearing in the literature is the Difference-of-LowPass filters or DoLP operator.
,
, or
operator will often be different.
What is the correct window size?
Well, there isn't one standard one that will work under every
circumstance.
Sometimes you have to play with it for your application.
Smart vision programs will often use multiple windows and combine the
results in some way.
© Bryan S. Morse, 1995