Last updated on Wednesday, November 1, 1995 at 3:30 PM.
The simplest property that pixels in a region can share is intensity. So, a natural way to segment such regions is through thresholding, the separation of light and dark regions.
Thresholding creates binary images from grey-level ones by turning all pixels below some threshold to zero and all pixels about that threshold to one. (What you want to do with pixels at the threshold doesn't matter, as long as you're consistent.)
If
is a thresholded version of
at some global
threshold t,

We can easily include extraneous pixels that aren't part of the desired region, and we can just as easily miss isolated pixels within the region (especially near the boundaries of the region). These effects get worse as the noise gets worse, simply because it's more likely that a pixels intensity doesn't represent the normal intensity in the region.
When we use thresholding, we typically have to play with it, sometimes losing too much of the region and sometimes getting too many extraneous background pixels. (Shadows of objects in the image are also a real pain--not just where they fall across another object but where they mistakenly get included as part of a dark object on a light background.)
We can deal, at least in part, with such uneven illumination by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to smoothly vary across the image.
Recall that a histogram is a probability distribution:

That is, the number of pixels
having intensity z as a fraction of
the total number of pixels n.
Here are three ways to look at the problem:
Understand that the place of minimum overlap (the place where the misclassified areas of the distributions are equal) is not is not necessarily where the valley occurs in the combined histogram. This occurs, for example, when one cluster has a wide distribution and the other a narrow one.
The first way that we can try to do this is to consider the values in the two regions as two clusters. By making each cluster as tight as possible, we can minimize their overlap. Now, we can't change the distributions, but we can adjust where we separate them (the threshold). As we adjust the threshold one way, we increase the spread of one and decrease the spread of the other. The goal then is to select the threshold that minimizes the combined spread.
We can define the within-class variance as the weighted sum of
the variances of each cluster:

where

and N is the number of intensity levels.
Computing this within-class variance for each of the two classes for each possible threshold involves a lot of computation. It turns out there's an easier way.
If you subtract the within-class variance from the total variance of
the population, you get something called the between-class variance:

where
is the combined variance and
is the combined
mean.
Notice that the between-class variance is simply the weighted
variance of the cluster means themselves around the overall mean.
Substituting
and simplifying, we get

So, for each potential threshold we
This sounds like a lot of work, but it turns out that the
computations aren't independent as we move from one threshold to
another.
We can update
,
, and the respective cluster means as
pixels move from one cluster to the other as t increases.
Using simple recurrence relations we can update the between-class
variance as we successively test each threshold.
This method is sometimes called the Otsu method, after its publisher.
and
respectively)
and a standard deviation (
and
respectively)
independent of the threshold we choose.
Whereas the Otsu method separated the two clusters according to the threshold and tried to optimize some statistical measure, mixture modeling assumes that there already exists two distributions and we must find them. Once we find them, then we can determine the best threshold.
Unfortunately, we have six unknown parameters
(
,
,
,
,
, and
),
so we need to make some estimates of these quantities.
If the two distributions are reasonably well separated (some overlap but not too much), we can choose an arbitrary threshold t and assume that the mean and standard deviation of each group approximates the mean and standard deviation of the two underlying populations. We can then measure how well a mix of the two distributions approximates the overall distribution.
Choosing the optimal threshold thus becomes a matter of finding the one that causes the mixture of the two estimated Gaussian distributions to approximate the actual histogram. We can do this either exhaustively or using a gradient descent method.
Thus, our threshold method based on pixels near boundaries will cause separations of the pixels in ways that tend to preserve the boundaries. Other scattered distributions within the object or the background are of no relevance.
However, if the characteristics change along the boundary, we're still in trouble. And, of course, there's still no guarantee that we'll not have extraneous pixels or holes.
© Bryan S. Morse, 1995