Last updated on Monday, November 13, 1995 at 9:00 AM.
Be warned that I'll be jumping around a bit within these sections to better fit how I think the material should be presented.
What qualifies as a good descriptor? In general, the better the descriptor is, the greater the difference in the descriptors of significantly different shapes and the lesser the difference for similar shapes. What then qualifies similarity of shape? Well, nobody's really been able to answer that one yet. If we could quantify similarity of shape, we'd have the perfect descriptor. Indeed, that's what descriptors are: attempts to quantify shape in ways that agree with human intuition (or task-specific requirements).
.
A useful topological descriptor is the Euler number E: the number of
connected components C minus the number of holes H:

(You should know how to use connected-component labeling to determine
the Euler number.)
While the Euler number may seem like a simple descriptor (it is), it can be useful for separating simple shapes.
Such a normalized differential chain code is called a shape number.
As with chain codes, shape numbers are very sensitive to rotation, to resizing, and to discrete pixel artifacts. We can try to normalize away these effects by resampling the grid along some principal axis of the shape. This resampling (perhaps at some angle and sampling rate different from the original grid) can account for rotation, resizing, and minor pixel artifacts.
Suppose that the boundary of a particular shape has N pixels numbered from
0 to N-1. The k-th pixel along the contour has position
.
So, we can describe the contour as two parametric equations:

Let's suppose that we take the Fourier Transform of each function.
We end up with two frequency spectra:

For a finite number of discrete contour pixels we simply use the Discrete Fourier Transform. Remember that the DFT treats the signal as periodic. That's no problem here--the contour itself is periodic, isn't it?
These two spectra are called Fourier descriptors.
Your text takes things one step further by considering the
coordinates of
the point not as Cartesian coordinates but as those in the complex plane:

Thus, you get a single Fourier descriptor which is the transform of
the complex function
.
In a sense, though, this is unnecessary:

Combining things into a single descriptor instead of two is useful, but it's
really only folding one descriptor onto another.
Now, suppose that the signal has little high-frequency content. This implies little change in the x or y coordinates as you proceed around the contour. What would this shape look like? (Smooth)
What happens if we low-pass filter a Fourier descriptor? Isn't this just the same as smoothing the contour? In this way, the low-frequency components of the Fourier descriptor capture the general shape properties of the object, and the high-frequency components capture the finer detail. Just as with filtering any other signal, though, it does so without regard to spatial position. There are times (as we'll see later) where this may be important.
Suppose that instead of using all k components of
, we only used
the first m < k of them.
What if we reconstruct the shape from such a truncated descriptor?
Don't we just get successively refined approximations to the shape as we add more
terms?
See Fig. 8.15 in your text for an example.
and
.
So, we only change the zero-frequency component.
In fact, what do you think the zero-frequency component tells us about the shape?
(Mean position only--nothing about the shape.)
So, except for the zero-frequency component, Fourier Descriptors are translation
invariant.
is multiplication by
.
So, rotation about the origin of the coordinate system only multiplies the
Fourier descriptors by
.
I emphasize that this is rotation about the origin of the coordinate system, not rotation about the center of the positioned shape. If we throw out the zero-frequency component (the position), the results of rotation are the same regardless of position.
and
by some constant.
As you are well-acquainted by now, that's just multiplication of the Fourier
descriptor by the same constant.
(Again, we ignore the value of the zero-frequency component.)
along the k dimension?
Remember from our discussion of the Fourier Transform that translation in the
spatial domain (in this case, k) is a phase-shift in the transform.
So, the magnitude part of
is invariant to the start point, and the phase
part shifts accordingly.
Table 8.1 in your text shows these properties mathematically.
For three-dimensional images, spherical harmonics have been used as higher-dimensional basis functions.
[ Note: This material is distributed throughout the chapter, occuring in 8.2.4, 8.3.3, and 8.3.4. I feel it will be easier to discuss moments just once and then talk about how to apply them as we discuss each section.]
,
we can compute the mean value of the function using

We can also describe the variance by

A third statistical property, called skew, describes how symmetric the
function is:

All of these are examples moments of the function.
One can define moments about some arbitrary point, usually either about zero
or about the mean.
The n-th moment about zero, denoted as
, is

is the total value of the function.
The mean
is the first moment about zero divided by the zeroeth moment:

The n-th moment about the mean, denoted as
and called
the n-th central moment is

The zeroeth central moment
is, again, the total value of the function.
The first central moment
is always 0. (Do you see why?)
The second central moment
, when normalized by the total value
is the variance:

The third central moment
, when normalized by the total value
is the skew:

The fourth central moment
, when normalized by the total value
is the kirtosis:

If we have an infinite number of central moments, we can completely describe the shape of the function. (We do, of course, also need the mean itself to position the function).
signatures,
signatures, etc.
moments about the mean.
But what if we only use some subset of these? We'll get an increasingly accurate reconstruction of the function for each additional moment. The interpretation of this approximation naturally depends on what we're approximating.

is the total value of the function.
The
is the x component
of the mean
and
is the y component
of the mean.

Similar to before,
.
We can use these moments to provide useful descriptors of the shape. Suppose that for a binary shape we let the pixels outside the shape have value 0 and the pixels inside the shape value 1.
The area of the shape is

The moments
and
are the variances of x and y
respectively.
The moment
is the covariance between x and
y, something we've already seen.
You can use the covariance to determine the orientation of the shape.
The covariance matrix C (convince your self of this) is

By finding we the eigenvalues and eigenvectors of C, we can determine the eccentricity of the shape (how elongated it is) by looking at the ratio of the eigenvalues, and we can determine the direction of elongation by using the direction of the eigenvector with largest eigenvalue.
There are many other useful ways that we can combine moments to describe a shape. Even more fundamentally, though, we can completely reconstruct the shape if we have enough moments. But what if we don't use enough? Well, we can construct approximations to the shape that share these same properties. I could, for example, make a circle with the same mean as the shape, an ellipse with the same mean and covariance. etc.
In a sense, the sequence of moments is analogous to the components of a Fourier sequence--the first few terms give the general shape, and the later terms fill in finer detail.
Moments have been shown to be a very useful set of descriptors for matching.
.
This merely scales
and
by
as well.
Hence, the n-th moment scale by the corresponding power of
.
© Bryan S. Morse, 1995