beginning gif up gif contents news

Next: Descriptions of Region Content
Up: Lectures
Previous: Medial Representations

 

Descriptors

Last updated on Monday, November 13, 1995 at 9:00 AM.


Reading

Gonzalez and Woods, 8.2--8.3.2

Be warned that I'll be jumping around a bit within these sections to better fit how I think the material should be presented.


What Are Descriptors?

In general, descriptors are some set of numbers that are produced to describe a given shape. The shape may not be entirely reconstructable from the descriptors, but the descriptors for different shapes should be different enough that the shapes can be discriminated.

What qualifies as a good descriptor? In general, the better the descriptor is, the greater the difference in the descriptors of significantly different shapes and the lesser the difference for similar shapes. What then qualifies similarity of shape? Well, nobody's really been able to answer that one yet. If we could quantify similarity of shape, we'd have the perfect descriptor. Indeed, that's what descriptors are: attempts to quantify shape in ways that agree with human intuition (or task-specific requirements).


Some Other Simple Shape Descriptors

A few simple descriptors are:
Area.
The number of pixels in the shape.
Perimeter.
The number of pixels in the boundary of the shape.
Compactness.
How closely-packed the shape is: .
Elongation.
The ratio of the length of the longest chord of the shape to the longest chord perpendicular to it. (This is one way to define it--there are others.)
Orientation.
The overall direction of the shape. (I'll come back and define this more precisely later.)


Topological Descriptors

If we stretch a shape like we would a cutout from a rubber sheet, there are certain shapes we can make and others we can't. Topology refers to properties of the shape that don't change, so long as you aren't allowed to tear of join parts of the shape.

A useful topological descriptor is the Euler number E: the number of connected components C minus the number of holes H:


(You should know how to use connected-component labeling to determine the Euler number.)

While the Euler number may seem like a simple descriptor (it is), it can be useful for separating simple shapes.


Shape Numbers

Remember how we discussed the effect changing the starting point has on chain codes and differential chain codes? Moving the starting point rotates the code. If we rotate an n-element code so that it has the smallest value when viewed as an n-digit number, that would be invariant to the selection of the starting point.

Such a normalized differential chain code is called a shape number.

As with chain codes, shape numbers are very sensitive to rotation, to resizing, and to discrete pixel artifacts. We can try to normalize away these effects by resampling the grid along some principal axis of the shape. This resampling (perhaps at some angle and sampling rate different from the original grid) can account for rotation, resizing, and minor pixel artifacts.


Fourier Descriptors

Definition

Suppose that the boundary of a particular shape has N pixels numbered from 0 to N-1. The k-th pixel along the contour has position . So, we can describe the contour as two parametric equations:


Let's suppose that we take the Fourier Transform of each function. We end up with two frequency spectra:


For a finite number of discrete contour pixels we simply use the Discrete Fourier Transform. Remember that the DFT treats the signal as periodic. That's no problem here--the contour itself is periodic, isn't it?

These two spectra are called Fourier descriptors.

Your text takes things one step further by considering the coordinates of the point not as Cartesian coordinates but as those in the complex plane:


Thus, you get a single Fourier descriptor which is the transform of the complex function . In a sense, though, this is unnecessary:


Combining things into a single descriptor instead of two is useful, but it's really only folding one descriptor onto another.

Properties

Let's consider what these spectra mean. First, suppose that the spectra have high-frequency content. That implies rapid change in the x or y coordinate at some point as you proceed around the contour, just like high frequencies normally mean some rapid change in the signal. What would such a contour look like? (Bumpy)

Now, suppose that the signal has little high-frequency content. This implies little change in the x or y coordinates as you proceed around the contour. What would this shape look like? (Smooth)

What happens if we low-pass filter a Fourier descriptor? Isn't this just the same as smoothing the contour? In this way, the low-frequency components of the Fourier descriptor capture the general shape properties of the object, and the high-frequency components capture the finer detail. Just as with filtering any other signal, though, it does so without regard to spatial position. There are times (as we'll see later) where this may be important.

Suppose that instead of using all k components of , we only used the first m < k of them. What if we reconstruct the shape from such a truncated descriptor? Don't we just get successively refined approximations to the shape as we add more terms? See Fig. 8.15 in your text for an example.

Response to Transformations

We can use all of the properties of the Fourier transform to describe the properties of Fourier descriptors:
Translation.
If we translate the object, we're really just adding some constant to all of the values of and . So, we only change the zero-frequency component. In fact, what do you think the zero-frequency component tells us about the shape? (Mean position only--nothing about the shape.) So, except for the zero-frequency component, Fourier Descriptors are translation invariant.
Rotation.
Remember from complex analysis that rotation in the complex plane by angle is multiplication by . So, rotation about the origin of the coordinate system only multiplies the Fourier descriptors by .

I emphasize that this is rotation about the origin of the coordinate system, not rotation about the center of the positioned shape. If we throw out the zero-frequency component (the position), the results of rotation are the same regardless of position.

Scaling
Suppose that we resize the object. That's equivalent to simply multiplying and by some constant. As you are well-acquainted by now, that's just multiplication of the Fourier descriptor by the same constant. (Again, we ignore the value of the zero-frequency component.)
Start Point
What if we change the starting point for the contour? Isn't this simply translation of the one-dimensional signal along the k dimension? Remember from our discussion of the Fourier Transform that translation in the spatial domain (in this case, k) is a phase-shift in the transform. So, the magnitude part of is invariant to the start point, and the phase part shifts accordingly.

Table 8.1 in your text shows these properties mathematically.

Higher-Dimensional Descriptors

Fourier descriptors have also been used to describe higher-dimensional shapes. What happens is that the domain of the function increases from a one-dimensional k for two-dimensional images to an N-1-dimensional one for N-dimensional images.

For three-dimensional images, spherical harmonics have been used as higher-dimensional basis functions.


Moments

Another way to describe shape uses statistical properties called moments.

[ Note: This material is distributed throughout the chapter, occuring in 8.2.4, 8.3.3, and 8.3.4. I feel it will be easier to discuss moments just once and then talk about how to apply them as we discuss each section.]

What Statistical Moments Are

For a function , we can compute the mean value of the function using


We can also describe the variance by


A third statistical property, called skew, describes how symmetric the function is:


All of these are examples moments of the function.

One can define moments about some arbitrary point, usually either about zero or about the mean. The n-th moment about zero, denoted as , is


is the total value of the function. The mean is the first moment about zero divided by the zeroeth moment:


The n-th moment about the mean, denoted as and called the n-th central moment is


The zeroeth central moment is, again, the total value of the function.

The first central moment is always 0. (Do you see why?)

The second central moment , when normalized by the total value is the variance:


The third central moment , when normalized by the total value is the skew:


The fourth central moment , when normalized by the total value is the kirtosis:


If we have an infinite number of central moments, we can completely describe the shape of the function. (We do, of course, also need the mean itself to position the function).

Using Moments of One-Dimensional Functions

There are many things that we've talked about so far that can be described as one-dimensional functions: We can describe any of these N-element signatures by the mean and the moments about the mean.

But what if we only use some subset of these? We'll get an increasingly accurate reconstruction of the function for each additional moment. The interpretation of this approximation naturally depends on what we're approximating.

Moments of Two-Dimensional Functions

Suppose that we have a function not over one variable but of two. We can extend the concept of moments by defining the ijth moment about zero as


is the total value of the function. The is the x component of the mean and is the y component of the mean.



Similar to before, .

We can use these moments to provide useful descriptors of the shape. Suppose that for a binary shape we let the pixels outside the shape have value 0 and the pixels inside the shape value 1.

The area of the shape is


The moments and are the variances of x and y respectively.

The moment is the covariance between x and y, something we've already seen. You can use the covariance to determine the orientation of the shape.

The covariance matrix C (convince your self of this) is


By finding we the eigenvalues and eigenvectors of C, we can determine the eccentricity of the shape (how elongated it is) by looking at the ratio of the eigenvalues, and we can determine the direction of elongation by using the direction of the eigenvector with largest eigenvalue.

There are many other useful ways that we can combine moments to describe a shape. Even more fundamentally, though, we can completely reconstruct the shape if we have enough moments. But what if we don't use enough? Well, we can construct approximations to the shape that share these same properties. I could, for example, make a circle with the same mean as the shape, an ellipse with the same mean and covariance. etc.

In a sense, the sequence of moments is analogous to the components of a Fourier sequence--the first few terms give the general shape, and the later terms fill in finer detail.

Moments have been shown to be a very useful set of descriptors for matching.

Response to Transformations

Finally, let's consider how moments respond to transformations.
Translation.
If we translate the object, we only change the mean, not the variance or higher-order moments.
Rotation.
If we rotate the shape we change the relative variances and higher-order moments, but certain things such as the eigenvalues of the covariance matrix are invariant to rotation.
Scaling
Resizing the object by a factor of S is the same as scaling the x and y coordinates by . This merely scales and by as well. Hence, the n-th moment scale by the corresponding power of .


Vocabulary



beginning gif up gif contents news

Next: Descriptions of Region Content
Up: Lectures
Previous: Medial Representations

© Bryan S. Morse, 1995