Last updated on Monday, November 6, 1995 at 9:00 AM.
In the last chapter, we discussed ways of segmenting an image into boundaries or regions.
In this chapter, we'll discuss how to represent the objects in an image once they are identified.
It's simple to describe a region by simply listing its points, but that doesn't really tell us much about it. Sure, we could count the points to give the area, or we could match sets of points to see if there's an exact match, but otherwise there aren't many properties of the object that we could identify from just a list of points.
First, we'll discuss how to encode the boundary or region (representation), and then we'll discuss how to describe each in ways that distill certain properties from the object (descriptors).
In this lecture, we'll discuss boundary representations. In the next lecture we'll discuss something called medial representations.
Remember how we compressed a signal by only encoding the difference between successive values, not the full values themselves? Well, we can do the same thing with boundaries. Once we have identified the boundary pixels of a region, we can encode them by only storing the relative difference as we go from pixel to pixel clockwise around the contour.
If the contour is 4-connected, we only need four values to encode the direction to the next pixel. If the contour is 8-connected, we need eight numbers. See Figs. 8.1 and 8.2 in your text for an example.
By encoding relative, rather than absolute position of the contour, the representation is translation invariant. We can match boundaries by comparing their chain codes, but with the following problems:
This differencing allows us to rotate the object in 90-degree increments and still compare the objects, but it doesn't get around the inherent sensitivity of chain codes to rotation on the discrete pixel grid.
Another method, similar to chain codes, is to store
the tangent vector at every boundary point.
In fact, we don't really care about the magnitude of the tangent, only
the direction.
So, we can encode the angle of the tangent
as a function of
arc length s around the contour.
Hence, this representation is called a
-s curve.
Since we're encoding the angle of the tangent vector, we can choose
some arbitrary reference frame (zero-degree tangent).
If we let the tangent at the starting point have an angle of zero,
the rest of the curve is described relative to it.
Thus, our
-s curve always starts and ends at zero.
The
-s representation has the following advantages:
We are still limited when comparing curves with different starting
points, but if we rotate the s axis of the
-s
representation and shift the
axis so that the starting point
is at zero, we can compare such curves.
Another method for compacting the boundary information is to fit line segments or other primitives to the boundary. We then need only to store the parameters of these primitives instead of discrete points. This is useful because it reduces the effects of discrete pixelization of the contour.
We can fit such lines using either merging methods or splitting methods.
For example, choose one point as a starting point. For each new point that we add, let a line go from the start point to the new point. Then, compute the squared error or other such measure for every point along the segment/line. If the error exceeds some threshold, keep the line from the start to the previous point and start a new line.
If we have thick boundaries rather than single-pixel thick ones, we can still use a similar approach called tunneling. Imagine that we're trying to lay straight rods along a curved tunnel, and that we want to use as few as possible. We can start at one point and lay as long a straight rod as possible. Eventually, the curvature of the "tunnel" won't let us go any further, so we lay another rod and another until we reach the end.
For a closed contour, we can find the two points that lie farthest apart and fit two lines between them, one for one side and one for the other. Then, we can apply the recursive splitting procedure to each side.
In general, a signature is any 1-D function representing the 2-D
boundary. Chain codes, differenced chain codes,
-s curves,
etc. are all signatures.
Many other forms of signatures have been proposed in the computer
vision literature.
One other form of signature is to encode the radial distance r from
the center of the object as a function of angle theta.
This method is rotationally variant, but if (as with signatures we've
already looked at) you rotate the
axis of the function you
can compare at different angles.
This particular method is also sensitive to scaling, but it has the
nice property, not found in the others yet discussed, that a scaled
version of the same object produces a scaled version of the
signature. We can normalize the signature by some other feature of
the object, such as maximum radial distance r, length of the
longest axis, etc.
Yet another way of describing an object is to determine the convex hull of the object. A convex hull is a minimal convex shape entirely bounding an object. Imagine a rubber band snapped around the shape and you have the convex hull. We can look at where the convex hull touches the contour and use it to divide the boundary into segments. These points are the extremal points of the object.
Another useful method for breaking a boundary into segments is to identify the points of maximum positive or minimum negative curvature. These extremes are where the object has exterior or interior corners and make useful key points for analysis.
Notice that as we progress from one representation to another, we're basically trying to do three things:
Much of my own recent research has been oriented towards these areas, particularly the last one.
After we talk about medial representations, we'll start talking about extracting specific features of objcts that can be used to distinguish between one object and another: area, perimeter, etc.
-s Curves
© Bryan S. Morse, 1995