beginning gif up gif contents news

Next: Noise
Up: Lectures
Previous: Discrete Spatial SamplingThe Fast Fourier Transform

 

Other Transforms

Last updated on Thursday, September 28, 1995 at 6:05 PM.


Reading

Gonzalez and Woods: skim all of 3.5, read 3.5.3 and 3.6 in more detail.

 


Other Separable Transforms

Briefly skim over the discussion in Section 3.5 of your text of the Walsh, Hadamard, Haar, and Slant transforms to get the flavor of different transforms.

These examples help to illustrate the reason why transforms are useful: allow you to look at the information in the signal or image in a different way.

 


Discrete Cosine Transform

Suppose that instead of the Fourier basis functions we used the following set of functions:


where


We can then define a forward transform
 


and an inverse transform
 


Eq. 9.3 is the Discrete Cosine Transform (DCT) and Eq. 9.4 is the Inverse Discrete Cosine Transform (inverse DCT).

The DCT is separable, so you can construct the 2-D DCT just as we did for the 2-D Fourier Transform:
 


and an inverse transform
 


In principle, the DCT is similar to the Fourier Transform in the way it decomposes a signal or image into sinusoids.

While the Fourier Transform is a better analytical tool (primarily because of the convolution theorem), the DCT has a few advantages:

  1. The DCT works only with real, rather than complex, numbers and is faster to compute.
  2. The DCT has slightly better "energy compaction" than the Fourier Transform-more of the information is packed into fewer of the components.

These two properties make the DCT the transform of choice for the JPEG and MPEG compression algorithms.

 


Hotelling Transform

The Fourier Transform and the DCT (as well as the others in Section 3.5) all use a fixed set of basis functions for the transformation. What if we chose a set of basis functions based on the input image itself?

These different transforms all compact the information in the signal in one form or another, but the optimal compaction of information comes when we let the image itself tell us which basis functions give the best compaction.

Suppose that we have a multispectral image as shown on page 153 of your text. Each band of the spectrum has its own information, but it may be that the most information isn't found in one image band or another, but in a linear combination of information from the collective bands. How do we determine this?

One way is to find the covariance matrix C of the pixels. Since C is real and symmetric, it always has eigenvectors. If we sort the eigenvalues, we can rank the corresponding eigenvectors. These eigenvectors indicate the spread of the information content in the pixels-the eigenvector with the largest variance contains the most information, the one with the next largest variance contains the most information if all components of the first one are removed, and so on until one is left with the eigenvector with the smallest eigenvalue, which corresponds to the dimension of least information.

Each eigenvector is a linear combination of the original basis vectors, so the components of each eigenvector indicate how much of each original image band goes in the optimal mix.

Transforming the image into the coordinate space whose basis vectors are the eigenvectors of the covariance matrix is called the Hotelling Transform, sometimes known as the Karhunen-Loeve Transform or principal-component analysis.

Notice that the basis vectors for such a transformation are different for each image-unlike the Fourier Transform or the DCT which use the same basis vectors. In this way, the Hotelling Tranformation does a "best fit" to the image by choosing a basis set that gives optimal compaction of the information.

 


Co-joint Representations and Wavelets

Observe from Eqs 5.4 and 5.5 that to compute the Fourier Transform of an image, one needs the entire signal (image). Likewise, to reconstruct the signal (image), one needs all of the Fourier Transform components. That means that each point in the frequency domain represents information about the entire image. Even though a particular frequency coefficient may come from a particular place in the image, one cannot directly use the information in the frequency domain to determine where. How can we attempt to spatially localize frequency information?

Consider the following function:


for some and . This is called a Gabor function (or sometimes a Gabor blob. It is a sine wave of frequency modulated (multiplied) by a Gaussian of standard deviation .

Since a Gabor function is the product of two other functions, its Fourier Transform is the convolution of the Fourier Transforms of those functions. Specifically, the Fourier Transform of a sine wave with frequency is a single pulse at , and the Fourier Transform of a Gaussian of standard deviation is a Gaussian in the frequency domain of standard deviation . When you convolve these two, you get a Gaussian of standard deviation and mean .

By the Convolution Theorem, when you convolve a signal with a Gabor function you multiply its Fourier Transform by this Gaussian bandpass filter at . Because the Gabor function has limited spatial extent (as limited by the Gaussian modulation), it localizes information in the spatial domain. Because it has limited frequency extent (as limited by the Gaussian bandpass filter in the frequency domain), it also localizes information in the frequency domain. This type of simultaneous localization of information in both domains is called a co-joint representation.

The Gabor function doesn't perfectly localize information because it has a spatial spread. Likewise, it doesn't perfectly localize information in the frequency domain because of the spread of the bandpass filter.

Suppose we make the spatial spread smaller by using a smaller for the Gabor function. What happens in the frequency domain? (It gets worse.) Consider what happens in the limit: as the Gaussian gets infinitely small, you perfectly localize information spatially but need all of the frequency domain to do it--just like the inverse Fourier Transform.

Suppose we make the frequency spread smaller by using a smaller for the bandpass filter. What happens in the spatial domain? (Again, it gets worse.) Again, consider what happens in the limit: as the Gaussian gets infinitely large, you perfectly localize information in the frequency domain but need all of the spatial domain to do it--just like the Fourier Transform itself.

You can't simultaneously localize information in both domains--if you try to do better in one, you must get worse in the other. This phenomemon is called co-joint uncertainty.

There are many types of co-joint representations, but the most popular one in the literature today is the class called wavelets. We won't cover wavelets in this course, but when you do learn about them, keep this co-joint representation (and uncertainty) in mind.



beginning gif up gif contents news

Next: Noise
Up: Lectures
Previous: Discrete Spatial SamplingThe Fast Fourier Transform

© Bryan S. Morse, 1995