beginning gif up gif contents news

Next: Segmentation: Introduction and Edge Detection
Up: Lectures
Previous: Image Restoration

 

Image Compression

Last updated on Wednesday, October 18, 1995 at 9:00 AM.


Reading

Gonzalez and Woods, Ch. 6: Introduction, 6.1, 6.4--6.4.2, 6.5.2--6.7


Introduction

One could spend an entire semester on compression alone, but we'll only spend a day or two. I'll try to hit the high points only so that you'll have a general framework for understanding specific methods you may encounter.


Lossless vs. Lossy Compression

Compression comes in two basic flavors: lossless (or information preserving) or lossy.

Lossless compression is what the name implies: you can reconstruct a perfect reproduction of the original from the compression.

Lossy compression involves the loss of some information. When you reconstruct the information from the compressed data, you get something close to but not exactly the same as the original. However, this loss isn't where all of the compression comes from--most lossy compression algorithms include some lossless part. You can get more compression by allowing the algorithm to lose more information. Lossy compression algorithms usually have an adjustable parameter to control this compression vs. quality tradeoff.

Lossy compression is very useful for images, audio signals, or other information that is perceived through our senses. It is not, however, useful for text, numerical data, etc. where there is a precise meaning for the data.

Even for images or other perceived signals, lossless compression is sometimes required, particularly for legal documents, medical images, etc. Imagine the field day a lawyer would have when he finds out that a patient was misdiagnosed from a "lossy"-compressed image. It doesn't seem to matter that the lossy version may be perceptually indistinquishable from the lessless version--the very term "lossy" sends the doctors and lawyers scurrying.


Information Content and Redundancy

There is a difference between information and data. Information is the message that is conveyed; data is the words used to convey it. We're all familiar with people that use lots of data (words) to convey little information. [You probably think your instructor is one of them.]

Information is, simply put, unexpected information. There is significant data content between, "It's 10:00 and all's well," and "Fire!" The longer message is expected, so we don't really think much about it--hence, little information. The shorter message is unexpected, and thus has very high information content. One should basically judge the information content of a message by the difference in the way one would act if they heard the message and if they didn't.

Consider another example: if I dropped a letter from a word so that you only got "hel?o", you could probably guess that the letter "l" was missing. Hence, the letter "l" in this context has little information. However, if I gave you with word "?ook", you might equally guess "l", "b", "h", "c", and so on. In this case, the letter "l" would have significant information content. The very same symbol can have different information content depending on the context.

Redundancy is the difference between the shortest way one could convey a piece of information (i.e., the information content itself) and the data used to represent it. In the previous example, the first "l" is highly redundant while the second "l" is not.


Three Ways to Compress

There are three basic types of redundancy in images:
  1. Coding Redundancy
  2. Interpixel Redundancy
  3. Visual Redundancy

All image compression algorithms attack one or more of these kinds of redundancy.


Coding Redundancy

Pixels have to be encoded in images. Like all things in a computer, we have to assign some arbitrary set of bits for different possible values. Sometimes, some values are more probable that others. As such, they are more expected and thus have less information content. Others are less probable (less expected) and thus have more information content.

The key to reducing coding redundancy is to assign bits according to the information content. The more information, the more bits; the less information, the fewer bits.

If we try to eliminate dcoding redundancy in textual information, we can measure the frequency with which each letter occurs. If it occurs more frequently, encode it with as few bits as possible. If it doesn't occur much at all, use more bits. Thus, we might encode "E" with a few bits and "X" or "Q" with lots of them. Who cares if "X" takes 12, 20, or even 50 bits--it doesn't occur that often anyway.

Likewise, we can use such variable-length coding for pixels or other information for images. For example, if an image is 50 and 50 black pixels? Why not use one bit to signal if the pixel is black or not, and then follow the non-black ones by the actual grey-level value (8 bits). This would use one bit for half the image and nine bits for the other half. This would use 4.5 bits per pixel instead of eight.

The best known and most efficient method for variable-length encoding is Huffman coding. Pure Huffman coding is computationally expensive and requires measuring the full statistics of the data. Typically, a modified form is used that is based on presumed frequency distributions and not-quite-perfect encoding.


Interpixel Redundancy

If you were to scan across a row of a binary image and read off the values, and if the values were "white", "white", "white", "white", "white", what would you guess the next one to be? While either "black" or "white" are possible, you'd probably guess that the next one is white. This makes sense for images because they usually have regions of constant or similar values. This is called interpixel redundancy.

Run-Length Encoding

Instead of using individual codes for black and white pixels in binary images, we can encode runs of black or white pixels. The sequence 111111000000000011111 thus becomes 6 1s, 10 0s, and 5 1s. This is called run-length encoding or RLE.

An RLE image is thus encoded as a sequence of (count,value) pairs. Typically, there is fixed size for the count value, such as one byte. In this case, a run of 500 1s would be (256,1),(244,1). The size of the count value is a tradeoff: if too small, you end up encoding long runs with several short pieces instead of one, if too large, there's wasted space used for short runs.

There are some perverse cases where RLE can actually cause negative compression--cases where the compressed data, because of overhead, takes more space than the original. A pattern of alternating black and white pixels, for example, would be encoded as (1,1),(1,0),(1,1),(1,0), etc. Instead of one bit per pixel, we're using one bit plus the length of the count field!

RLE is usually used for binary images, and is commonly used for FAX transmission encoding or other scanned paper documents. These use not just runs across one row, but also use redundancy across the rows.

RLE can also be used for grey-level images, though. Typically regular RLE won't work for noisy grey-level images because the value changes from pixel to pixel. However, if we look at the bit planes of the grey-level values, we see that there are large runs where the most-significant bit is 1 (lighter regions) and others where it is 0 (darker regions). It doesn't make sense to use this all the way down to the lease-significant bit, but is useful for the most-significant bits when combined with regular encoding of the least significant bits.

Delta Modulation

Because pixels tend to change little from one to the next, let's just encode the relative difference from pixel to pixel, not the absolute value of each one. Since this difference is usually small, we can use fewer bits for it than we normally would for each pixel value.

This works well, except when the difference is large. A phenomenon called slope overload can occur where the change is greater than the maximum difference that can be encoded. This causes sharp changes to become more gradual as the difference encoding races to catch up. A good example of this can be seen in Fig. 6.22 of your text.

Interframe Redundancy

For motion sequences, there is significant redundancy in the content between frames. If something doesn't move, it is there, in exactly the same way, on every frame. Even if imaging noise causes small variations, these can be difference-encoded across frames in an efficient manner. If something does move, it usually retains all of its properties, just translated by some small increment.

We can thus use interframe encoding methods to reduce this redundancy. Such difference encoding is very common for motion-picture sequences. A problem can some when there is a large change between frames (such as a sudden explosion, change in lighting, etc.). In this case, simply encode the frame as you normally would an image (without interframe encoding), and proceed from there.


Visual Redundancy

The third way of reducing redundancy is to realize that our senses aren't equally sensitive to all things. In this sense, information content relates to the ability of our visual systems to tell the difference.

Examples of this include

By transforming our image into spaces that correspond to these properties, we can then better reduce the number of bits we devote to them.

For example, by transforming to YIQ color spaces, we separate intensity from color and can correctly apportion the bits. Likewise, by transforming to frequency-based spaces (such the frequency domains of the Fourier Transform or DCT), we can distribute our precious bits according to spatial frequency.


Transform-Based Compression

By transforming an image from one space to another, we can not only separate different visual properties, but we can also change the predictive (and thus information-containing) properties of the data. If we can transform it so that there is more coding or interpixel (or inter-whatever) redundancy, we can compress the image better.

Energy compaction is a term often used to describe the ability of a particular transform to separate the information content and compact it into one part of the space. As we discussed earlier, the Hotelling or Karhunen-Loeve transform has optimal energy compaction, DCT has very good compaction, the Fourier Transform was good compaction, etc.


An Example of All Three: JPEG

The best compression algorithms use all three forms of redundancy reduction.

The Joint Photographic Experts Group (JPEG) has defined a standard for compressing color images that uses several distinct steps:

  1. Convert the color image to YIQ space.
  2. Break the image into blocks.
  3. Compute the DCT for each block.
  4. Quantize the DCT terms according to frequency--finer quantization for low frequencies, coarser quantization for higher ones. [Visual Redundancy]
  5. Difference encode the zero-frequency DC term. [Interpixel Redundancy]
  6. Convert the 63 other AC terms into a single sequence by scanning the frequency space in a zig-zag way. Since the DCT and quantization steps can produce many zeroes in the sequence (see pg. 400 of your text), this tends to bring them together.
  7. Encode the AC terms as their value and the number of preceeding zero values. [Interpixel Redundancy]
  8. Huffman encode the AC terms. [Coding Redundancy]

Whew! It sounds like a compression algorithm designed by a committee, but by using all of these methods to attack redundancy, the JPEG standard produces very good compression.

The lossy part of the algorithm occurs primarily in the quantization stage. JPEG has a "quality" parameter tha controls the coarseness of this quantization and hence the compression rate (and, inversely, the quantity of the compression).

A motion-picture form of JPEG (called MPEG) adds interframe differencing to the mix.


Vocabulary



beginning gif up gif contents news

Next: Segmentation: Introduction and Edge Detection
Up: Lectures
Previous: Image Restoration

© Bryan S. Morse, 1995