Last updated on Wednesday, October 18, 1995 at 9:00 AM.
Lossless compression is what the name implies: you can reconstruct a perfect reproduction of the original from the compression.
Lossy compression involves the loss of some information. When you reconstruct the information from the compressed data, you get something close to but not exactly the same as the original. However, this loss isn't where all of the compression comes from--most lossy compression algorithms include some lossless part. You can get more compression by allowing the algorithm to lose more information. Lossy compression algorithms usually have an adjustable parameter to control this compression vs. quality tradeoff.
Lossy compression is very useful for images, audio signals, or other information that is perceived through our senses. It is not, however, useful for text, numerical data, etc. where there is a precise meaning for the data.
Even for images or other perceived signals, lossless compression is sometimes required, particularly for legal documents, medical images, etc. Imagine the field day a lawyer would have when he finds out that a patient was misdiagnosed from a "lossy"-compressed image. It doesn't seem to matter that the lossy version may be perceptually indistinquishable from the lessless version--the very term "lossy" sends the doctors and lawyers scurrying.
Information is, simply put, unexpected information. There is significant data content between, "It's 10:00 and all's well," and "Fire!" The longer message is expected, so we don't really think much about it--hence, little information. The shorter message is unexpected, and thus has very high information content. One should basically judge the information content of a message by the difference in the way one would act if they heard the message and if they didn't.
Consider another example: if I dropped a letter from a word so that you only got "hel?o", you could probably guess that the letter "l" was missing. Hence, the letter "l" in this context has little information. However, if I gave you with word "?ook", you might equally guess "l", "b", "h", "c", and so on. In this case, the letter "l" would have significant information content. The very same symbol can have different information content depending on the context.
Redundancy is the difference between the shortest way one could convey a piece of information (i.e., the information content itself) and the data used to represent it. In the previous example, the first "l" is highly redundant while the second "l" is not.
All image compression algorithms attack one or more of these kinds of redundancy.
The key to reducing coding redundancy is to assign bits according to the information content. The more information, the more bits; the less information, the fewer bits.
If we try to eliminate dcoding redundancy in textual information, we can measure the frequency with which each letter occurs. If it occurs more frequently, encode it with as few bits as possible. If it doesn't occur much at all, use more bits. Thus, we might encode "E" with a few bits and "X" or "Q" with lots of them. Who cares if "X" takes 12, 20, or even 50 bits--it doesn't occur that often anyway.
Likewise, we can use such variable-length coding for pixels or other information for images. For example, if an image is 50 and 50 black pixels? Why not use one bit to signal if the pixel is black or not, and then follow the non-black ones by the actual grey-level value (8 bits). This would use one bit for half the image and nine bits for the other half. This would use 4.5 bits per pixel instead of eight.
The best known and most efficient method for variable-length encoding is Huffman coding. Pure Huffman coding is computationally expensive and requires measuring the full statistics of the data. Typically, a modified form is used that is based on presumed frequency distributions and not-quite-perfect encoding.
An RLE image is thus encoded as a sequence of (count,value) pairs. Typically, there is fixed size for the count value, such as one byte. In this case, a run of 500 1s would be (256,1),(244,1). The size of the count value is a tradeoff: if too small, you end up encoding long runs with several short pieces instead of one, if too large, there's wasted space used for short runs.
There are some perverse cases where RLE can actually cause negative compression--cases where the compressed data, because of overhead, takes more space than the original. A pattern of alternating black and white pixels, for example, would be encoded as (1,1),(1,0),(1,1),(1,0), etc. Instead of one bit per pixel, we're using one bit plus the length of the count field!
RLE is usually used for binary images, and is commonly used for FAX transmission encoding or other scanned paper documents. These use not just runs across one row, but also use redundancy across the rows.
RLE can also be used for grey-level images, though. Typically regular RLE won't work for noisy grey-level images because the value changes from pixel to pixel. However, if we look at the bit planes of the grey-level values, we see that there are large runs where the most-significant bit is 1 (lighter regions) and others where it is 0 (darker regions). It doesn't make sense to use this all the way down to the lease-significant bit, but is useful for the most-significant bits when combined with regular encoding of the least significant bits.
This works well, except when the difference is large. A phenomenon called slope overload can occur where the change is greater than the maximum difference that can be encoded. This causes sharp changes to become more gradual as the difference encoding races to catch up. A good example of this can be seen in Fig. 6.22 of your text.
We can thus use interframe encoding methods to reduce this redundancy. Such difference encoding is very common for motion-picture sequences. A problem can some when there is a large change between frames (such as a sudden explosion, change in lighting, etc.). In this case, simply encode the frame as you normally would an image (without interframe encoding), and proceed from there.
Examples of this include
By transforming our image into spaces that correspond to these properties, we can then better reduce the number of bits we devote to them.
For example, by transforming to YIQ color spaces, we separate intensity from color and can correctly apportion the bits. Likewise, by transforming to frequency-based spaces (such the frequency domains of the Fourier Transform or DCT), we can distribute our precious bits according to spatial frequency.
Energy compaction is a term often used to describe the ability of a particular transform to separate the information content and compact it into one part of the space. As we discussed earlier, the Hotelling or Karhunen-Loeve transform has optimal energy compaction, DCT has very good compaction, the Fourier Transform was good compaction, etc.
The Joint Photographic Experts Group (JPEG) has defined a standard for compressing color images that uses several distinct steps:
blocks.
Whew! It sounds like a compression algorithm designed by a committee, but by using all of these methods to attack redundancy, the JPEG standard produces very good compression.
The lossy part of the algorithm occurs primarily in the quantization stage. JPEG has a "quality" parameter tha controls the coarseness of this quantization and hence the compression rate (and, inversely, the quantity of the compression).
A motion-picture form of JPEG (called MPEG) adds interframe differencing to the mix.
© Bryan S. Morse, 1995