One of the key requirements for building any sophisticated graphics program is the ability to import and export graphics using standard file formats. This document discusses three popular graphics file formats and two common compression schemes for bitmapped data. The three file formats are the Microsoft Windows Bitmap format (BMP), the Adobe Encapsulated PostScript format (EPS), and the CompuServe Graphics Interchange Format (GIF). The compression schemes presented are Run-Length Encoding (RLE) and Lempel-Ziv-Welch compression (LZW).
Note that this document discusses the compression schemes and file format structures, but does not necessarily give all the information needed for an implementation (although for some of them that information is present). The references at the end indicate where to obtain the full specification for each. However, this document provides overviews and explanations that are not present in the full specifications.
Run-length encoding is a very simple, intuitive, and fairly effective compression scheme for bitmapped graphics. Its main concept is to detect repeating pixels on each row of an image and output them as a pixel count and pixel value pair, instead of outputting the repeated pixels individually. RLE encoding does not do well for stipple patterns or scanned images, which do not have repeating pixels in rows - the encoding for these types of images may actually be larger after RLE encoding. Despite this limitation, RLE is very good for other types of images, and is supported by the BMP, TIFF, and Apple MacPaint file formats, as well as many others.
When doing run-length encoding, the original data is scanned row by row and converted into pairs consisting of a count byte and a data group. There are two types of groups. The first is a run, which is a group of repeating pixel values. It is encoded as a count followed by the single pixel value. The second is a sequence, which is a group of non-repeating pixel values. It is encoded as a count followed by each of the pixel values. To keep the logic simple, no run or sequence may extend past the end of a scan line.
To distinguish between runs and sequences, a run of n bytes is given the count 1 - n, while a sequence of j bytes is given the count j - 1. The example below shows the RLE encoding for a series of bytes that has five occurrences of the byte A and then the sequence BCDE. The run is encoded as the count -4 followed by the value A, and the sequence is encoded as the count 3 followed by the values BCDE.
Original: AAAAABCDE
Encoded:  -4A3BCDE
C code to do RLE decoding from a file appears below:
    for (y = 0; y < HEIGHT; y++)
    {
        int x = 0;
        while ( x < WIDTH )
        {
            /* fgetc returns 0..255, so convert to a signed count */
            int count = (signed char) fgetc (infile);
            if ( count < 0 )    /* Process a run */
            {
                int i = 1 - count;
                char c = fgetc (infile);
                while ( i-- )
                    image[x++][y] = c;
            }
            else                /* Process a sequence */
            {
                int i = count + 1;
                while ( i-- )
                    image[x++][y] = fgetc (infile);
            }
        }
    }
RLE encoding is more complex than decoding, as is common for many compression schemes. This is because any input stream can be encoded in many different ways, and it is often difficult to determine the optimal encoding. Marv Luse, the author of my main source for this document, argues that the compression gains usually aren't worth the effort of writing a fully optimal encoder, which is much more work and runs much slower. Instead he recommends a fairly intelligent but simple encoder.
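As an illustration of a simple, non-optimal encoder, the sketch below greedily encodes one scan line using the count convention described earlier (a run of n bytes gets the count 1 - n, a sequence of j bytes gets the count j - 1). The function name rle_encode_row, the limit of 128 bytes per run or sequence (so that counts fit in a signed byte), and the rule that any two equal neighbors start a run are assumptions made for this example, not part of any file format.

```c
#include <stddef.h>

/* Encode one scan line of n bytes. Returns the number of bytes
   written to out, which must have room for 2 * n bytes. */
size_t rle_encode_row(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        /* Measure the run of equal bytes starting at i (at most 128). */
        size_t run = 1;
        while (i + run < n && run < 128 && in[i + run] == in[i])
            run++;

        if (run >= 2) {
            /* Run: count 1 - n stored as a signed byte, then the value. */
            out[o++] = (unsigned char)(1 - (int)run);
            out[o++] = in[i];
            i += run;
        } else {
            /* Sequence: collect bytes until two equal neighbors appear,
               the row ends, or 128 bytes have been gathered. */
            size_t start = i, len = 1, k;
            i++;
            while (i < n && len < 128 &&
                   !(i + 1 < n && in[i] == in[i + 1])) {
                i++;
                len++;
            }
            out[o++] = (unsigned char)(len - 1);
            for (k = 0; k < len; k++)
                out[o++] = in[start + k];
        }
    }
    return o;
}
```

Calling this on the nine bytes AAAAABCDE produces seven bytes: the count -4, the value A, the count 3, and the values BCDE. A smarter encoder would avoid breaking a sequence for a run of only two bytes, which is one of the trade-offs that makes optimal encoding harder.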
LZW compression is very popular due to its use in the GIF file format and the high compression it obtains on virtually all images. It builds on work published in 1977 and 1978 by J. Ziv and A. Lempel, which T. Welch refined in 1984. After many years of the algorithm being freely used, Unisys Corporation notified CompuServe and others that it owns a patent on it. Unisys has since decided to allow free use for non-commercial purposes, but commercial users must obtain a license. Besides GIF, LZW is one of the supported compression schemes for TIFF images.
The goal of LZW compression is to replace repeating input strings with n-bit codes. This is done by generating a string table on the fly, which is a mapping between strings of pixel values and compression codes. This string table is built by the encoder as it processes the data, and due to the encoding method the decoder can reconstruct the string table as it processes the compressed data. This differs from compression algorithms such as Huffman coding, where the lookup table typically needs to be included with the compressed data.
LZW works based on the fact that many groupings of pixels are common in images: it goes through the image data and tries to encode as large a grouping of pixels as possible with an encoding from the string table, placing unrecognized groupings into the string table so they can be compressed on later occurrences. For an image with n-bit pixel values, it uses compression codes that are n + 1 bits or larger. While a smaller compression code helps gain larger amounts of compression, the size of the compression code limits the size of the string table. For example, a common arrangement is 12 bit compression codes for 8 bit per pixel images, which allows for 4096 entries in the string table. If the encoder runs out of space in the string table, the traditional LZW encoder must abort and try again with a larger compression code size. (The GIF version of LZW compression is extended to deal with these situations.)
At the start of compressing or decompressing, the string table is initialized to contain all the possible single pixel values in the image (called the roots). Once this is done, the encoder loads pixel values from the source image until it has accumulated a string that is not in the string table. At that point it outputs the compression code for the string minus the newly added pixel value, and places the entire string in the string table so it can be recognized later. It continues this process until the entire image has been processed. Below is some C-like pseudo-code for an LZW encoder:
    /* Put the roots in the string table */
    for (int i = 0; i < MAXPIXEL; i++)
        put i in the string table as entry i

    /* Encode the data */
    char *prefix = NULL;
    char *string;
    while ( have more input )
    {
        char c = fgetc (input);
        string = append c to prefix;
        if ( string is in string table )
        {
            prefix = string;
        }
        else
        {
            add string to string table;
            output code for the current prefix;
            prefix = string with only c;
        }
    }
    output code for the current prefix;
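The pseudo-code above can be turned into a small runnable sketch. The version below makes several simplifying assumptions that are not part of any specification: pixels are 8 bits, the table holds 4096 entries (12 bit codes), each table entry is stored as a (prefix code, last byte) pair and found by linear search, and codes are written to an int array rather than packed into bits. The names lzw_encode, lzw_find, LZW_ROOTS, and LZW_MAX_CODES are made up for the example.

```c
#include <stddef.h>

#define LZW_ROOTS     256   /* one root per 8-bit pixel value */
#define LZW_MAX_CODES 4096  /* 12-bit codes */

/* Entry i (i >= LZW_ROOTS) is the string for code lzw_prefix[i]
   followed by the byte lzw_suffix[i]. Roots are implicit. */
static int lzw_prefix[LZW_MAX_CODES];
static unsigned char lzw_suffix[LZW_MAX_CODES];

/* Return the code for (prefix, c), or -1 if it is not in the table. */
static int lzw_find(int next_code, int prefix, unsigned char c)
{
    int i;
    for (i = LZW_ROOTS; i < next_code; i++)
        if (lzw_prefix[i] == prefix && lzw_suffix[i] == c)
            return i;
    return -1;
}

/* Encode n bytes into codes (room for n entries). Returns the number
   of codes written, or -1 if the string table fills up. */
long lzw_encode(const unsigned char *in, size_t n, int *codes)
{
    int next_code = LZW_ROOTS;
    int prefix;
    long count = 0;
    size_t i;

    if (n == 0)
        return 0;
    prefix = in[0];                    /* root code equals pixel value */
    for (i = 1; i < n; i++) {
        unsigned char c = in[i];
        int code = lzw_find(next_code, prefix, c);
        if (code >= 0) {
            prefix = code;             /* string is known, keep growing */
        } else {
            if (next_code == LZW_MAX_CODES)
                return -1;             /* traditional LZW gives up here */
            lzw_prefix[next_code] = prefix;
            lzw_suffix[next_code] = c;
            next_code++;
            codes[count++] = prefix;   /* output the known part */
            prefix = c;                /* restart with only c */
        }
    }
    codes[count++] = prefix;
    return count;
}
```

For the input ABABABA this emits four codes: the roots for A and B, then 256 (the string AB) and 258 (the string ABA). A real encoder would replace the linear search with a hash table and pack the codes into a bit stream.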
The GIF file format extends the basic LZW algorithm to add two additional roots to the string table: a clear code and an end of input code. The clear code is used when the string table is full, and causes the string table to be re-initialized to contain only the roots. The end of input code is used as a convenience for the decompressor, as it simplifies the logic.
The BMP file format is the Microsoft Windows Bitmap format for Device Independent Bitmaps (DIBs). It may contain images with 1, 4, 8, or 24 bits per pixel, and is stored by scan line, bottom to top (which causes much aggravation, as almost all graphics formats store data from top to bottom). There are two incompatible versions of this format, one introduced with OS/2 version 1.x and another introduced with Windows 3.0. Since the Windows 3.0 format is newer and more common, it is the one described here. The BMP format supports RLE for 4 and 8 bit per pixel images, but this facility is very rarely used, even by Microsoft. Each scan line is padded to a multiple of 4 bytes for optimal access. Multi-byte values are always stored least significant byte first (Intel ordering).
The BMP format is very popular on Intel based personal computers due to being supported by Microsoft applications and being optimized for access by Intel processors. Since it is fairly simple and BMP readers are available on other systems as well, it is a fairly good choice for new applications.
The BMP file header is used to indicate the file format and basic information. Its members appear below:
    unsigned short type = 'BM' (ASCII), indicates it is a BMP file
    unsigned long, file size in bytes
    unsigned short, reserved
    unsigned short, reserved
    unsigned long, offset in bytes to the image data
The BMP information header is used to contain all of the important image metrics. Its members appear below:
    unsigned long, size of the header in bytes
    long width, width of the image in pixels
    long height, height of the image in pixels
    unsigned short, the number of image planes, currently always 1
    unsigned short, the number of bits for each image pixel
    unsigned long, a code to indicate the compression type
    unsigned long, the size of the image in bytes
    long, the number of x pixels per meter
    long, the number of y pixels per meter
    unsigned long, the number of colors used
    unsigned long, the number of colors considered important
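The two headers listed above can be read portably by pulling each field out of a byte buffer rather than reading a struct directly. The sketch below assumes 2 byte shorts and 4 byte longs stored with no padding between members, which puts the information header at byte 14 and makes the two headers 54 bytes in total. The struct BmpInfo, its field names, and the function bmp_parse_headers are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Read little-endian values regardless of the host byte order. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t file_size, data_offset;           /* file header */
    uint32_t header_size;                      /* information header */
    int32_t  width, height;
    uint16_t planes, bits_per_pixel;
    uint32_t compression, image_size;
    int32_t  x_pixels_per_meter, y_pixels_per_meter;
    uint32_t colors_used, colors_important;
} BmpInfo;

/* buf must hold the first 54 bytes of the file.
   Returns 0 on success, -1 if the 'BM' signature is missing. */
int bmp_parse_headers(const unsigned char *buf, BmpInfo *out)
{
    if (buf[0] != 'B' || buf[1] != 'M')
        return -1;
    out->file_size          = le32(buf + 2);   /* bytes 6..9 are reserved */
    out->data_offset        = le32(buf + 10);
    out->header_size        = le32(buf + 14);
    out->width              = (int32_t)le32(buf + 18);
    out->height             = (int32_t)le32(buf + 22);
    out->planes             = le16(buf + 26);
    out->bits_per_pixel     = le16(buf + 28);
    out->compression        = le32(buf + 30);
    out->image_size         = le32(buf + 34);
    out->x_pixels_per_meter = (int32_t)le32(buf + 38);
    out->y_pixels_per_meter = (int32_t)le32(buf + 42);
    out->colors_used        = le32(buf + 46);
    out->colors_important   = le32(buf + 50);
    return 0;
}
```

Reading byte by byte avoids both struct padding differences between compilers and byte order problems on non-Intel machines.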
Each BMP palette entry consists of four unsigned characters. The first three specify the blue, green, and red components, in that order, and the fourth is reserved. The odd ordering of the RGB specification and the use of a reserved byte allow each entry to be loaded into an integer in the proper order on Intel processors. There is a palette entry for each color in the image, with the important colors, if any, first.
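To see why the blue, green, red ordering works out on Intel processors, the sketch below assembles the four bytes of one palette entry the way a little-endian 32 bit load would, giving a value of the form 0x00RRGGBB. The function name bmp_palette_to_rgb is an assumption for this example, and the reserved byte is simply ignored.

```c
#include <stdint.h>

/* entry[0] = blue, entry[1] = green, entry[2] = red, entry[3] = reserved.
   Returns the color packed as 0x00RRGGBB. */
uint32_t bmp_palette_to_rgb(const unsigned char entry[4])
{
    return (uint32_t)entry[0] |
           ((uint32_t)entry[1] << 8) |
           ((uint32_t)entry[2] << 16);
}
```

For example, an entry with blue 0x33, green 0x22, and red 0x11 becomes 0x112233.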
A complete BMP file contains the file header followed by the information header, as described above. Next are the palette entries, with the image data last. Remember that each scan line of the image data is padded to end on a 4 byte boundary.
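Locating a particular row in the image data takes two small calculations, sketched below. The sketch assumes the usual BMP convention that every stored scan line is padded to a multiple of 4 bytes, and that rows are stored bottom to top as described earlier. The function names bmp_row_stride and bmp_row_offset are hypothetical.

```c
#include <stddef.h>

/* Bytes occupied by one stored scan line, including padding. */
size_t bmp_row_stride(size_t width, unsigned bits_per_pixel)
{
    return ((width * bits_per_pixel + 31) / 32) * 4;
}

/* File offset of display row y (0 = top row of the picture),
   given that the file stores the bottom row first. */
size_t bmp_row_offset(size_t data_offset, size_t height,
                      size_t stride, size_t y)
{
    return data_offset + (height - 1 - y) * stride;
}
```

A 3 pixel wide, 24 bit image needs 9 bytes of pixel data per row, so its stride is 12 bytes. With the image data starting at offset 54 and a height of 4, the top row of the picture is found at offset 90 and the bottom row at offset 54.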
The EPS file format is Adobe's Encapsulated PostScript format. To understand the EPS file format, some background on PostScript is necessary. PostScript is a file format used by virtually all high-end printers, and many low to moderate end printers as well. PostScript is designed to describe entire documents, including the text, object graphics (curves, lines, etc.), and bitmapped graphics. Text and object graphics in PostScript files are scaled to the full resolution of the output device, which helps account for the popularity of PostScript. PostScript is extremely complex and sophisticated, as it describes a full stack-based programming language. For example, it supports math, subroutines, variables, comments, and many other common programming language concepts. It is even possible to write a full ray-tracing program in PostScript, so that the program and its data can be downloaded to a laser printer to be executed and output!
A PostScript file consists of human-readable text to describe the document. For example, the occurrence of "2 3 add" instructs the processor to push 2 on the stack, push 3 on the stack, and then do an add. The add operator pops its two operands from the stack, does the addition, and pushes the result back on the stack. Thus adding three numbers can be done as "1 2 3 add add" or "1 2 add 3 add" with the same result left on the stack. Several books are available to describe the PostScript language if you would like more information. These are listed at the end of this document.
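The stack behavior described above can be imitated in a few lines of C. The sketch below is only a toy: it understands integers and the add operator, nothing else, and the function name ps_eval, the 64 entry stack, and the 256 character program limit are all assumptions made for the example. It is in no way a PostScript interpreter.

```c
#include <stdlib.h>
#include <string.h>

/* Evaluate a space-separated program of integers and "add".
   Stores the value left on top of the stack in *result.
   Returns 0 on success, -1 on stack underflow, overflow,
   an empty stack at the end, or an over-long program. */
int ps_eval(const char *program, int *result)
{
    int stack[64];
    int top = 0;
    char buf[256];
    char *tok;

    if (strlen(program) >= sizeof buf)
        return -1;
    strcpy(buf, program);              /* strtok modifies its input */

    for (tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (strcmp(tok, "add") == 0) {
            int a, b;
            if (top < 2)
                return -1;
            b = stack[--top];          /* pop both operands */
            a = stack[--top];
            stack[top++] = a + b;      /* push the sum */
        } else {
            if (top == 64)
                return -1;
            stack[top++] = atoi(tok);  /* push a number */
        }
    }
    if (top < 1)
        return -1;
    *result = stack[top - 1];
    return 0;
}
```

Both "1 2 3 add add" and "1 2 add 3 add" leave 6 on the stack, matching the description above.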
Due to the complexity of PostScript, the PostScript Document Structuring Conventions (DSC) were introduced to help define the ordering of tasks within a PostScript file. DSC conforming files also contain comments to indicate structuring and additional information, such as the document title, the creator, and bounding boxes. However, PostScript interpretation is still very difficult. Most applications provide PostScript output, which is relatively easy to implement, to allow high-resolution printing on most printers. Very few interpret PostScript.
Now back to EPS files. EPS files are just PostScript files that follow the DSC guidelines, and may only contain the description of a single page. PostScript has no facilities for compressing bitmapped data, which along with the complexity of PostScript makes the EPS file format a poor choice for bitmapped data storage. However, since it is relatively easy to place an EPS file into a full PostScript document, EPS is usually the best choice for exporting graphics to a document layout application. EPS output is particularly easy to add to applications that already support PostScript output for printing.
EPS files may contain a low-resolution thumbnail preview in a standard image file format, such as a TIFF on a UNIX machine, a Windows Metafile or BMP on an Intel 80x86 PC, or a PICT on a Macintosh computer. Many programs that claim to import EPS files really just interpret this low-resolution thumbnail, which is usually much less than satisfying.
Since EPS and PostScript files have a free-form format with very complex operations, the EPS format will not be further discussed here. For more information, see the references at the end of this document.
The GIF file format was developed by CompuServe for use over a modem. This explains its high compression, its packet-oriented structure, and the interlacing of image scan lines (which allows seeing the basic GIF image after receiving a relatively small amount of the file). Since it uses LZW compression, patented by Unisys, a license is necessary for commercial users. The GIF format itself is copyrighted by CompuServe, but all that is needed to use it is to acknowledge this copyright in the printed documentation. The GIF format has two versions. The 87a format was introduced in 1987 and defines the basic structure. The 89a revision extended the format to allow specifying comments and interaction guidelines. When writing GIF files that do not need the 89a extensions, using the 87a format identifier is recommended. GIF files support 8 bits per pixel or less.
Despite the limitation on the number of bits per pixel and its original design for modem transmission, the high compression and the complete header information have made the GIF format extremely popular for raster images. Thus it is an extremely good choice for raster graphics applications that need to import or export image data. The May 1995 MacWorld magazine issue reported that CompuServe is working on a "new standard" to allow 24 bit images and public compression schemes, which may mean either more extensions or a totally new format.
The GIF file structure is fairly complex, as it may contain any number of images and any number of extensions. The basic composition is outlined below:
    File Header, either "GIF87a" or "GIF89a" in ASCII
    Logical Screen Descriptor
    Global Color Table (optional)
    Block Definitions
    File Trailer, ";" in ASCII
The Logical Screen Descriptor section starts with the width and height of the physical display the images were created on. Next is a byte that contains whether there is a global color table, how many bits per pixel there are in the image(s), whether the color table is sorted by importance, and the number of entries in the color table. Next is the index of the background color and a specification of the aspect ratio.
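The fields described above can be unpacked from the seven bytes that follow the file header. The sketch below uses the bit layout given in the GIF89a specification as I understand it: the top bit of the packed byte is the global color table flag, the next three bits hold the color resolution minus one, the next bit is the sort flag, and the low three bits n give a table of 2^(n+1) entries. The struct GifScreen, its field names, and the function gif_parse_screen are hypothetical names for this example.

```c
typedef struct {
    unsigned width, height;        /* logical screen size in pixels */
    int has_global_table;          /* 1 if a Global Color Table follows */
    int color_resolution_bits;     /* bits per color component */
    int sorted;                    /* 1 if table is sorted by importance */
    int table_entries;             /* entries in the Global Color Table */
    unsigned char background_index;
    unsigned char aspect_ratio;
} GifScreen;

/* d points at the 7 bytes that follow the 6-byte file header. */
void gif_parse_screen(const unsigned char d[7], GifScreen *s)
{
    unsigned char packed = d[4];

    s->width  = d[0] | ((unsigned)d[1] << 8);   /* little-endian */
    s->height = d[2] | ((unsigned)d[3] << 8);
    s->has_global_table      = (packed >> 7) & 1;
    s->color_resolution_bits = ((packed >> 4) & 7) + 1;
    s->sorted                = (packed >> 3) & 1;
    s->table_entries         = 1 << ((packed & 7) + 1);
    s->background_index      = d[5];
    s->aspect_ratio          = d[6];
}
```

For a 320 by 200 screen with a packed byte of 0xF7, this reports a global color table of 256 unsorted entries.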
The Global Color Table is only present if the Logical Screen Descriptor contains a flag to indicate its presence. It is usually present, however, and contains a series of RGB triplets that are encoded as one byte for each of the three components. If the table is sorted by importance, the more important entries come first. Each image in a GIF file may contain its own local color table. For each image that does, the global color table is ignored.
The Block Definitions section contains any number of images and 89a extensions, in any order. The type of block (image or extension) can be determined by the first byte in the block. Image blocks have the following data:
    Logical Image Descriptor
    Local Color Table (optional)
    Minimum LZW Code Size
    Image data
    Block Terminator (a byte containing 0x00 hexadecimal)
Logical Image Descriptors start with an ASCII ',' character, followed by the left and top edge of the image on the logical screen, in pixels. Next are the width and height of the image, in pixels. The last part of each descriptor is a byte to indicate whether the image has a Local Color Table, whether the image is interlaced, whether the color table is sorted by importance, and the number of entries in the color table.
The Local Color Table is present only if the appropriate flag in the Logical Image Descriptor is set. Its structure is identical to the Global Color Table.
The Minimum LZW Code size is the starting code size for the image. The maximum code size for GIF files is 12 bits, and the initial code size is 3 for 1 bit per pixel images and usually the number of bits per pixel plus 1 for other images.
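The code sizes above, together with the two extra GIF roots described in the LZW section, can be derived from the number of bits per pixel. The sketch below follows the rule stated above for the initial code size, and assumes (per the GIF specification as I understand it) that the clear code is the first value after the pixel roots and the end of input code is the one after that. The struct GifLzwParams and the function gif_lzw_params are hypothetical names.

```c
#define GIF_MAX_CODE_BITS 12

typedef struct {
    int initial_bits;   /* width in bits of the first codes */
    int clear_code;     /* resets the string table to the roots */
    int end_code;       /* marks the end of the input */
    int first_free;     /* first table slot for new strings */
    int max_entries;    /* table size at the 12-bit limit */
} GifLzwParams;

GifLzwParams gif_lzw_params(int bits_per_pixel)
{
    GifLzwParams p;
    /* 1 bit per pixel images are treated as if they had 2 bits,
       which gives the initial code size of 3. */
    int root_bits = bits_per_pixel < 2 ? 2 : bits_per_pixel;

    p.initial_bits = root_bits + 1;
    p.clear_code   = 1 << root_bits;
    p.end_code     = p.clear_code + 1;
    p.first_free   = p.clear_code + 2;
    p.max_entries  = 1 << GIF_MAX_CODE_BITS;
    return p;
}
```

For an 8 bit per pixel image this gives 9 bit initial codes, a clear code of 256, an end of input code of 257, and room for 4096 table entries.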
The image data consists of a byte to indicate the size of the image data in the block (1 - 255 bytes) followed by the LZW compressed image data. There may be any number of size bytes followed by image bytes in the image data section.
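Because the compressed data is split into pieces that each start with a size byte, a reader usually reassembles the pieces before decompressing. The sketch below does this for data already held in memory, stopping at the zero byte that terminates the block. The function name gif_read_sub_blocks and its buffer-based interface are assumptions for this example.

```c
#include <stddef.h>
#include <string.h>

/* Concatenate size-prefixed pieces from in[0..in_len) into out.
   Returns the number of input bytes consumed, including the
   0x00 terminator, and stores the payload length in *out_len.
   Returns 0 if the input is truncated or out is too small. */
size_t gif_read_sub_blocks(const unsigned char *in, size_t in_len,
                           unsigned char *out, size_t out_cap,
                           size_t *out_len)
{
    size_t pos = 0, n = 0;

    while (pos < in_len) {
        size_t size = in[pos++];       /* 1..255, or 0 to terminate */
        if (size == 0) {
            *out_len = n;
            return pos;
        }
        if (pos + size > in_len || n + size > out_cap)
            return 0;
        memcpy(out + n, in + pos, size);
        n += size;
        pos += size;
    }
    return 0;                          /* no terminator found */
}
```

Given the bytes 3, 'a', 'b', 'c', 2, 'd', 'e', 0, the function consumes all eight bytes and produces the five byte payload "abcde".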
For Extension Blocks, the following data is present:
    Extension Introducer (a byte containing 0x21 hexadecimal)
    Extension Type (a byte with the extension number)
    Extension data
    Block Terminator (a byte containing 0x00 hexadecimal)
The extension data is in the same format as image data, with a byte indicating the size followed by the corresponding data. Again, there may be any number of size bytes followed by data bytes. The interpretation of the data depends on the extension type. There are currently four extensions: one to specify comments, one for text annotations, one to store application-specific information, and the last to specify how to switch between the images in a GIF file.
For an application that uses 8 bit color or less, GIF is an excellent choice of file format, since it is supported by many applications on many platforms. BMP is also a fairly good choice, although its support is more limited on non-Intel platforms. Finally, sophisticated programs should support PostScript printing and EPS file output.
For applications that wish to use their own file format, the RLE and LZW compression schemes are good algorithms to use to reduce data storage demands. RLE is simple to implement and free, while LZW compression usually performs much better but requires a license for commercial use.
Most of the above information was obtained from "Bitmapped Graphics Programming in C++" by Marv Luse. Besides containing an excellent description of the above compression schemes and file formats, plus many more, it also contains an excellent discussion of dithering and color management. Finally, and possibly most important for many people, it includes full C++ code to implement reading and writing each file format and compression scheme. His C++ is written fairly simply, which makes converting it for use in C programs relatively simple.
The official specification for the GIF file format can be obtained at: ftp://ftp.hawaii.edu/mirrors/info-mac/info/dev/gif-format-gif89a.txt
If you wish to include PostScript and/or EPS support in your application, you will need the official PostScript manual, the "PostScript Language Reference Manual" by Adobe Systems Incorporated. Some UNIX workstations come with PostScript manuals or on-line versions of them.
meyerd@cs.WPI.EDU