Hardware Acceleration for Window Systems

a summary by Markus Altmann
based on the paper of D. Roden & C. Wilcox

Introduction

A window system is a workstation user interface for creating and manipulating windows, each of which corresponds to the screen area of a process. It gives the user simultaneous access to multiple processes on the workstation and lets them manage more tasks at a time, increasing productivity. Since each process associated with a window views the workstation resources (CPU, memory, graphics HW) as if it were their sole owner, the resources must be shared without conflicts in order to preserve interactivity.

Sharing the graphics HW so that each process requesting graphics sees a virtual graphics device requires extensive HW support within the graphics system. The problem with such a virtual implementation stems from the pipeline architecture of graphics devices, which causes pipeline latency: sharing a graphics accelerator requires pipeline flushing and resynchronization on every context switch or swap. These are costly operations that impair the responsiveness and interactivity of the graphics system. The problem grows with more complex primitives in the pipeline, e.g. splines, which need more processing time and lengthen the pipeline; pipeline flushing then becomes even more expensive.

Therefore, an alternative to pipeline flushing is necessary. Since SW solutions are not fast enough, HW solutions are required to provide interactivity with a significant (but limited) number of windows, especially windows that use the graphics accelerator. The graphics devices must be shared effectively, and graphics operations in one window should influence other windows as little as possible. Moreover, the need for pipeline flushing and resynchronization must be eliminated. To support the window system, the following HW components are integrated at appropriate locations in the pipeline, so that the interaction between the window system SW and graphics rendering is minimized (Fig. 1).

Host - Transform Processor - Window Circuitry - Framebuffer - Video Generator - Display
Fig. 1: Scheme of the HW components of the window acceleration system.

Transform Processor

The transform processor block is implemented with parallel floating point processors and is the core of the accelerated graphics system. These processors perform tasks such as graphics context management (HW state information, e.g. matrix stack, light sources, etc.) and matrix transformation calculations, and include the following HW resources for acceleration purposes.

Multiple Graphics Contexts

The graphics contexts are stored in HW. This eliminates the constant passing of contexts between host and graphics device and allows efficient context switching without pipeline flushing, as well as simultaneous rendering in multiple windows. The number of contexts that can be stored is limited by the size of the memory associated with the transform processor. When all HW contexts are allocated, software context switching can swap out the least recently used context.
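The swap-out policy can be illustrated with a small software model. This is only a sketch under assumptions: the class ContextStore, its method names and the slot count are hypothetical and not taken from the paper, and a context is modelled as a plain dictionary.

```python
from collections import OrderedDict


class ContextStore:
    """Toy model of a fixed number of HW context slots (hypothetical)."""

    def __init__(self, slots):
        self.slots = slots              # number of contexts the HW can hold
        self.resident = OrderedDict()   # context id -> state, oldest use first
        self.swapped_out = {}           # contexts saved by software

    def activate(self, ctx_id):
        """Make ctx_id resident; return the id evicted to make room, if any."""
        if ctx_id in self.resident:
            self.resident.move_to_end(ctx_id)   # mark as most recently used
            return None
        evicted = None
        if len(self.resident) >= self.slots:
            # All slots are taken: swap out the least recently used context.
            evicted, state = self.resident.popitem(last=False)
            self.swapped_out[evicted] = state
        # Restore a previously saved context or start with an empty one.
        self.resident[ctx_id] = self.swapped_out.pop(ctx_id, {})
        return evicted
```

As long as the number of active windows does not exceed the number of slots, activate never evicts anything, which corresponds to the case where switching needs no software intervention.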

Pipeline Synchronization

Pipeline synchronization ensures that primitives from competing processes are rendered in the requested order. The synchronization HW divides up the primitives and hands them to individual transform processors. Each time this happens, a sequence marker is stored in the sequence manager circuit to maintain the order of the primitives, so that order dependent results are consistent and repeatable.
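A minimal sketch of the idea, assuming that the sequence markers are consecutive integers and that parallel processors may finish out of order. The class SequenceManager and its methods are hypothetical names for illustration, not the circuit described in the paper.

```python
import heapq


class SequenceManager:
    """Releases finished primitives strictly in dispatch order (toy model)."""

    def __init__(self):
        self.next_issue = 0   # marker given to the next dispatched primitive
        self.next_emit = 0    # marker that must leave the pipeline next
        self.finished = []    # min-heap of (marker, result) waiting their turn

    def dispatch(self):
        """Hand out the sequence marker for a primitive sent to a processor."""
        marker = self.next_issue
        self.next_issue += 1
        return marker

    def complete(self, marker, result):
        """Record a finished primitive; return the results releasable now."""
        heapq.heappush(self.finished, (marker, result))
        released = []
        while self.finished and self.finished[0][0] == self.next_emit:
            released.append(heapq.heappop(self.finished)[1])
            self.next_emit += 1
        return released
```

If the primitive with marker 2 finishes first, it is held back until markers 0 and 1 have been released, so the output order does not depend on which processor happens to be faster.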

Pipeline Bypass

The pipeline bypass is a separate path for window primitives (buttons, window frames, etc.) that do not require the pipeline; it gives the window system direct access to the framebuffer. The bypass supports block move/read/write operations in HW (see Window Circuitry below) and can be used for direct rendering of window system primitives. It avoids the overhead of the graphics pipeline for the relatively simple operations of the window system, which results in good window system interaction even in the middle of complex rendering operations.

Pipeline Valve

Since the framebuffer can also be accessed through the pipeline bypass, the pipeline valve controls this access. Closing the valve shuts off the data from the rendering HW, so that the window system can access the framebuffer for window manipulations without regard to the pipeline context. This does not stop the transform processors, so primitive processing continues until the pipeline backs up. After the window manipulation, once the valve is open again for the rendering HW, the primitives are rendered at the correct screen location.
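The behaviour of the valve can be pictured with the following sketch. It assumes that the pipeline can hold back a fixed amount of rendered output while the valve is closed; the class PipelineValve, the capacity parameter and the list standing in for the framebuffer are all hypothetical.

```python
from collections import deque


class PipelineValve:
    """Toy model of a valve between rendering HW and framebuffer."""

    def __init__(self, capacity):
        self.open = True
        self.capacity = capacity       # how much output the pipeline can hold
        self.backlog = deque()         # rendered output waiting at the valve
        self.framebuffer_writes = []   # stand-in for the framebuffer

    def submit(self, item):
        """Offer rendered output; return False if the pipeline is backed up."""
        if self.open:
            self.framebuffer_writes.append(item)
            return True
        if len(self.backlog) < self.capacity:
            self.backlog.append(item)  # processing continues behind the valve
            return True
        return False                   # upstream processors have to stall

    def close(self):
        self.open = False

    def reopen(self):
        """Open the valve and let the held-back output through in order."""
        self.open = True
        while self.backlog:
            self.framebuffer_writes.append(self.backlog.popleft())
```

Nothing is discarded while the valve is closed: output is only delayed, which is why no flush or resynchronization is needed afterwards.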

Pipeline Marker Circuitry

With this circuitry the window system SW can ensure that all primitives of the context being swapped out have been rendered before the swap. This avoids unnecessary pipeline flushing when contexts must be changed because limited resources are shared between multiple processes. The pipeline marker is a register that the window system accesses via the pipeline bypass without closing the valve. A marker is incremented and sent down the pipeline between context switches, and the register is updated whenever a marker reaches the end of the pipeline. A table associates the markers with the active contexts currently in the pipeline. On a context switch, the window system can therefore determine which contexts are in the pipeline by reading the register and consulting the table. If the context being swapped out is not in the pipeline, the context switch can occur immediately; otherwise the window system has to wait until that context has been processed. At no time, however, is it necessary to stop the pipeline or to prevent processes from placing commands and data into it.
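The interplay of marker register and context table can be sketched as follows. This is one possible reading, not the paper's design: it assumes that a marker is sent behind the primitives of the outgoing context on every switch, and that the context currently feeding the pipeline always counts as being in the pipeline. All names are hypothetical.

```python
class PipelineMarker:
    """Toy model of the marker register and the marker/context table."""

    def __init__(self):
        self.register = 0    # last marker that reached the pipeline end
        self.last_sent = 0   # last marker inserted at the pipeline head
        self.table = {}      # marker -> context whose primitives precede it
        self.current = None  # context currently feeding the pipeline

    def switch_to(self, ctx_id):
        """Send a marker behind the outgoing context, then feed ctx_id."""
        if self.current is not None:
            self.last_sent += 1
            self.table[self.last_sent] = self.current
        self.current = ctx_id

    def marker_reached_end(self, marker):
        """Done by the HW in this model when a marker drains out."""
        self.register = marker

    def in_pipeline(self, ctx_id):
        """Read the register and consult the table, as the window system would."""
        pending = {c for m, c in self.table.items() if m > self.register}
        return ctx_id == self.current or ctx_id in pending

    def can_swap_out(self, ctx_id):
        return not self.in_pipeline(ctx_id)
```

The check only reads state, so other processes can keep placing commands into the pipeline while the window system waits for a particular marker.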

Window Circuitry

The following window HW is logically located between the rendering HW and the framebuffer, which eliminates the need for pipeline flushing.

Window Relative Rendering

Window relative rendering allows a graphics application to treat its window as a full-screen virtual device. The application does not need to be window smart, i.e. add window offsets and clip to window boundaries itself, which would reduce overall system performance. All primitives in the pipeline are specified relative to the window origin and translated to screen relative coordinates after scan conversion. The offset for each window is kept in a table in the window relative circuit. While windows are moved or shuffled, the pipeline valve is closed and the table is updated asynchronously via the pipeline bypass. After the valve is opened, rendering proceeds correctly at the new location, so window translations are completely transparent to applications, which do not know their window location. There is no loss of performance, since the window offset is applied in parallel with pipeline operations.
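In software terms the offset table amounts to a lookup followed by an addition per pixel. The sketch below uses hypothetical names (WindowOffsetTable, move_window, to_screen) and models only the translation, not its parallel execution in HW.

```python
class WindowOffsetTable:
    """Toy model of the per-window offset table (hypothetical)."""

    def __init__(self):
        self.offsets = {}   # window id -> (x, y) of the window origin on screen

    def move_window(self, win_id, x, y):
        """Done by the window system via the bypass when a window moves."""
        self.offsets[win_id] = (x, y)

    def to_screen(self, win_id, px, py):
        """Translate a scan-converted, window relative pixel to the screen."""
        ox, oy = self.offsets[win_id]
        return (ox + px, oy + py)


table = WindowOffsetTable()
table.move_window(1, 100, 50)
first = table.to_screen(1, 10, 10)    # (110, 60)
table.move_window(1, 300, 200)        # the window is moved by the user
second = table.to_screen(1, 10, 10)   # (310, 210), same application pixel
```

The application emits the same window relative pixel in both cases; only the table entry changed.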

Window Block Mover

Block move operations support basic window primitives, e.g. raster text, icons, window move/shuffle/resize, etc., and are important for window system performance. So that these operations can be performed window relative as well as screen relative, a register holds one bit per operand that indicates the coordinate system to use. This allows offscreen objects, such as raster text, to be moved into a window without an explicit offset calculation, which is done by the window relative HW. The application is not required to choose the coordinate system, which avoids extra processing by the window system or the application. If an object is offscreen, the operation is screen relative; if the object is displayed, it is window relative; otherwise an explicit command determines the mode of operation.
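A sketch of a block move with one coordinate-system bit per operand. The bit assignment, the function block_move and the representation of the framebuffer as a list of rows are assumptions made for illustration only.

```python
SRC_WINDOW_RELATIVE = 0b01   # hypothetical bit: source is window relative
DST_WINDOW_RELATIVE = 0b10   # hypothetical bit: destination is window relative


def block_move(fb, mode, window_origin, src, dst, width, height):
    """Copy a width x height block inside fb (a list of rows of pixels)."""
    ox, oy = window_origin

    def resolve(point, window_relative):
        # Add the window offset only for operands flagged window relative.
        x, y = point
        return (x + ox, y + oy) if window_relative else (x, y)

    sx, sy = resolve(src, bool(mode & SRC_WINDOW_RELATIVE))
    dx, dy = resolve(dst, bool(mode & DST_WINDOW_RELATIVE))
    # Read the whole block first so that overlapping moves stay correct.
    block = [fb[sy + r][sx:sx + width] for r in range(height)]
    for r, row in enumerate(block):
        fb[dy + r][dx:dx + width] = row
```

Copying a glyph from an offscreen area into a window would use a screen relative source and a window relative destination, i.e. mode DST_WINDOW_RELATIVE, so the caller never computes the window offset itself.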

Window Burst Transfer

With this feature large amounts of pixel data can be moved to and from system memory for screen redraw and save/restore operations. With acceleration these operations are fast, which reduces the need for optimizing fonts/bitmaps in offscreen memory. The HW ensures that the data is transferred to the right display location and allows an additional offset, so that unaligned data in system memory can be transferred without loss of performance.

Window Compare Circuitry

The compare circuit supports HW clipping of pixels to arbitrary boundaries in real time, reducing window system interaction with applications. A window id is written to the circuit and applies to all pixels drawn afterwards. For each pixel, this id is compared with the window id stored in the clipping planes of the framebuffer at the pixel position to decide whether the pixel is clipped. This allows efficient clipping even while complicated objects are rendered in a window environment.

Framebuffer

Besides the memory holding indices to specific colors, extra framebuffer planes are added to allow high speed window clipping and multiple display modes.

Window Clipping Planes

These clipping planes contain the ids of the windows currently controlling each pixel. The window id of a pixel to be drawn is compared with the window id stored in these planes by the window compare circuitry. If both ids are equal, the pixel is inside the window and is drawn; otherwise it is clipped, i.e. discarded.
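The per-pixel comparison can be written down in a few lines. In this sketch the image planes and the clipping planes are both modelled as lists of rows; the function draw_pixels is a hypothetical name.

```python
def draw_pixels(image, clip_ids, win_id, pixels):
    """Write the (x, y, color) pixels owned by win_id; return the number clipped."""
    clipped = 0
    for x, y, color in pixels:
        if clip_ids[y][x] == win_id:
            image[y][x] = color   # ids match: the pixel is inside the window
        else:
            clipped += 1          # another window controls this pixel
    return clipped
```

Because the test is made per pixel against the id planes, the visible region of a window can have any shape, for example when it is partly covered by several other windows.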

Window Display Mode Planes

Similar to the previous planes, each window id coded in the window display mode planes is associated with a display mode (e.g. 8 bit planes, 24 bit planes, 12 bit plane double buffer), which allows different display modes in different windows at the same time. According to these modes, the display processor of the video generator reformats the framebuffer data so that each pixel is displayed in its mode.

Window Overlay Planes

Overlay planes allow windows to coexist with rendered images by merging these planes in front of the image planes. The actual contents of the image planes are not destroyed by the overlay data, so rendering to image windows occluded by overlay windows is possible.
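The merge can be illustrated as follows, assuming that one overlay value is reserved to mean "transparent". That convention, the constant TRANSPARENT and the function compose are assumptions for this sketch and are not stated in the paper.

```python
TRANSPARENT = 0   # assumed overlay value meaning "show the image plane"


def compose(image_row, overlay_row):
    """Merge one overlay scan line in front of the image scan line."""
    return [img if ovl == TRANSPARENT else ovl
            for img, ovl in zip(image_row, overlay_row)]
```

The image row is only read, never written, so rendering into it can continue while an overlay window covers it.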

Window Offscreen Planes

These planes are areas of the framebuffer that are not displayed and are well suited to storing raster fonts and textures. The objects stored in this area can be copied with block move operations from framebuffer to framebuffer, which is faster than copying from system memory to the framebuffer.

Video Generator

The video circuitry formats framebuffer data according to the current display mode, maps it through a color lookup table (LUT) and converts the digital values to analog signals. The video generator provides the following HW components for window support.

Multiple Display Modes

This component allows each window to define its own display mode: the mode of each pixel is determined by the id stored in the display mode planes of the framebuffer. The display mode id is an index into the display mode table and also determines the color LUT and other attributes, such as blending, for each pixel.

Multiple Color Maps

As indicated in the previous paragraph, the id stored in the window display mode planes is also an index into the HW color table, allowing each window to have its own color map. However, the storage available for color LUTs is limited, which restricts the combinations of display modes that can be in use at the same time.
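The per-pixel lookup through display mode table and color map might be pictured like this. The sketch is simplified to indexed modes only; the type DisplayMode, the function scan_out and all table contents are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DisplayMode:
    bits: int   # framebuffer bits that form the color index in this mode
    lut: list   # color map for this mode: index -> (r, g, b)


def scan_out(fb_value, mode_id, mode_table):
    """Format one framebuffer value according to its pixel's display mode."""
    mode = mode_table[mode_id]
    index = fb_value & ((1 << mode.bits) - 1)   # keep only the mode's bits
    return mode.lut[index]


# Two windows with different modes and color maps (made-up values).
mode_table = {
    0: DisplayMode(bits=1, lut=[(0, 0, 0), (255, 255, 255)]),
    1: DisplayMode(bits=2, lut=[(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]),
}
```

The same framebuffer value can therefore produce different colors depending on which window owns the pixel.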

Hardware Cursor Support

The HW cursor shape is never stored in the framebuffer, because the shape data is placed directly into the pixel stream before the digital/analog conversion. This avoids the synchronization with framebuffer data that SW cursors need when images are rendered while the cursor is moving. The only data needed to update the HW cursor is its new location. This gives good performance and interactivity, since, unlike with a SW cursor, no additional update tasks are required.
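A sketch of how a cursor can be merged into the pixel stream at scan-out time. The function scan_line, the use of None for transparent cursor pixels and the list representation of a scan line are assumptions for illustration.

```python
def scan_line(fb_row, y, cursor_pos, cursor_shape):
    """Return the pixels sent to the display for scan line y.

    fb_row is left untouched; cursor pixels replace framebuffer pixels
    only in the outgoing stream. None in cursor_shape means transparent.
    """
    cx, cy = cursor_pos
    out = list(fb_row)                 # copy: the framebuffer is not modified
    row = y - cy
    if 0 <= row < len(cursor_shape):
        for col, value in enumerate(cursor_shape[row]):
            x = cx + col
            if value is not None and 0 <= x < len(out):
                out[x] = value
    return out
```

Moving the cursor only changes cursor_pos; nothing has to be saved or restored in the framebuffer.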

Conclusion

The presented HW support provides high performance, interactive window handling by placing the support at appropriate stages of the graphics pipeline. This allows more complex primitives in the pipeline while preserving and even improving the performance of window operations.

Reference:
Roden, Desi & Wilcox, Chris,
"Hardware Acceleration for Window Systems",
Computer Graphics, Vol. 23, No. 3, July 1989

Markus Altmann / madwpi@cs.wpi.edu