SOME LIMITS ON HUMAN PERCEPTION

Matthew Ward, WPI CS Department

ABSTRACT

How many distinct line lengths and orientations can humans accurately perceive? How many different sound pitches or volumes can we distinguish without error? What is our "channel capacity" when dealing with color, taste, smell, or any other of our senses? How are humans capable of recognizing hundreds of faces and thousands of spoken words? These and related issues are important in the study of both computer vision and scientific visualization. On the one hand, attempting to identify the limits of human perception can lead to insights into the design of image understanding systems. On the other hand, when designing a visualization it is important to factor in these limitations to avoid generating images with ambiguous or misleading information. This talk will present an overview of some very early work on perceptual psychology and relate it to current work in image science.

PRIMARY REFERENCE: George A. Miller, "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information", Psychological Review, Vol. 63, No. 2, 1956.

Below are the overheads from my presentation.

RELATIONSHIPS BETWEEN PERCEPTUAL PSYCHOLOGY AND IMAGE SCIENCE

Designing Visualizations

Computer Vision

HUMAN PERCEPTION AND INFORMATION THEORY

Treat the human observer as a communication channel that takes input (stimuli) and generates output (responses), with the overlap between the two being the amount of transmitted information

For each primitive (visual, auditory, taste...) measure the number of distinct levels that the average participant can identify with a high degree of accuracy

As the number of input levels increases, the amount of transmitted information rises and then levels off at an asymptote

Label this asymptote the "channel capacity" of the human for information transfer, measured in bits

Ignore results from "specialists" and limit training

Don't include noisy data or context (for now)
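
A minimal sketch of how "transmitted information" could be computed from an identification experiment, assuming the results are tallied in a confusion matrix (rows are stimuli, columns are responses). The function name and the sample matrices are hypothetical and are not data from the studies cited here; the quantity computed is the standard mutual information between input and output.

```python
import math

def transmitted_information(confusion):
    """Mutual information (in bits) between stimulus and response.

    confusion[i][j] is the number of trials on which stimulus i
    drew response j.
    """
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]
    bits = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count == 0:
                continue  # empty cells contribute nothing
            p_joint = count / total
            p_independent = (row_sums[i] / total) * (col_sums[j] / total)
            bits += p_joint * math.log2(p_joint / p_independent)
    return bits

# Hypothetical data: 4 stimuli, every one identified correctly.
perfect = [[10, 0, 0, 0],
           [0, 10, 0, 0],
           [0, 0, 10, 0],
           [0, 0, 0, 10]]

# Hypothetical data: responses unrelated to the stimuli.
guessing = [[5, 5, 5, 5] for _ in range(4)]

print(transmitted_information(perfect))   # all of the input gets through
print(transmitted_information(guessing))  # none of the input gets through
```

With perfect identification of 4 equally likely stimuli the result is log2(4) = 2 bits; when responses are unrelated to the stimuli it is 0 bits. Real participants fall in between, and the value stops growing once the number of stimulus levels exceeds the channel capacity.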

ABSOLUTE JUDGEMENT OF 1-D STIMULI

  1. Sound Pitches (Pollack): equal logarithmic steps from 100 - 8000 cps. Levels off at 2.5 bits (we can choose about 6 pitches that a listener will never confuse). Varying the range didn't change results appreciably. Listeners who could recognize 5 high pitches or 5 low pitches could not recognize all 10 when the two sets were combined.
  2. Sound Loudness (Garner): varying spacing between 15 - 110 dB. Levels off at 2.3 bits (5 levels).
  3. Salinity (Beebe-Center): varying concentration from 0.3 to 34.7 gm NaCl per 100 cc water. Levels off at 1.9 bits (4 levels).
  4. Position on a Line (Hake/Garner): pointer at arbitrary position between 2 markers. Participants labeled the position either from a list of possibilities or with a number between 0 and 100. Levels off at 3.25 bits, though it improves somewhat with long exposure (10 - 15 levels).
  5. Sizes of Squares (Eriksen/Hake): 2.2 bits
  6. Color (Eriksen): 3.1 bits for hue, 2.3 bits for brightness
  7. Touch (Geldard): placing vibrators on the chest area. Levels off at 4 intensities, 5 durations, and 7 locations.
  8. Line geometry (Pollack): line length was 2.6 - 3 bits (depends on duration), direction was 2.8 - 3.3 bits, curvature was 2.2 bits for constant arc length, 1.6 bits for constant chord length.

Summary: There appears to be some built-in limit on our ability to make absolute judgements of 1-D signals. Mean is 2.6 bits, standard deviation is 0.6 bits.
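
The level counts quoted in the list follow from the bit values, since n bits distinguish 2^n equally likely alternatives. A small sketch of the conversion (the function name is illustrative; the bit values are the ones listed above):

```python
def levels_from_bits(bits):
    """Number of equally likely alternatives that `bits` can distinguish."""
    return 2 ** bits

capacities = {
    "pitch": 2.5,
    "loudness": 2.3,
    "salinity": 1.9,
    "mean of the 1-D results": 2.6,
}

for name, bits in capacities.items():
    print(f"{name}: {bits} bits is about {levels_from_bits(bits):.1f} levels")
```

Rounded to whole levels this gives 6 for pitch, 5 for loudness, 4 for salinity, and about 6 for the 2.6-bit mean.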

ABSOLUTE JUDGEMENT OF MULTIDIMENSIONAL STIMULI

  1. Dot in a Square (Klemmer/Frick): Should be twice that of position on a line (6.5 bits), but measured at 4.6 bits.
  2. Salinity and Sweetness (Beebe-Center): Combined sucrose and salt solutions. Should be twice salinity (3.8), but measured at 2.3.
  3. Loudness and Pitch (Pollack): Should be combination of pitch and loudness (4.8), but measured at 3.1.
  4. Hue and Saturation (Halsey/Chapanis): Should be 5.3, but measured at 3.6.
  5. Size, Brightness, and Hue (Eriksen): Should be 7.6, but measured at 4.1.
  6. Multiple Sound Parameters (Pollack/Ficks): 6 variables - frequency, intensity, rate of interruption, on-time fraction, duration, and location. Each could have 5 values, for a total of 5^6 = 15,625 (about 15.6K) combinations. Results were 7.2 bits, or about 150 different categories.

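Summary: adding dimensions increases total capacity but reduces accuracy on each one. Having a little information about a lot of parameters seems to be the way we do things. This agrees with linguistic theory, which identifies 8 to 10 dimensions (distinctive features) where each distinction is binary or ternary.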
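
The "should be" figures in the list are sums of the 1-D capacities. A short sketch of that arithmetic, comparing the additive prediction with the measured value; the function name is illustrative, and every number comes from the two lists in these notes.

```python
import math

def additive_prediction(*bits):
    """Capacity expected if each dimension kept its full 1-D capacity."""
    return sum(bits)

# (experiment, 1-D capacities in bits, measured capacity in bits)
experiments = [
    ("dot in a square", (3.25, 3.25), 4.6),
    ("salinity and sweetness", (1.9, 1.9), 2.3),  # notes use twice salinity
    ("loudness and pitch", (2.3, 2.5), 3.1),
    ("size, brightness, and hue", (2.2, 2.3, 3.1), 4.1),
]

for name, parts, measured in experiments:
    predicted = additive_prediction(*parts)
    print(f"{name}: predicted {predicted:.1f} bits, measured {measured} bits, "
          f"ratio {measured / predicted:.2f}")

# Pollack/Ficks: 6 variables with 5 values each.
combinations = 5 ** 6                 # 15625 possible stimuli
input_bits = math.log2(combinations)  # information available in the input
categories = 2 ** 7.2                 # alternatives implied by 7.2 bits
print(f"{combinations} stimuli = {input_bits:.1f} bits in, "
      f"7.2 bits out is about {categories:.0f} categories")
```

In every case the measured value is more than any single 1-D capacity but well short of the sum.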

MEASUREMENT VS. DETECTION

Source: William S. Cleveland, "Graphical Perception," in The Elements of Graphing Data, Wadsworth, Inc., 1985.

Differentiates distance measurement (absolute) from detection (relative)

Distinguishes between 10 graphical perception tasks

Weber's Law: likelihood of detection is proportional to the relative change, not the absolute change, of a graphical attribute

Stevens' Law: the perceived magnitude in absolute measurements is the actual magnitude raised to a power. For linear features the power is between 0.9 and 1.1, for area features it is between 0.6 and 0.9, and for volume features it is between 0.5 and 0.8.
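
A rough numerical illustration of the two laws. The function names are hypothetical, and the exponents used (1.0 for length, 0.7 for area) are simply values picked from inside the ranges quoted above, not measured constants.

```python
def relative_change(old, new):
    """Weber's Law works on this quantity rather than on (new - old)."""
    return abs(new - old) / old

def perceived_ratio(actual_ratio, exponent):
    """Stevens' Law: perceived scale = actual scale raised to a power."""
    return actual_ratio ** exponent

# Weber: a change from 10 to 11 is as detectable as a change from 100 to 110,
# because both are a 10% relative change.
print(relative_change(10, 11), relative_change(100, 110))

# Stevens: doubling a length looks like doubling, but doubling an area
# looks like noticeably less than doubling.
print(perceived_ratio(2.0, 1.0))  # length, assumed exponent 1.0
print(perceived_ratio(2.0, 0.7))  # area, assumed exponent 0.7
```

With the assumed exponent of 0.7, an area that is twice as large is perceived as only about 1.6 times as large, which is one reason area encodings rank below position and length in the ordering that follows.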

Experiments showed errors in perception ordered as follows (increasing error)

  1. Position along a common scale
  2. Position along identical, non-aligned scales
  3. Length
  4. Angle/Slope (though error depends greatly on orientation and type)
  5. Area
  6. Volume
  7. Color Hue, Saturation, Density (only informal testing)

THE ROLE OF FOCUS AND EXPECTATION

(Chapman): when images vary in multiple attributes but observers report on only one, telling observers beforehand which attribute to attend to gave significantly better results than telling them afterward. (Obvious, but important: people do better when focusing on a single attribute.)

EXPANDING CAPABILITIES

THE RELATIONSHIP(?) TO IMMEDIATE MEMORY

Studies show the span of immediate memory is approximately 7 items. Is this related to the span of absolute judgement?

NO. Absolute judgement is limited by amount of information, while immediate memory is limited by the number of items, no matter how complex (author distinguishes bits of information from chunks of information).

Several experiments dealing with binary digits, decimal digits, letters, syllables, words, and mixtures have shown the number of chunks is relatively constant.

Interesting observation: we can remember 6 or so multisyllabic words, but also 6 or so monosyllabic words. Thus we "chunk" things at the largest logical unit (probably).
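
A small sketch of the bits-versus-chunks distinction, under two loudly stated assumptions: the symbols in each alphabet are equally likely, and the span is exactly 7 items. The constant and function names are illustrative.

```python
import math

SPAN_ITEMS = 7  # assumed span of immediate memory, in chunks

def bits_per_item(alphabet_size):
    """Information carried by one item drawn from equally likely symbols."""
    return math.log2(alphabet_size)

for name, size in [("binary digits", 2), ("decimal digits", 10), ("letters", 26)]:
    per_item = bits_per_item(size)
    print(f"{name}: {per_item:.2f} bits per item, "
          f"{SPAN_ITEMS * per_item:.1f} bits in a span of {SPAN_ITEMS}")
```

If memory were limited by bits, the span for letters would be several times shorter than the span for binary digits. A roughly constant span in items means the number of bits held varies widely with the material.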

THE ROLE OF RECODING

Recoding is the process of reorganizing information into fewer chunks with more bits of information per chunk (e.g. the process of learning Morse code). This is a form of compilation in AI jargon. Experiments in recalling long strings of binary digits show nearly linear improvement with chunk size. We remember events by creating a verbal recoding of the event and then elaborating from this coded version (which accounts for variations in witness testimony).
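
A minimal sketch of recoding applied to binary digits: group the digits and give each group a single name (here, its integer value). The function name and the sample digit string are hypothetical, chosen only for illustration.

```python
def recode(binary_digits, chunk_size):
    """Group a string of binary digits and name each group by its value."""
    chunks = []
    for start in range(0, len(binary_digits), chunk_size):
        group = binary_digits[start:start + chunk_size]
        chunks.append(int(group, 2))  # one name per group of digits
    return chunks

digits = "101000100111001110"  # 18 binary digits, hypothetical example

for size in (1, 2, 3):
    chunks = recode(digits, size)
    print(f"chunk size {size}: {len(chunks)} chunks -> {chunks}")
```

With a chunk size of 3 the 18 digits become the 6 chunks 5, 0, 4, 7, 1, 6, which fits within a span of about 7 items, while the unrecoded string of 18 items does not. The information content is unchanged; only the number of chunks drops.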

SUMMARY OF PERCEPTUAL EXPERIMENTS

IMPLICATIONS IN COMPUTER VISION

For research which attempts to parallel human vision, the process of perceptual organization (chunking) of primitive components is a promising path,

HOWEVER

basing work on highly accurate quantitative values doesn't fit into the mold of human perception, as our ability to distinguish different absolute levels is restricted to 4 - 7 values

IMPLICATIONS IN VISUALIZATION

In applications where absolute judgement is required, the best we can do with a single graphical attribute is between 4 and 7 values. To get a larger range of recognizable levels, we must recast the problem in multiple dimensions, make a sequence of simple decisions, or perform some type of chunking.

Alternatively, we could redefine the problem so that relative rather than absolute judgement is used to focus attention, with a second, more quantitatively accurate stage following the initial focus of attention.


matt@owl.WPI.EDU