SOME LIMITS ON HUMAN PERCEPTION
Matthew Ward, WPI CS Department
How many distinct line lengths and orientations can humans accurately perceive?
How many different sound pitches or volumes can we distinguish without error?
What is our "channel capacity" when dealing with color, taste, smell, or any
other of our senses? How are humans capable of recognizing hundreds of faces
and thousands of spoken words? These and related issues are important in
the study of both computer vision and scientific visualization. On the one
hand, attempting to identify the limits of human perception can lead to
insights into the design of image understanding systems. On the other hand,
when designing a visualization it is important to factor in these limitations
to avoid generating images with ambiguous or misleading information. This
talk will present an overview of some very early work on perceptual psychology
and relate it to current work in image science.
PRIMARY REFERENCE: George A. Miller, "The Magical Number Seven, Plus or Minus
Two: Some Limits on Our Capacity for Processing Information", Psychological
Review, Vol. 63, No. 2, 1956.
Below are the overheads from my presentation.
RELATIONSHIPS BETWEEN PERCEPTUAL PSYCHOLOGY AND IMAGE SCIENCE
- How should color be used?
- What graphical entities can be accurately measured?
- How many distinct entities can be used without confusion?
- What primitives do humans detect preattentively?
- With what level of accuracy do we perceive various primitives?
- How do we combine primitives to recognize complex phenomena?
HUMAN PERCEPTION AND INFORMATION THEORY
Assume the human is a communication channel, taking input and generating
output, with the overlap between the two being the amount of transmitted
information
For each primitive (visual, auditory, taste...) measure the number of
distinct levels that the average participant can identify with a high degree
of accuracy
As the input information increases, the amount of transmitted information
levels off at an asymptote
Label this level the "channel capacity" for information transfer by the
human and measure it in bits
Ignore results from "specialists" and limit training
Don't include noisy data or context (for now)
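
As a rough illustration of the "overlap" measure described above, the sketch below estimates transmitted information in bits from a table of stimulus/response counts. The function name and both count tables are hypothetical, made up for illustration; they are not data from any of the experiments cited in these notes.

```python
import math

def transmitted_information(counts):
    """Estimate transmitted information in bits from a count table.

    counts[i][j] = number of trials where stimulus i drew response j.
    """
    total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    bits = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n == 0:
                continue  # empty cells contribute nothing
            p_xy = n / total
            p_x = row_totals[i] / total
            p_y = col_totals[j] / total
            bits += p_xy * math.log2(p_xy / (p_x * p_y))
    return bits

# Hypothetical observer who never confuses 4 equally likely stimuli.
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]

# Hypothetical observer who cannot tell neighbouring pairs apart.
confused = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5], [0, 0, 5, 5]]

print(transmitted_information(perfect))   # all 2 bits of input get through
print(transmitted_information(confused))  # only 1 bit gets through
```

Adding more stimulus levels raises the input information, but once the observer starts confusing them the transmitted amount stops growing; that plateau is what the notes call the channel capacity.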
ABSOLUTE JUDGEMENT OF 1-D STIMULI
- Sound Pitches (Pollack): equal logarithmic steps from 100 - 8000 cps.
Levels off at 2.5 bits (we can choose 6 pitches which listener will never
confuse). Varying range didn't change results appreciably. Persons
recognizing 5 high pitches or 5 low pitches didn't recognize 10 when combined.
- Sound Loudness (Gardner): varying spacing between 15 - 110 dB. Levels
off at 2.3 bits (5 levels).
- Salinity (Beebe-Center): varying concentration from .3 to 34.7 gm NaCl per
100 cc water. Levels off at 1.9 bits (4 levels).
- Position on a Line (Hake/Gardner): pointer at arbitrary position between
2 markers. Participants labeled either from a list of possibilities or
number between 0 and 100. Levels off at 3.25 bits, though improves some for
long exposure (10 - 15 levels).
- Sizes of Squares (Eriksen/Hake): 2.2 bits
- Color (Eriksen): 3.1 bits for hue, 2.3 bits for brightness
- Touch (Geldard): placing vibrators on the chest area. Levels off at
4 intensities, 5 durations, and 7 locations.
- Line geometry (Pollack): line length was 2.6 - 3 bits (depends on duration),
direction was 2.8 - 3.3 bits, curvature was 2.2 bits for constant arc length,
1.6 bits for constant chord length.
Summary: Appears to be some built-in limit on our capability to perceive
1-D signals. Mean is 2.6 bits, standard deviation is .6 bits.
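
The bit figures above convert to a number of distinguishable levels by raising 2 to the number of bits. The short sketch below does that conversion for a few of the capacities quoted in these notes; the helper names are made up for illustration.

```python
import math

def levels_from_bits(bits):
    """Number of equally likely alternatives that carry `bits` of information."""
    return 2 ** bits

def bits_from_levels(levels):
    """Information in bits carried by a choice among `levels` alternatives."""
    return math.log2(levels)

# Channel capacities as quoted in the notes above.
capacities = {
    "pitch": 2.5,
    "loudness": 2.3,
    "salinity": 1.9,
    "position on a line": 3.25,
}

for name, bits in capacities.items():
    print(f"{name}: {bits} bits is about {levels_from_bits(bits):.1f} levels")

# The quoted mean and one standard deviation either side (2.6 +/- .6 bits).
for bits in (2.0, 2.6, 3.2):
    print(f"{bits} bits is about {levels_from_bits(bits):.1f} levels")
```

Because the scale is logarithmic, a spread of .6 bits around the mean corresponds to a range of roughly 4 to 9 levels, not a symmetric band around 6.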
ABSOLUTE JUDGEMENT OF MULTIDIMENSIONAL STIMULI
- Dot in a Square (Klemmer/Frick): Should be twice that of position on a
line (6.5 bits), but measured at 4.6 bits.
- Salinity and Sweetness (Beebe-Center): Combined sucrose and salt solutions.
Should be twice salinity (3.8), but measured at 2.3.
- Loudness and Pitch (Pollack): Should be combination of pitch and loudness
(4.8), but measured at 3.1.
- Hue and Saturation (Halsey/Chapanis): Should be 5.3, but measured at 3.6.
- Size, Brightness, and Hue (Eriksen): Should be 7.6, but measured at 4.1.
- Multiple Sound Parameters (Pollack/Ficks): 6 variables - frequency,
intensity, rate of interruption, on-time fraction, duration, and location.
Each could have 5 values for a total of 15.6K. Results were 7.2 bits, or
150 different categories.
Summary: having a little info about a lot of parameters seems to be the way
we do things. This agrees with linguistic theory, which identifies 8 to 10
dimensions where each distinction is binary or ternary.
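
The arithmetic behind the multidimensional results can be checked with a few lines. The sketch below uses only the predicted and measured figures quoted in the list above; the variable and function names are illustrative.

```python
import math

# (label, bits predicted by adding 1-D capacities, bits measured),
# all taken from the list above.
cases = [
    ("dot in a square", 6.5, 4.6),
    ("salinity and sweetness", 3.8, 2.3),
    ("loudness and pitch", 4.8, 3.1),
    ("hue and saturation", 5.3, 3.6),
    ("size, brightness, and hue", 7.6, 4.1),
]

def fraction_realized(predicted_bits, measured_bits):
    """Share of the additive prediction that was actually observed."""
    return measured_bits / predicted_bits

for label, predicted, measured in cases:
    share = fraction_realized(predicted, measured)
    print(f"{label}: {measured} of {predicted} bits ({share:.0%})")

# Pollack/Ficks: 6 variables with 5 values each.
possible_tones = 5 ** 6                      # 15625, the "15.6K" above
upper_bound_bits = math.log2(possible_tones)
measured_categories = 2 ** 7.2               # close to the quoted 150

print(possible_tones, round(upper_bound_bits, 2), round(measured_categories))
```

Each added dimension raises the total, but by less than its 1-D capacity: the six-variable case reaches about half of the roughly 13.9 bits that perfect identification of every combination would give.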
MEASUREMENT VS. DETECTION
Source: William S. Cleveland, The Elements of Graphing Data, "Graphical
Perception," Wadsworth, Inc, 1985.
Differentiates distance measurement (absolute) from detection (relative)
Distinguishes between 10 graphical perception tasks
- Angle
- Area
- Color Hue
- Color Saturation
- Density (amount of black)
- Length (distance)
- Position along a common scale
- Position along identical, nonaligned scales
- Slope
- Volume
Weber's Law: likelihood of detection is proportional to the relative
change, not the absolute change, of a graphical attribute
Stevens' Law: perceived scale in absolute measurements is the actual scale
raised to a power. For linear features power is between .9 and 1.1, for
area features it is between .6 and .9, and for volume features it is
between .5 and .8.
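
A small sketch of what the power law implies for a graph reader: if one mark is twice the size of another, how much larger does it look? The exponent ranges are the ones quoted above; the function name is made up, and the choice of a doubling is only an example.

```python
def perceived_ratio(actual_ratio, exponent):
    """Perceived size ratio under a power law: actual ratio raised to a power."""
    return actual_ratio ** exponent

# Exponent ranges as quoted in the notes above.
exponent_ranges = {
    "length": (0.9, 1.1),
    "area": (0.6, 0.9),
    "volume": (0.5, 0.8),
}

for feature, (low, high) in exponent_ranges.items():
    seen_low = perceived_ratio(2.0, low)
    seen_high = perceived_ratio(2.0, high)
    print(f"{feature}: a 2x difference looks like "
          f"{seen_low:.2f}x to {seen_high:.2f}x")
```

With exponents below 1, differences in area and volume are underestimated, which is one reason to encode quantities as lengths or positions when accurate reading matters.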
Experiments showed errors in perception ordered as follows (increasing error)
- Position along a common scale
- Position along identical, non-aligned scales
- Length
- Angle/Slope (though error depends greatly on orientation and type)
- Area
- Volume
- Color Hue, Saturation, Density (only informal testing)
THE ROLE OF FOCUS AND EXPECTATION
(Chapman): in images with multiple attributes but with observers only
reporting on one, prior notification of focus resulted in significantly
better results than post selection of focus. (Obvious, but important,
indicating that people do better when focusing on a single attribute).
- Relative judgement superior to absolute judgement.
- Increased dimensionality leads to larger bit rates. Problem: there is
likely to be a "span of perceptual dimensionality", hypothesized to be ~ 10.
- Reconfigure problem to be a sequence of different absolute judgements.
This leads to the analysis of immediate memory.
THE RELATIONSHIP(?) TO IMMEDIATE MEMORY
Studies show the span of immediate memory is approximately 7 items. Is this
related to the span of absolute judgement?
NO. Absolute judgement is limited by amount of information, while immediate
memory is limited by the number of items, no matter how complex (author
distinguishes bits of information from chunks of information).
Several experiments dealing with binary digits, decimal digits, letters,
syllables, words, and mixtures have shown the number of chunks is relatively
constant.
Interesting observation: we can remember 6 or so multisyllabic words, but
also 6 or so monosyllabic words. Thus we "chunk" things at the largest
logical unit (probably).
THE ROLE OF RECODING
Recoding is the process of reorganizing information into fewer chunks with
more bits of information per chunk (e.g. the process of learning Morse code).
This is a form of compilation in AI jargon.
Experiments in recalling long strings of binary digits show nearly linear
improvement with chunk size.
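
One way to picture this kind of recoding is to group binary digits and name each group with a single number, so the same string occupies fewer chunks. The sketch below is illustrative only: the function name and the 18-digit string are arbitrary, and the span of 7 chunks is the approximate figure from the immediate-memory notes above.

```python
def recode(binary_digits, chunk_size):
    """Recode a string of binary digits into one number per group."""
    if len(binary_digits) % chunk_size != 0:
        raise ValueError("string length must be a multiple of chunk_size")
    return [
        int(binary_digits[i:i + chunk_size], 2)
        for i in range(0, len(binary_digits), chunk_size)
    ]

digits = "101000100111001110"  # 18 binary digits, arbitrary example

print(recode(digits, 1))  # 18 chunks, one per digit
print(recode(digits, 2))  # 9 chunks
print(recode(digits, 3))  # 6 chunks: [5, 0, 4, 7, 1, 6]

# Assuming a span of about 7 chunks, the number of binary digits that fit
# grows linearly with the chunk size.
SPAN_IN_CHUNKS = 7
for chunk_size in (1, 2, 3, 4, 5):
    print(chunk_size, SPAN_IN_CHUNKS * chunk_size)
```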
We remember events by creating a verbal recoding of the event, and then
elaborating from this coded version (accounts for variations in witness
testimony).
SUMMARY OF PERCEPTUAL EXPERIMENTS
- Span of absolute judgement and immediate memory limit our ability to perceive
- We expand on this ability by reformatting into multiple dimensions or
sequences of judgements
- Relative judgement (detection) is more powerful than absolute (measured)
judgement
- The recurrence of the number 7 (plus or minus 2) is strictly coincidence
(though 7 appears with uncanny regularity in everyday life)
IMPLICATIONS IN COMPUTER VISION
For research which attempts to parallel human vision, the process of perceptual
organization (chunking) of primitive components is a promising path.
Basing work on highly accurate quantitative values doesn't fit the mold
of human perception, as our ability to distinguish different absolute levels
is restricted to 4 - 7 values.
IMPLICATIONS IN VISUALIZATION
In applications where absolute judgement is required, the best we can do
with a single graphical attribute is between 4 and 7 values. To get a
larger range of recognizable levels, we must re-pose the problem in multiple
dimensions, do a sequence of simple decisions, or perform some type of
recoding.
Alternatively, we could redefine the problem in a way that relative rather
than absolute judgement could be used to focus attention, with a second, more
quantitatively accurate, stage following the initial focus of attention.