WORCESTER POLYTECHNIC INSTITUTE
Computer Science Department

CS4341 ❏ Artificial Intelligence

Version: Mon Oct 2 14:37:56 EDT 2006

Course Contents

1 The Intelligent Computer

  • Course Information
    -- web, book, intro, projects, exams

  • The Field and the Book
    • Definition:
      -- AI is the study of ...
      • computations that make it possible to perceive, reason, and act.
      • how to make computers do things which, at the moment, people do better.
      • the design of intelligent agents.
      • how to make computers act like those in the movies!
      -- Turing Test
        -- avoid definition of intelligence
          how would you define it?
        -- system intelligent if passes test
        -- person or machine ? (Eliza)
    • Engineering goal -- solve real-world problems
    • Scientific goal -- explain various sorts of intelligence
    • How AI has changed
      -- focus on systems that act rationally
    • The Near-Term Applications
      -- e.g., routine design
      -- e.g., detect credit card fraud
    • The Long-Term Applications
      -- what is still left to do...????
      -- chess? Deep Blue
      -- space? Remote Agent and Deep Space 1
        "Remote Agent (RA) is a model-based, reusable, artificial intelligence (AI) software system that enables goal-based spacecraft commanding and robust fault recovery. RA was flight validated during an experiment onboard Deep Space 1 (DS1) between May 17 and May 21, 1999."
      -- autonomous vehicles? DARPA Grand Challenge ( Briefing Slides)
      -- Video: Winning the DARPA Grand Challenge
    • AI Sheds New Light on Traditional Questions
      -- computers provide new concepts & language
      -- computers require precision (e.g., what is "creativity"?)
      -- explore impact of technique or knowledge (add/remove)
      -- theories > computational models > implementations > results > refinements
      -- use of computers allows testing
      -- well tested methods used as tools
    • AI Helps Us to Become More Intelligent
      -- suggests new/better ways to tackle problems

  • What Intelligent Systems Can Do
    -- diagnosis, design, planning, scheduling, navigation, vision, tutoring, learning, ...
    • Help Experts to Solve Difficult Analysis Problems
    • Help Experts to Design New Devices
      -- Ulrich's function sharing problem (e.g., lamp chain/cord)
    • Learn from Examples
      -- rules from sample data, e.g., ID3
      -- data mining (KDDRG)
    • Provide Answers to English Questions
      -- Natural Language Understanding and Generation
    • AI Is Becoming Less Conspicuous, yet More Essential
      -- Airport gate allocation
      -- many embedded applications (cars, washing machines, ...)

  • Criteria for Success
    -- clear definition of task and implementable procedure for it
    -- regularities or constraints available
    -- other knowledge
    -- solves real problem
    -- provides new theory/method
    -- suggests new opportunities

2 Semantic Nets and Description Matching

  • Representations
    • Good Representations Are the Key to Good Problem Solving
      -- representation: a set of conventions about how to describe
      -- description: made using a representation
      -- Fig.2.1
      -- Farmer, Fox, Goose, Grain: node & link representation
      -- could show all states and all transitions
      -- safe states only -- constrained state space, reduces problem
      -- for searches often have Start and Goal states
      -- An aside, wrt searching:
      • what's a "state"?
      • what's an "operator"?
      • think of different tasks in different domains
      -- picking appropriate repr is key
      -- rich problems require rich descriptions
    • Good Representations Support Explicit, Constraint-Exposing Description
      • Make important objects & relations explicit & visible
      • Expose natural constraints
      • Suppress irrelevant details
      • Makes things understandable, complete, concise
      • Are fast to use
      • Can create with procedure
    • A Representation Has Four Fundamental Parts
      -- lexical (vocab.), structural (syntax), semantic (meaning), procedural (use)
    • Semantic Nets Convey Meaning
      -- nodes (denoting objects), links (denoting relationships), labels (application specific).
      -- examples: state space, game tree, decision tree, ...

  • The Describe-and-Match Method
    • describe-and-match method
      -- Fig.2.4
      -- an example of a Problem-Solving Method (PSM)
      -- specifies knowledge needed, I/O, and pattern of reasoning
      -- the method:
      • describe object using repr.
      • match description against "library"
      • if no match, failure
      • if satisfactory match, then announce.
      -- basis of Case-based Reasoning (CBR)
    • Issues
      -- how to describe?
      -- how to match?
      -- what is "satisfactory"?
      -- e.g., CBR: partial match with adaptation
    • Feature-Based Object Identification
      -- Fig.2.5
      -- example of describe-and-match
      -- describe using features
        e.g., height, width, color, # of holes, ...
      -- represent object as point in multidimensional feature space
      -- e.g., capital letters (# of lines, # of curves) (D vs. M)
      -- how to identify object?

  • The Describe-and-Match Method and Analogy Problems
    • Analogy problems: A is to B, as C is to x?
      -- describe rule of how A is to B
      -- find (C rule x)
    • Geometric Analogy Rules
      -- Fig.2.10
      -- A Rule: Describes Object Relations and Object Transformations
      -- (object relations in A) + (object relations in B) + (A to B transformation)
      -- Note: use labelled links of different types (relation, transform)
    • Scoring Mechanisms Rank Answers
      -- Fig.2.12
      -- Match: how to measure similarity of two rules?
      -- general problem of matching two representations
      -- Fig.2.13
      -- can weight relations differently (more or less essential to match)
      -- weights are often a problem (lack of knowledge)
    • Ambiguity Complicates Matching
      -- multiple ways to describe how "A is to B"
      -- ambiguity is often a problem

  • The Describe-and-Match Method and Recognition of Abstractions
    • Abstraction over representation
      -- Fig.2.22
      -- Fig.2.23
      -- abstracting sub-nets into nodes
      -- keep relationships between nodes
      -- enable more general results
      -- make patterns explicit, give new insight.
      -- levels of representation (detail)
      • helps with match?
      -- can also have a point of view (POV) in the repr (electrical, heat)

  • Problem Solving and Understanding Knowledge
    -- ask questions about knowledge
    • What kind of knowledge is involved?
      -- objects? processes?
      -- "An ontology is a specification of a conceptualization."
    • How should it be represented?
      -- logic? semantic nets? rules? frames? ...
    • How much knowledge required?
    • What exactly is the knowledge needed?
      -- e.g., feature space, library of known solutions, ...
    • Is it available? and from where?
    • transparent? (understandable)
    • complete? (can say all of what's needed)

3 Generate and Test, Means-Ends Analysis, and Problem Reduction

  • Project 1 intro (Diagram)

  • PSM control is important
    -- select best knowledge at best time
    -- allocate resources (cheap methods first)
    -- aim towards answer (goal)
    -- reduce space searched
    • "informed" control (avoid useless areas of space)
    • use constraints (e.g., not fox + geese)
    • use natural constraints (e.g., symmetry, redundancy)
    • use suggestions (e.g., rules of thumb)(i.e., heuristics)
    -- meta-knowledge (how useful is a piece of knowledge)
    -- methods for control
    • explicit control by procedures or PSMs
    • message passing
    • matching used for control (e.g., rules)

  • The Generate-and-Test Method
    -- must be able to generate candidate solutions
    -- must know how to test them
    -- test must be easy
    -- good if 'many' solutions in search space
    -- good if small search space
    -- goal known? goal recognized? satisficing?
    -- no inherent direction

    • Generate-and-Test Systems Often Do Identification
      -- Test conditions identify solution
      -- G&T for Design problems? for Diagnosis problems?

    • Good Generators
      -- Complete -- eventually cover all possibilities
      -- Nonredundant -- don't waste time
      -- Informed -- propose only sensible possibilities.
      -- sometimes preceded by a Plan step (constrain G)
      -- could use feedback to affect G
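    -- a minimal generate-and-test sketch (Python; the generator, test, and usage
       below are hypothetical illustrations, not from the course materials):

         def generate_and_test(candidates, test):
             # Generator: should be complete, nonredundant, and informed.
             for candidate in candidates:
                 if test(candidate):           # Test: must be cheap to apply
                     return candidate          # satisfactory -- announce it
             return None                       # generator exhausted -- failure

         # hypothetical use: find a number whose square ends in 21
         print(generate_and_test(range(1, 100), lambda n: n * n % 100 == 21))   # -> 11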

  • The Means-Ends Analysis Method
    • Key Idea Is to Reduce Differences
      -- difference between current and goal state
      -- use difference description to select procedure/operator
      -- difference table -- links differences to operators
      -- what to do if choice of operators?
      -- direction: work from current (forward)?
      -- direction: work from goal (backwards)? *
      -- these produce easier problems
      -- direction: work in the middle??
      -- produces easier problems
      -- recursive

  • The Problem-Reduction Method
    • Moving Blocks Illustrates Problem Reduction
      -- convert goals into subgoals
      -- forms goal tree
      -- what's a "goal"?
      -- Fig.3.6
      -- PUT-ON uses GET-SPACE & GRASP & MOVE & UNGRASP
      -- Fig.3.8
      -- subgoals related by AND
      -- subgoals are ordered
      -- PUT-ON may indirectly use PUT-ON -- Recursive
      -- in general may be AND-OR tree (choice of decomp)
      -- subgoal dependencies?

    • Example from Design problem (diagrams)
      -- hierarchical heuristic top-down problem decomposition

    • Goal Trees Enable Introspective Question Answering
      -- How questions? look down
      -- Why questions? look up
      -- provides explanations --why good?

    • Problem-Solving Methods Often Work Together
      -- different subproblems might need different PSMs
      -- key issue is to identify and characterize PSMs

4 Nets and Basic Search

  • Blind Methods
    -- Fig.4.1
    -- can search for goal or path
    -- can search using path use costs (e.g., distances) or not
    -- for any solution or best solution
    -- with knowledge to guide, or not
    -- if not, known as uninformed, weak, or blind

    • Net Search Is Really Tree Search
      -- Fig.4.2
      -- root to leaf
      -- tree not known initially, gets generated
      -- node "expansion" makes a tree
      -- unexpanded nodes are "open" (else "closed")
      -- node has branching factor b

    • Search Trees Explode Exponentially
      -- tree has depth d
      -- number of paths is b^d
      -- number "explodes exponentially" with tree depth.
      • size of space = 1 + b + b^2 + b^3 + ... + b^d
      • for d=13, if b=2, size= 16,383
      • for d=13, if b=2.5, size= 248,352
      • for d=13, if b=3, size= 2,391,484!
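      -- a quick check of these numbers (Python; a small illustrative snippet, not
         from the course materials):

           def tree_size(b, d):
               # 1 + b + b^2 + ... + b^d  (geometric series)
               return sum(b ** i for i in range(d + 1))

           for b in (2, 2.5, 3):
               print(b, round(tree_size(b, 13)))    # 16383, 248352, 2391484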

    • The Menu of Search strategies
      • Depth 1st -- keep to same path going deeper (blind)
      • Breadth 1st -- all nodes at current level, then next level (blind)

      • Hill Climbing -- like depth 1st but explore most gain 1st
      • Beam -- like breadth 1st but prune unpromising children
      • Best 1st -- expand best open node 1st, regardless of depth
      • Branch-and-bound -- expand least-cost-so-far node, bound at goal
      • A* -- like B-and-b but with heuristic info.

    • Depth-First Search Dives into the Search Tree
      -- keep heading down same path
      -- commitment
      -- optimistic
      -- if no goal reached then "backup"
      -- backup to last choice point
      -- what space is needed?
      -- can be "depth limited"
      -- can use iterative deepening

    • Breadth-First Search Pushes Uniformly into the Search Tree
      -- check all paths of same length, then all of next length, etc.
      -- expanding wavefront
      -- what space is needed?

    • The Right Search Depends on the Tree
      -- depth-first bad for long failing paths
      -- depth-first sensitive to where goal node(s) is/are
      -- depth-first's first goal found may not be shortest path
      -- breadth-first uses a lot of space
      -- breadth-first is sensitive to branching factor
      -- breadth-first's first goal found will be shortest path
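    -- a compact sketch of both blind searches (Python; the tree-search routine, the
       toy net, and all names below are hypothetical illustrations, not course code):

         from collections import deque

         def tree_search(start, goal, successors, pop):
             # Blind search over partial paths; `pop` picks which open path to expand next.
             frontier = deque([[start]])               # open partial paths, root to tip
             while frontier:
                 path = pop(frontier)
                 node = path[-1]
                 if node == goal:
                     return path
                 for child in successors(node):        # node "expansion"
                     if child not in path:             # avoid looping back along this path
                         frontier.append(path + [child])
             return None                               # no goal found

         def depth_first(s, g, succ):   return tree_search(s, g, succ, deque.pop)      # LIFO
         def breadth_first(s, g, succ): return tree_search(s, g, succ, deque.popleft)  # FIFO

         # hypothetical net (node -> children)
         net = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G'], 'C': [], 'D': [], 'G': []}
         print(breadth_first('S', 'G', lambda n: net[n]))   # ['S', 'B', 'G'] (shortest path)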

  • Heuristically Informed Methods
    -- more knowledge tends to lead to less searching
    -- what's a heuristic?
    -- e.g. distance as crow flies instead of road distance.
    -- used to determine quality of node
    -- used to determine estimate of path to goal
    -- usually only heuristic is available
    -- heuristic should direct search, but not prune (why?)
    -- consider the eight puzzle (example heuristic)

    • Quality Measurements Turn Depth-First Search into Hill Climbing
      -- quality is height of hill
      -- quality surface
      -- e.g., estimate of how close to goal
      -- climb for quality
      -- climb in direction that is most beneficial
      -- stay on same path (depth-first search)

    • Foothills, Plateaus, and Ridges Make Hills Hard to Climb
      -- Fig.4.7
      -- foothills - local maximum vs. global maximum
      -- plateaus - no slope to climb (aimless wandering)
      -- ridges - steps don't correspond to climbing slope
      -- fixes? backtrack, jump (some nondeterminism)

    • Beam Search Expands Several Partial Paths and Purges the Rest
      -- based on breadth-first
      -- keep only the best w nodes at each level

    • Best-First Search Expands the Best Partial Path
      -- use best of all open nodes at any level
      -- uses estimated quality of current node

    • Search May Lead to Discovery
      -- to designs
      -- quality heuristic is "interestingness"
      -- operators add or change design elements
      -- goal test needed?

5 Nets and Optimal Search

  • The Best Path
    -- search a network
    -- for goal or for path
    -- use quality measure
    -- or path cost measure
    -- usually heuristic

    • Branch-and-Bound Search Expands the Least-Cost Partial Path
      -- Fig.5.2
      -- have cost for every action taken at a node
        e.g., distance travelled
      -- expands the least-cost partial path
        i.e., best first by partial path cost
      -- bound: stops some nodes from being expanded (prune) (heuristic?)
      -- bounded if partial path cost >= goal path cost
        i.e., find goal, but keep looking

    • Adding Underestimates Improves Efficiency
      -- cost of total path through a node?
      -- e(total path length) = d(already travelled) + e(distance remaining)
      -- "e" is estimate (heuristic)
      -- "d" is distance (known)
      -- make e an underestimate
      • i.e., the real distance can't be less
      • overestimates may reach goal but not by best path
      -- expand node with lowest underestimated path
      • it predicts that the path through this node is best
      -- make e as accurate as possible
      -- if completely accurate?

  • Redundant Paths
    -- discard them
    -- keep only best S->i and best i->G
    -- called dynamic-programming principle

    • Underestimates and Dynamic Programming Improve Branch-and-Bound Search
      -- called A*
      -- very common approach
      -- u(S,G) = d(S,j) + u(j,G)
      -- u(j,G) is heuristic underestimate of cost of remaining path.
      -- can use iterative-deepening A* (i.e., IDA*)
        depth-first search with an increasing bound on e(total path length)
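      -- a minimal A* sketch (Python; the weighted net, estimates, and names below are
         hypothetical illustrations, not the book's code):

           import heapq

           def a_star(start, goal, successors, h):
               # Expand the open path with least d(already travelled) + h(estimate remaining).
               # successors(n) yields (neighbor, step_cost); h must never overestimate.
               frontier = [(h(start), 0, start, [start])]      # (d + h, d, node, path)
               best_d = {}                                     # dynamic-programming principle:
               while frontier:                                 #   keep only best S->i found so far
                   f, d, node, path = heapq.heappop(frontier)
                   if node == goal:
                       return path, d
                   if d > best_d.get(node, float('inf')):
                       continue                                # a cheaper path to node is known
                   for child, step in successors(node):
                       nd = d + step
                       if nd < best_d.get(child, float('inf')):
                           best_d[child] = nd
                           heapq.heappush(frontier, (nd + h(child), nd, child, path + [child]))
               return None, float('inf')

           # hypothetical weighted net and straight-line estimates
           net = {'S': [('A', 2), ('B', 5)], 'A': [('G', 4)], 'B': [('G', 2)], 'G': []}
           est = {'S': 4, 'A': 3, 'B': 1, 'G': 0}
           print(a_star('S', 'G', lambda n: net[n], lambda n: est[n]))   # (['S', 'A', 'G'], 6)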

    • Robot Path Planning Illustrates Search
      -- Fig.5.6
      -- robot path planning example
      -- Fig.5.7
      -- configuration space obstacles
      -- Fig.5.9
      -- make visibility graph of sight lines
      -- do A* search over graph

6 Trees and Adversarial Search

  • Algorithmic Methods
    -- Adversarial: more than 1 person trying to win (e.g., games)
    -- why study?
    • strategy
    • uncertainty
    • v. large state space
    • fixed rules (operators/actions)
    • small state descriptions

    • Nodes Represent Board Positions
      -- Fig.6.1
      -- game tree = possible future board configurations
      -- node = board config
      -- link = a possible move from one player
      -- ply = levels in tree including root level
      -- each level represents possible situations for one player

    • Exhaustive Search Is Impossible
      -- 10^120 possible branches in a chess game tree
      -- use a "lookahead" procedure with situation evaluation
      -- what do we need?
      • Generator: what are all possible legal moves from position
      • (Heuristic filter: all "plausible" moves)
      • Evaluator: how good a move is for current player
      • Pruning: remove losing moves (most games don't allow backup)

    • The Minimax Procedure Is a Lookahead Procedure
      -- Fig.6.2
      -- static evaluator = heuristic evaluation of board position quality
      • # of pieces
      • strength of pieces (queen > pawn)
      • mobility (poss. moves)
      • control (squares threatened)
      • threats (potential captures)
      • patterns of pieces (e.g., diagonal pawns)
      -- score - very +ve means player A wins, very -ve means player B wins
      -- maximizer - wants +ve scores (+10)
      -- minimizer - wants -ve scores (-10)
      -- assume each player will always pick move that is best for them
      -- goes to bottom of tree, evaluates
      -- back the scores up tree, "minimaxing" (minimize/maximize)
      -- pick move that avoids opponents best move(s)
      -- how far to expand tree?
      -- minimax is expensive (large trees)

    • The Alpha-Beta Procedure Prunes Game Trees
      -- Fig.6.3
      -- don't expand a node that can't provide a score better than one you already have
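      -- a minimax-with-alpha-beta sketch (Python; `children` and `evaluate` stand in
         for a real move generator and static evaluator -- an illustration only):

           def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
               moves = children(node)
               if depth == 0 or not moves:           # depth limit or leaf: static evaluation
                   return evaluate(node)
               if maximizing:
                   best = float('-inf')
                   for child in moves:
                       best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                                  children, evaluate))
                       alpha = max(alpha, best)
                       if beta <= alpha:             # minimizer already has a better option
                           break                     # prune the remaining children
                   return best
               else:
                   best = float('inf')
                   for child in moves:
                       best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                                  children, evaluate))
                       beta = min(beta, best)
                       if beta <= alpha:
                           break
                   return best

      -- called at the root with alpha = -inf and beta = +inf, this returns the same
         backed-up score as plain minimax while visiting fewer nodes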

    • Alpha-Beta May Not Prune Many Branches from the Tree
      -- game tree branch order makes a difference
      -- Fig.6.6
      -- still exponential with depth
      -- time/space saved can allow deeper searches

  • Heuristic Methods
    -- may prune the path that leads to a win!
    -- i.e., into the valley to reach the hill

    • Progressive Deepening Keeps Computing Within Time Bounds
      -- depth limited search, with d = 1, 2, 3, ...
      -- Anytime algorithm

    • Heuristic Continuation Fights the Horizon Effect
      -- Fig.6.7
      -- fixed depth search produces a "horizon" (may be bad beyond it!)
      -- singular-extension -- if one move's value is much better than rest
      -- search-until quiescent -- look for quiet

    • Heuristic Pruning Also Limits Search
      -- limit tree growth
      -- tapered search
      • rank a node's children by a (fast) evaluation
      • b(child) = b(parent) - rank(child)
      • where "b" is number of branches to keep

    • "Deep Thought" Plays Grandmaster Chess
      -- now "Deep Blue"
      -- see also
      -- uses alpha-beta search, with selective extensions
      -- could search to a depth of 12 ply
      -- has opening "book" and all five-or-fewer piece endgames
      -- massively parallel, 30-node, RS/6000, SP-based computer system enhanced with 480 special purpose VLSI chess chips
      -- evaluates 200,000,000 chess positions per second
      -- several months working with a grandmaster on evaluation function
      -- "In three minutes, ... it computes everything it knows about the current position from scratch."

    • How do people play Chess...?


7 Rules and Rule Chaining

  • Rule-Based Deduction Systems
    -- If-Then
    -- antecedent-consequent
    -- forward-chaining
    -- satisfied, triggered, fired
    -- working memory, rule base
    -- LHS: boolean function that tests, or pattern(s)
    -- RHS: actions, or pattern
    -- nonmonotonic vs monotonic
    • can a rule invalidate another? (order dependent)
    -- good: rules are small slices of knowledge
    -- bad: rules are small slices of knowledge

    • Post's Theorem
      -- production systems can compute all computable functions
      -- hence, if intelligence is computable, productions can produce it

    • Many Rule-Based Systems Are Deduction Systems
      -- deduction -- similar to logic
        (but don't assume truth)
      -- with assertions (assert a fact)
      -- deduction - all triggered rules can fire

    • A Toy Deduction System Identifies Animals
      -- Fig.7.2
      -- antecedent and consequent may include variables
      -- variables get bound to values
      -- (?x has-color black)
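      -- a stripped-down forward-chaining sketch (Python; ground facts only, no variables
         or bindings; the rules are a hypothetical fragment of the animal example):

           # each rule pairs a set of antecedents with one consequent
           rules = [
               ({'has hair'},                            'is a mammal'),
               ({'is a mammal', 'eats meat'},            'is a carnivore'),
               ({'is a carnivore', 'has black stripes'}, 'is a tiger'),
           ]

           def forward_chain(facts, rules):
               facts = set(facts)                    # working memory
               changed = True
               while changed:                        # fire until nothing new is asserted
                   changed = False
                   for antecedents, consequent in rules:
                       if antecedents <= facts and consequent not in facts:
                           facts.add(consequent)     # satisfied, triggered, fired
                           changed = True
               return facts

           print(forward_chain({'has hair', 'eats meat', 'has black stripes'}, rules))
           # working memory now also contains 'is a mammal', 'is a carnivore', 'is a tiger'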

    • Deduction Systems May Run Either Forward or Backward
      -- Fig.7.3
      -- backward chaining (goal directed reasoning)
      -- make hypothesis
      -- work back through rules to supportive known facts
      -- acts as problem decomposition

    • The Problem Determines Whether Chaining Should Be Forward or Backward
      -- hard to figure out in practice
      -- combine knowledge of fan out, fan in, number of facts and number of conclusions
      -- try to minimize effort
      -- try to match human approach if that's important

    • Control Issues
      -- infinite loops
      -- when to stop
      -- goal reached?
      -- rules used more than once?

  • Rule-Based Reaction Systems
    -- condition-action rules
    -- rules may assert, or do other actions
    -- forwards, data-directed

  • A Toy Reaction System Bags Groceries
    -- note "add-delete" syntax (IF a THEN DELETE b, ADD c)
      (relates to planning systems)
    -- use subtask name in WM (e.g., step is bag-small-items)

  • Reaction Systems Require Conflict Resolution Strategies
    -- rule ordering
    -- rule groups
      e.g., rule selection by subtask
    -- conflict resolution strategies
    • specificity (LHS)
    • predefined priority
    • data recency (use recent WM elements)
    • most informative (RHS)
    • etc. etc. etc.
    -- how to pick?
  • Procedures for Forward and Backward Chaining
    • Depth-First Search Can Supply Compatible Bindings for Forward Chaining
      -- must produce new assertions only
      -- alternative binding
      -- can search tree of alternative bindings in different ways
      -- need all possibilities, or just one?

    • The Rete Approach Deploys Relational Operations Incrementally
      -- Fig.7.11
      -- for forward chaining rules
      -- don't search, pre-index rules (trade space for time)
      -- build rete for every rule
      -- "pour" given assertions through whole rete and keep bindings
      -- can then easily determine which rules are satisfied
      -- key idea -- each rule firing changes WM very little
      -- take WM change and put that through rete to see effect
      -- all other bindings stay the same


    8 Rules, Substrates, and Cognitive Modeling

    • Rule-based Systems Viewed as Substrate
      • Explanation Modules Explain Reasoning
        -- Fig.8.1 & 8.2
        -- inference net & goal tree
        -- inference net -- shows flow of reasoning
        -- goal tree -- shows problem decomposition
        -- use to get explanation
        -- "how did you show...?" -- look down goal tree
        -- "why did you use...?" -- look up goal tree

      • Reasoning Systems Can Exhibit Variable Reasoning Styles
        -- can use rule "providing assumptions"
        -- perhaps good to assume if expensive to show
        -- "providing A" = "if we assume A"
        -- "unless A" = "if we assume not A"
        -- reasoning modes (i.e., how to deal with assumptions)
        -- check all assumptions vs. ignore all assumptions
        -- not checking all assumptions makes result less reliable

      • Probability Modules Help You to Determine Answer Reliability
        -- conclusions (assertions) are rarely certain
          e.g., ... THEN they have the 'flu
        -- rules are rarely certain
          e.g., IF I see water on my office window THEN PROBABLY it is raining
        -- If A and B THEN C -- both A and B may each have a probability
        -- so lhs (A and B) has a probability
        -- if lhs has probability and rule has probability, so does conclusion

      • Two Key Heuristics Enable Knowledge Engineers to Acquire Knowledge
        -- knowledge engineering
        -- 1. ask about specific situations
        -- 2. distinguish between apparently similar situations

      • Acquisition Modules Assist Knowledge Transfer
        -- Fig.8.5
        -- help knowledge engineers make new rules
        -- build tree of classes of rules (by conclusion type)
        -- form "typical" rule of each type
        -- new rules are compared to typical as a check
        -- e.g., missing antecedent?

      • Rule Interactions Can Be Troublesome
        -- new rules are rarely independent of existing ones

      • Rule-Based Systems Can Behave Like Idiot Savants
        -- don't reason at multiple levels
        -- don't use constraint-exposing models
        -- don't show task structure
        -- don't look at problems from different perspectives
        -- don't know when to break the rules
        -- don't have access to the reasoning behind the rule

    • Rule-Based Systems Viewed as Models for Human Problem Solving
      • Rule-Based Systems Can Model Some Human Problem Solving
        -- consider WM to be Short term Memory (STM)
        -- rules used to model actions on STM
        -- 7 ± 2 chunks
        -- can hypothesize rules for simple human tasks
        -- e.g., arithmetic

      • Protocol Analysis Produces Production-System Conjectures
        -- protocol collection -- talk while solving problem
        -- protocol analysis
        -- infer productions
        -- infer changes in state of knowledge
        -- form problem-behavior graph

      • SOAR Models Human Problem Solving, Maybe
        -- SOAR is an architecture
        -- an integrated collection of representations and methods
        -- Also a theory of cognition
        • Problem spaces as a single framework for all tasks and subtasks to be solved
        • Production rules as the single representation of permanent knowledge
        • Objects with attributes and values as the single representation of temporary knowledge
        • Automatic subgoaling as the single mechanism for generating goals
        • Chunking as the single learning mechanism
        -- it breaks rule-based systems into basic units of representation and action
          e.g., what to do and how to do it are different problems
        -- Impasse is a situation where SOAR doesn't know how to proceed
        -- Sub-goal created to resolve the impasse
        -- Resolved impasse triggers chunk formation
        -- Chunk is new rule that describes how impasse was dealt with
        -- Search control is encoded in production rules that create preferences for operators
        -- SOAR acts on goals, problem spaces, states, operators.
        -- SOAR does propose, compare, select, refine on each thing.

    9 Frames and Inheritance

    • Frames, Individuals, and Inheritance
      -- frames as knowledge representation
      -- frames as model of memory
      -- at level above semantic nets
      -- can represent objects, actions, relationships, ...

      • Frames Contain Slots and Slot Values
        -- Fig.9.1
        -- frames have slots
        -- slots have a value
        -- in other models of frames, slots may contain meta-knowledge, defaults, etc.

      • Frames may Describe Instances or Classes
        -- Fig.9.2
        -- instance frames represent individuals
        -- class frames represent classes
        -- "Is-a" is-a-member-of-the-class
        -- Grumpy Is-a Manager (I to C)
        -- "Ako" a-kind-of
        -- Manager Ako Competitor (C to C)
        -- allows property inheritance
        -- note: birds fly, a penguin is a bird, hence...?
        • i.e., may need to delete property (block inheritance)
        • add property (corgis have smooth coats)
        • modify property (3-legged tables)

      • Inheritance Enables When-Constructed Procedures to Move Default Slot Values from Classes to Instances
        -- instances inherit slots (and values)
        -- allows knowledge to be written in one place
        -- hence easier knowledge maintenance
        -- values in classes represent default knowledge for instances
        -- e.g., all dogs have color brown
        -- classes can have when-constructed procedures
        -- also known as "attached actions"
        -- they can be inherited
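        -- a tiny sketch of slot-value inheritance (Python; the frames, slots, and
           values below are hypothetical, and when-constructed demons are omitted):

             class Frame:
                 # a frame: named slots with values, plus a parent to inherit from
                 def __init__(self, name, parent=None, **slots):
                     self.name, self.parent, self.slots = name, parent, slots

                 def get(self, slot):
                     # property inheritance: look locally, then climb to the parent frame
                     if slot in self.slots:
                         return self.slots[slot]
                     return self.parent.get(slot) if self.parent else None

             competitor = Frame('Competitor', fears='Nothing')
             manager    = Frame('Manager', parent=competitor, pay='High')  # Manager Ako Competitor
             grumpy     = Frame('Grumpy', parent=manager)                  # Grumpy Is-a Manager
             print(grumpy.get('pay'), grumpy.get('fears'))                 # High Nothing (defaults)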

      • A Class Should Appear Before All Its Superclasses
        -- Fig.9.4
        -- if frame has >1 parent frame, not a strict hierarchy
        -- multiple inheritance (often prohibited)
        -- need procedure to decide class-precedence list
        -- many possible ways
        -- can lead to contradictions
          -- professor is an uncle therefore kind, and, ...
          -- professor is a teacher therefore not kind

    • Demon Procedures
      • When-Requested Procedures Override Slot Values
        -- can provide a value even if one isn't present
        -- can override existing value
        -- can do simple inference

      • When-Written Procedures Can Maintain Constraints
        -- captures constraints between values
        -- if new value of slot A is 10 then value of slot B must be 20

      • With-Respect-to Procedures Deal with Perspectives and Contexts
        -- size of dog from perspective of ant is large
        -- size of dog from perspective of elephant is small
        -- mood of student in context of class is grumpy
        -- mood of student in context of pizza is happy

      • Inheritance and Demons Introduce Procedural Semantics
        -- procedures add meaning to frames
        -- procedures are part of the representation

    • Frames, Events, and Inheritance
      • Digesting News Seems to Involve Frame Retrieving and Slot Filling
        -- stereotypical events have expected slots
        -- i.e., produce expectations
        -- things play fixed roles
        -- expectations help understanding
        -- how to select the right event frame?

    • Frames as a theory of memory
      -- stereotypical objects (typical & possible values)
      -- hierarchy & inheritance
        (common properties stored only once)
      -- expectations (e.g., entering a kitchen I see...)
      -- defaults
      -- inference triggered by different situations


    10 Frames and Commonsense

    • Thematic-role Frames
      -- language describes actions and change
      -- frames can be used to represent meaning

      • An Object's Thematic Role Specifies the Object's Relation to an Action
        -- Fig.10.1
        -- Verbs = actions
        -- Noun phrases = thematic roles
        -- e.g., agent, thematic object, instrument
        -- constraints introduced by verbs
        -- not all verbs allow all roles
        -- thematic roles indicated by words, e.g., with, to, by, near, before
        -- some thematic roles:
        • "Agent" responsible for action
        • "Beneficiary" is who action is for
        • "Thematic object" is what sentence is about
        • "Instrument" is used as tool in action
        • "Source/Destination" refer to physical position changes
        • "Time" is when action done
        • "Location" is where action done

      • Filled Thematic Roles Help You to Answer Questions
        -- questions tend to be about one thematic role
        -- e.g., "With what...?" (instrument)

      • Various Constraints Establish Thematic Roles
        -- NPs have thematic role ambiguity
        -- verbs constrain what roles, and where in sentence NPs go
        -- prepositions constrain NPs roles (e.g., from --> source)
        -- nouns constrain role possibilities (e.g., inanimate)

      • A Variety of Constraints Help Establish Verb Meanings
        -- verbs & VPs have meaning ambiguity (e.g., shot, saw)
        -- NP may disambiguate (e.g., shot the rabbit)
        -- Particles may disambiguate (e.g., threw away vs. threw up)

      • Constraints Enable Sentence Analysis
        -- have dictionary of info about nouns and verbs
        -- find verb
        -- find thematic object
        -- handle other noun phrases
        -- use constraints throughout

      • Examples Using Take Illustrate How Constraints Interact
        -- example using "take"
        -- transport, swindle, swallow, steal, date, remove, control

    • Expansion into Primitive Actions
      -- underlying meanings for verbs
      -- what assumptions can be made for an action
      -- e.g., how done, where done, with what, ...
      -- explain relationship between "buy" and "sell"

      • Primitive Actions Describe Many Higher-Level Actions
        -- small number of primitives to describe actions
        -- Basic English (1000 words)
        -- sample primitives:
          move-body-part     move-object
          expel     ingest
          propel     speak
          see     hear
          smell     feel
          move-possession     move-concept
          think-about     conclude
        -- how do you know you have the right set?

      • Actions Often Imply Implicit State Changes and Cause-Effect Relations
        -- Fig.10.6
        -- action Result state-change
        -- Fig.10.7
        -- action Result action

      • Actions Often Imply Subactions
        -- Fig.10.10
        -- person moving block implies body parts moving too
        -- person eating implies body part moving to move instrument

      • Primitive-Action Frames and State-Change Frames Facilitate Question Answering and Paraphrase Recognition
        -- patterns of primitives can be matched to db
        -- db of "scripts" (stereotypical actions in known situations)
        -- do two sentences have same meaning? (e.g., buy & sell)

      • CYC Captures Commonsense Knowledge
        -- what is Cyc?
        -- For fun: video presentations about Cyc
        -- For fun: HAL & Cyc


    11 Numeric Constraints and Propagation

    • Propagation of Numbers Through Numeric Constraint Nets
      • Numeric Constraint Boxes Propagate Numbers through Equations
        -- Fig.11.1
        -- equation is a constraint
        -- A=B+C given A=3, B=2, C=2 ?
        -- set of equations = set of constraints
        -- values can propagate
        -- A given A=B+C and B=2, C=2 ?
        -- Direction? A=B+C, B=A-C, C=A-B
        -- constraints connect via common variables
        -- arithmetic constraint net

    • Propagation of Probability Bounds Through Opinion Nets
      -- Fig.11.4
      -- most opinions aren't certain

      • Probability Bounds Express Uncertainty
        -- nodes are AND and OR
        -- a value is a range of probabilities (e.g., V = [0.25, 0.75] )
        -- l(V) is the lower bound (0.25)
        -- u(V) is the upper bound (0.75)
        -- OR(A, B) gives [l(A or B), u(A or B)]
          -- l(A or B) >= max[l(A), l(B)]
          -- u(A or B) <= u(A) + u(B)
        -- AND(A, B) gives [l(A and B), u(A and B)]
          -- l(A and B) >= l(A) + l(B) -1
          -- u(A and B) <= min[u(A), u(B)]
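        -- the bound formulas above, as small helpers (Python; clamped to [0, 1];
           an illustrative sketch, not course code):

             def or_bounds(a, b):
                 (la, ua), (lb, ub) = a, b
                 return (max(la, lb), min(1.0, ua + ub))        # bounds on P(A or B)

             def and_bounds(a, b):
                 (la, ua), (lb, ub) = a, b
                 return (max(0.0, la + lb - 1.0), min(ua, ub))  # bounds on P(A and B)

             A, B = (0.25, 0.75), (0.5, 1.0)
             print(or_bounds(A, B), and_bounds(A, B))   # (0.5, 1.0) (0.0, 0.75)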

    • Propagation of Surface Altitudes Through Arrays
      -- arrays with values, may be sparse, or contain errors
      -- constraints can express local relationships in array
      -- can do smoothing of images
      -- can do filling in of sparse data
      -- e.g., fill holes with average of surroundings

      • Local Constraints Arbitrate between Smoothness Expectations and Actual Data
        -- Fig.11.9
        -- sparse data with value and with confidence
        -- relaxation formula
        -- replace current value with new value based on a combination of current value and average of neighbors: both weighted by confidence
        -- 30 data points, hence effectively 30 copies of formula
        -- actually sweep formula across data propagating new values through array
        -- repeat until stable
        -- Fig.11.10
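        -- one possible reading of the relaxation sweep, in 1-D (Python; the profile and
           confidences are hypothetical, and the exact weighting is an assumption):

             def relax(values, confidence, sweeps=100):
                 # each point moves toward the average of its neighbors, anchored to its
                 # own measurement in proportion to the confidence in that measurement
                 v = list(values)
                 for _ in range(sweeps):                        # repeat until ~stable
                     new = v[:]
                     for i in range(1, len(v) - 1):
                         neighbor_avg = (v[i - 1] + v[i + 1]) / 2.0
                         c = confidence[i]
                         new[i] = c * values[i] + (1 - c) * neighbor_avg
                     v = new
                 return v

             # sparse altitude profile: confident endpoints, unknown middle
             heights    = [0.0, 0.0, 0.0, 0.0, 10.0]
             confidence = [1.0, 0.0, 0.0, 0.0, 1.0]
             print([round(h, 1) for h in relax(heights, confidence)])   # [0.0, 2.5, 5.0, 7.5, 10.0]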

      • Constraint Propagation Achieves Global Consistency through Local Computation
        -- most confident values affect neighbors
        -- and their neighbors, etc.
        -- 30 local constraints lead to global consistency


    12 Symbolic Constraints and Propagation

    • Propagation of Line Labels through Drawing Junctions
      -- numbers, probability intervals
      -- next... symbolic labels

      • There Are Only Four Ways to Label a Line in the Three-Faced-Vertex World
        -- Fig.12.1
        -- given a line drawing of a scene
        -- find an interpretation
        -- Fig.12.2
        -- i.e., label all lines as boundary (<)(>), convex (+), concave (-)
        -- junctions of lines = vertices of object
        -- Fig.12.4
        -- junction label (e.g., fork +++)
        -- natural constraints limit possible junction labels
        -- 208 conceivable junction labellings, but only 18 physically possible
        -- start with simplifying assumptions (e.g., no shadows)

      • There Are Only 18 Ways to Label a Three-Faced Junction
        -- Fig.12.11
        -- 208 conceivable junction labellings, but only 18 physically possible
        -- Fig.12.14
        -- e.g., only 5 forks (incl. +++, ---)

      • Finding Correct Labels Is Part of Line-Drawing Analysis
        -- Fig.12.16
        -- interior junction labellings usually ambiguous (e.g., +++ vs ---)
        -- principle: exploit constraints and regularities!
        -- constraints come from:
        • regularities in the world (due to physics or people)
        • surfaces being planar
        • uniform color
        • uniform texture
        • lighting
        • continuity of edges
        • etc.

      • Waltz's Procedure Propagates Label Constraints through Junctions
        -- Fig.12.19
        -- line label constrains junctions at each end,
        -- and junctions at each end constrain line label.
        -- put set of appropriate junction labels (e.g., fork) at junction
        -- move to next junction, do same
        -- remove incompatible labels
        -- propagate changes
        -- do for all junctions, until stable
        -- Fig.12.20

      • Many Line and Junction Labels Are Needed to Handle Shadows and Cracks
        -- Fig.12.24
        -- both add more labels and more constraints

      • Illumination Increases Label Count and Tightens Constraint
        -- adds more labels and more constraints
        -- now 11 instead of just 4 line labels
        -- L vertex: 2.5 * 10^3 possible junctions, 80 actual

      • The Flow of Labels Can Be Dramatic
        -- visit border junctions first (less ambiguous)
        -- may take only a few visits to an internal vertex

      • The Computation Required Is Proportional to Drawing Size
        -- work is roughly linear wrt number of lines
        -- Note: obscuring objects restrict flow

    • Propagation of Time-Interval Relations
      • There Are 13 Ways to Label a Link between Interval Nodes Yielding 169 Constraints
        -- Fig.12.28
        -- 13 possible relations between 2 time intervals
        -- Fig.12.29
        -- e.g., A before B & B before C ----> A before C

    • Constraint Satisfaction Problems in general
      -- CSP: set of vbls, set of constraints
      -- each vbl has a set of possible values
      -- find an assignment of values that satisfies constraints
      -- e.g., map coloring, 8 queens

      • CSP solutions by Searching
        -- solution possible via backtracking depth first search
        -- often a huge search space
        -- which variable next? pick most constrained vbl (fewest values)
        -- forward checking: look ahead one variable to see/record impact

      • CSP solutions by Constraint Propagation
        -- Arc Consistency: arc from vbl to vbl, represents binary constraint
        -- e.g., X <---different-color---> Y, with X={red, blue}, Y={red}
        -- look in both directions
        -- there exists a consistent assignment in X for all Y (blue)
        -- Y-to-X is consistent
        -- NOT there exists a consistent assignment in Y for all X
        -- X-to-Y is not consistent
        -- adjust X to produce consistency (delete red)
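        -- the revise step of arc consistency for the example above (Python; a hedged
           sketch, not a full AC algorithm):

             def revise(domains, x, y, constraint):
                 # make X arc-consistent with Y: delete X values no Y value supports
                 removed = False
                 for vx in list(domains[x]):
                     if not any(constraint(vx, vy) for vy in domains[y]):
                         domains[x].remove(vx)
                         removed = True
                 return removed

             domains = {'X': {'red', 'blue'}, 'Y': {'red'}}
             revise(domains, 'X', 'Y', lambda a, b: a != b)   # different-color constraint
             print(domains['X'])                              # {'blue'} -- red was deleted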

      • CSP solutions by Min-Conflicts heuristic
        -- e.g., for N-queens
        -- assign "reasonable" values to all variables
          (Note PSM with full initial assignment vs. incremental assignment)
        -- repeat
          -- randomly choose vbl in conflict
          -- choose new value for that vbl that minimizes conflicts
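        -- a min-conflicts sketch for N-queens (Python; an illustration under the usual
           one-queen-per-row encoding, not course code):

             import random

             def min_conflicts_queens(n, max_steps=10000):
                 cols = [random.randrange(n) for _ in range(n)]   # queen in row r is at cols[r]

                 def conflicts(r, c):
                     return sum(1 for r2 in range(n) if r2 != r and
                                (cols[r2] == c or abs(cols[r2] - c) == abs(r2 - r)))

                 for _ in range(max_steps):
                     conflicted = [r for r in range(n) if conflicts(r, cols[r]) > 0]
                     if not conflicted:
                         return cols                              # no conflicts: solved
                     r = random.choice(conflicted)                # randomly choose vbl in conflict
                     cols[r] = min(range(n), key=lambda c: conflicts(r, c))
                 return None

             print(min_conflicts_queens(8))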


    13 Logic and Resolution Proof

    • Rules of Inference
      -- some casual definitions...
      -- Inference: deriving new stuff from what's known
      -- Deduction: producing new facts from old facts using inference rules
      -- Induction: produce general description from specific examples
      -- Abduction: producing a likely fact from an old fact and an inference rule
      -- which are truth preserving?
      -- Logic: formal language: syntax & semantics
      -- a knowledge representation associated with deduction
      -- Propositional logic (no variables)
      -- 1st order predicate calculus
      -- true, false (2 valued)

      • Logic Has a Traditional Notation
        -- predicate: function that gives true or gives false
        -- predicate(symbol) {e.g., Green(x) }
        -- symbol denotes something satisfying predicate
        -- Fig.13.1
        -- conjunction (&), disjunction (v), negation (~), implication (==>)
        -- truth table (for A==>B) (TFF)
        -- substitutions (e.g., A==>B <==> ~AvB )
        -- de Morgan's Laws
          ~(A&B) <==> (~A)v(~B)
          ~(AvB) <==> (~A)&(~B)

      • Quantifiers Determine When Expressions Are True
        -- universal quantifier: for all
        -- existential quantifier: there exists (one or more)
        -- 1st order predicate calculus (variables represent objects)
        -- 2nd order? (variables can represent predicates)

      • Logic Has a Rich Vocabulary
        -- Fig.13.2
        -- literals: P(x), ~P(x)
        -- wffs: literals, or literals combined with v, &, ~, ==>
        -- wffs: with quantifiers
        -- e.g., Ax[Person(x) ==> Mortal(x)]
        -- e.g., Person(Susan) ==> Mortal(Susan)
        -- clause: literal v literal

      • Interpretations Tie Logic Symbols to Worlds
        -- Fig.13.3
        -- symbols <------------> objects (in imaginable world)
        -- predicates <------------> relations (in imaginable world)
        -- provides an interpretation

      • Proofs Tie Axioms to Consequences
        -- proof (derive true expressions: theorems)
        -- given axioms (stated as true)
        -- use sound rules of inference
        -- special
          satisfiable: expression is T for some possible interpretation of symbols
          valid: expression is T for all possible interpretations of symbols
        -- Modus Ponens: given axioms A, A==>B then B logically follows

      • Resolution Is a Sound Rule of Inference
        -- i.e., it is truth preserving
        -- given axioms AvB, ~BvC
        -- resolvent is AvC

    • Resolution Proofs
      • Resolution Proves Theorems by Refutation
        -- Fig.13.4
        -- assume negation of theorem to be shown is true
        -- show that proof attempt leads to conflict
        -- conclude that theorem must be true

      • Using Resolution Requires Axioms to Be in Clause Form
        -- e.g., ~Brick(x) v ~On(u,y) v ~On(y,u)
        -- key steps to get clause form
        • eliminate implications
        • move negations: e.g., ~Ax[P(x)] --> Ex[~P(x)]
        • eliminate existential quantifiers (Skolem functions)
        • rename variables if necessary
        • move universal quantifiers to the left
        • move disjunctions down to literals
        • eliminate conjunctions
        • rename variables
        • eliminate universal quantifiers

      • Proof Is Exponential
        -- can't guide it with MEA or A*
        -- can use control strategies that may help
        -- e.g., unit preference, set-of-support
        -- search may be exponential (i.e., long proofs are bad!)
        -- search may not terminate if there isn't a proof
        -- semidecidable: tell you only if it's a theorem

      • Resolution Requires Unification
        -- resolution requires literals to match
        -- e.g., ~On(x, Table) with On(Block, y)
        -- substitution that makes the clauses resolve is "unifier"
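        -- a small unifier over tuple-encoded literals (Python; the '?'-variable encoding
           is an assumption, and the occurs check is omitted):

             def unify(x, y, s=None):
                 # return a substitution making x and y identical, or None
                 s = {} if s is None else s
                 if x == y:
                     return s
                 if isinstance(x, str) and x.startswith('?'):
                     return unify_var(x, y, s)
                 if isinstance(y, str) and y.startswith('?'):
                     return unify_var(y, x, s)
                 if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
                     for xi, yi in zip(x, y):
                         s = unify(xi, yi, s)
                         if s is None:
                             return None
                     return s
                 return None

             def unify_var(var, value, s):
                 if var in s:
                     return unify(s[var], value, s)
                 return {**s, var: value}

             # the literals in On(x, Table) and On(Block, y) unify with this substitution:
             print(unify(('On', '?x', 'Table'), ('On', 'Block', '?y')))
             # {'?x': 'Block', '?y': 'Table'}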

      • Traditional Logic Is Monotonic
        -- theorems added, never removed
        -- number of clauses only increases
        -- compare with planning

      • Theorem Proving Is Suitable for Certain Problems, but Not for All Problems
        -- problem with long proofs
        -- some knowledge very hard to formalize
        -- some knowledge requires special logic


    14 Backtracking and Truth Maintenance

    • Chronological and Dependency-Directed Backtracking
      -- remember the depth-first search
      -- if blocked, back up to the last choice point and make another choice
      -- Chronological Backtracking (wrt time/order choice made)

      • Limit Boxes Identify Inconsistencies
        -- "limit boxes" are constraints on values (e.g., <2 )
        -- choices propagate through equations
        -- violating limit causes conflict
        -- conflict requires backtracking
        -- examples of "conflicts"
        • blocked paths in path/route finding
        • failing constraints in CSP solving search
        • new information that says that deduced/propagated value is wrong
        • assumption shown incorrect

      • Chronological Backtracking Wastes Time
        -- chronological backtracking undoes decisions that may be OK
        -- chronological backtracking doesn't respond to the cause of the conflict
        -- a major problem if a bad decision was made early on

      • Nonchronological Backtracking Exploits Dependencies
        -- need Nonchronological Backtracking
        -- use dependencies (choices that contributed to conflict)
        -- need clean way to keep dependency info (as "justifications")

    • Proof by Constraint Propagation
      • Truth Can Be Propagated
        -- propagate truth values through constraints on truth
        -- constraints are rules of logical operators (e.g., A==>B, TFF)
        -- literals or expressions: true, false, unknown

      • Truth Propagation Can Establish Justifications
        -- propagate truth through net, keep records
        -- justification links from new truth value to dependencies

      • Justification Links Enable Programs to Change Their Minds
        -- can make Assumptions about truth values (e.g., assume True)
        -- more useful to have a value instead of none
        -- justification for value is "assumption"
        -- assumptions may lead to conflicts
        -- assumptions may be undone by known truth values
        • could record propositions that, if found, will indicate assumption is false.
        • e.g., assume "p", record "~p" as a problem
        -- backtracking guided by justification links

      • Proof by Truth Propagation Has Limits
        -- works with propositional logic only (no variables)


    15 Planning

    • Now starting on "Applications"

    • Planning Using If-Add-Delete Operators
      -- plan: sequence of actions intended to achieve goal(s)
      -- what if we were to use logic? (monotonic)
      -- initial situation, goal situation, operators
      -- search for sequence of operators
      -- transform initial situation into goal situation
      -- Fig.15.1
      -- consider sample problem
        initial: On(A,C)&On(D,B)
        goal: On(A,B)&On(B,C)
      -- what are possible paths?
      -- why plan?
      • can anticipate problems
      • can search in model and not in world
      • can aid in error recovery if plan doesn't work

      • Operators Specify Add Lists and Delete Lists
        -- operators
        • have preconditions (prerequisites)
        • use variables for generality (instantiate when op used)
        • have add list
        • have delete list (i.e., not monotonic)
        -- must describe everything relevant for ops to work
          e.g., Clear(A)
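        -- applying one instantiated if-add-delete operator to a state of assertions
           (Python; the assertion strings and the operator are hypothetical):

             def apply_op(state, preconditions, add_list, delete_list):
                 # return the new state if all preconditions hold, else None
                 if not preconditions <= state:
                     return None                       # prerequisites not satisfied
                 return (state - delete_list) | add_list

             state = {'On(A,C)', 'On(D,B)', 'Clear(A)', 'Clear(D)'}
             new_state = apply_op(state,
                                  preconditions={'On(A,C)', 'Clear(A)'},   # move A from C to table
                                  add_list={'On(A,Table)', 'Clear(C)'},
                                  delete_list={'On(A,C)'})
             print(new_state)   # On(A,C) deleted; On(A,Table) and Clear(C) added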

      • You Can Plan by Searching for a Satisfactory Sequence of Operators
        -- search can be exponential due to # of ops and possible bindings
        -- i.e., don't do linear search

      • Backward Chaining Can Reduce Effort
        -- Fig.15.3
        -- one goal, therefore try backward chaining from goal
        -- i.e., look for operator that adds all or part of goal
        -- use op preconditions as new goal (may branch)
        -- set up Establishes links
        -- get complete plan, with partial order (POP)
        -- topological sort gets linear plan
        -- trying for linear plan from scratch is flawed
          (too much detail too soon & overcommits)

      • Impossible Plans Can Be Detected
        -- try more than one goal (may be order sensitive)
        -- Fig.15.4
        -- delete of operator may interfere with precond of another (Threat)
        -- Fig.15.5
        -- check if ordering ops can remove threat (no Before loops!)
        -- Fig.15.6
        -- Fig.15.7
        -- Fig.15.8

      • Partial Instantiation Can Help Reduce Effort Too
        -- instantiate as much as necessary
        -- e.g., put block down on x
        -- in general can do for both objects and actions
        -- principle of "least commitment" (benefits?)
        -- hierarchical planning

    • Planning Using Situation Variables
      -- try to use logic (monotonic, remember?)
      -- operators take one situation to another (sequence)

      • Finding Operator Sequences Requires Situation Variables
        -- add Situation Variables
        -- On(A,B,s1) but ~On(A,B,s2)
        -- set up goal as situation
        -- use operations that represent actions
        -- operations take arguments and a situation, and ...
        --    produce new situation that makes a predicate true
        -- e.g., if x isn't on the table in s then in the new situation after using STORE it is.
        -- express all in predicate calc
        -- put all in clause form
        -- use resolution to refute ~goal
        -- Fig.15.11
        -- use Answer term

      • Frame Axioms Address the Frame Problem
        -- No! not that kind of frame! (scope)
        -- if On(A,Table,s)&On(B,Table,s), move A, where's B in new situation?
        -- need Frame Axioms ("how predicates survive operations")


    16 Learning by Analyzing Differences

      -- some say no learning ==> no intelligence
      -- what is learning?
      -- types?
      -- supervised, unsupervised, reinforcement
      -- Rote Learning, Learning from Advice, Learning from Examples, Explanation based, by Discovery, etc.
      -- also methods within types (e.g., Analyzing Differences)
      -- level change between what is given and what is learned?
      • e.g., generate general description from detailed examples (inductive)
      • e.g., could store information as it is presented (cases)
      -- sensitivity to noise (what's noise?)

    • Induction Heuristics
      -- supervised
      -- learning by induction from "well-chosen" given examples
      -- inductive reasoning (from specific to general)
      -- learn a concept (class description)
      -- Fig.16.1
      -- given +ve and -ve examples
      -- do examples matter?
        (e.g., how negative?)
      -- does order matter?

      • Responding to Near Misses Improves Models
        -- -ve examples are "near-miss" (not much wrong)
        -- model repr and example repr the same
        -- the right repr enables learning (e.g., explicit spatial relationships)
        -- need to isolate what's important in examples

      • Responding to Examples Improves Models
        -- Fig.16.2
        -- can require relations (e.g., must-support)
        -- Fig.16.3
        -- can forbid relations (e.g., must-not-touch)
        -- can enlarge set of types (e.g., new A-or-B class)
        -- Fig.16.4
        -- can generalize types (e.g., "brick" to "block")

      • Near-Miss Heuristics Specialize; Example Heuristics Generalize
        -- -ve examples restrict model (it was too general)
        -- +ve examples relax model (it was too specific)
        -- heuristics:
          require-link, forbid-link, climb-tree, enlarge-set, drop-link, close-interval

      • Learning Procedures Should Avoid Guesses
        -- wait-and see principle (if in doubt, do nothing) (commitment?)
        -- no-altering principle (create a special case)

      • Learning Usually Must Be Done in Small Steps
        -- it's easier to learn something you almost know
        -- you need the right concepts to be able to learn (e.g., need brick to learn arch)

    • Identification
      • Must Links and Must-Not Links Dominate Matching
        -- describe & match (e.g., is unknown object an arch?)
        -- similarity dominated by must and must-not links

      • Models May Be Arranged in Lists or in Nets
        -- Fig.16.5
        -- similarity net
        -- similar models linked by their differences
        -- similar to "graph of models" used to select model for analysis


    19 Learning by Recording Cases

    • Recording and Retrieving Raw Experience
      -- sometimes good models are impossible to build
      -- need to resort to storing examples (cases)
      -- need to index cases
      -- may need to adapt them to new situation
      -- when is use of cases good? bad?

      • The Consistency Heuristic Enables Remembered Cases to Supply Properties
        -- assume consistency
        -- i.e., unknown property same as known
        -- Fig.19.1
        -- Fig.19.2
        -- e.g., feature space of blocks (given H and W, color?)
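        -- a direct (linear-scan) reading of the consistency heuristic (Python; the block
           cases and feature names are hypothetical):

             def nearest_case_property(cases, query, known, unknown):
                 # give the query the unknown property of the nearest recorded case
                 def dist(case):
                     return sum((case[f] - query[f]) ** 2 for f in known) ** 0.5
                 return min(cases, key=dist)[unknown]

             cases = [{'w': 1, 'h': 4, 'color': 'red'},
                      {'w': 4, 'h': 1, 'color': 'blue'},
                      {'w': 3, 'h': 3, 'color': 'green'}]
             print(nearest_case_property(cases, {'w': 3.5, 'h': 2.5}, ['w', 'h'], 'color'))  # green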

    • Finding Nearest Neighbors
      • A Fast Serial Procedure Finds the Nearest Neighbor in Logarithmic Time
        -- Fig.19.7
        -- use decision tree
        -- each node has a test (e.g., Width > 3 ?)
        -- branches depending on answers
        -- leaf node has specific result
        -- in blocks example use a 2D k-d tree
        -- Fig.19.5
        -- Fig.19.6

    • CBR: Case-Based Reasoning
      -- have stored cases (e.g., recipes)
      -- indexed (perhaps via k-d tree)
      -- given ingredients find recipe (e.g., chicken, asparagus)
      -- find chicken & dumplings, and chicken & broccoli
      -- test for success, select chicken & broccoli
      -- not quite right, therefore "adaptation" needed
        could adaptation be done by CBR?
      -- substitution, gives needed recipe
      -- retain the final solution as a new case
      -- what if adaptation by substitution isn't enough?
      -- cases for everything?
      • case-based planning?
      • case-based diagnosis?
      • case-based design?
      • case-based chess?
      • case-based soccer?


    20 Learning by Managing Multiple Models

    • The Version-Space Method
      -- needs noise free data
      -- needs sequence of +ve and -ve examples
      -- -ve examples don't need to be near misses!
      -- builds a description (model) that describes data
      -- uses records from database
      -- does a kind of "data-mining"
      -- record has values for situation-characterizing attributes
      -- e.g., (place, meal, day, cost) plus "+ve" or "-ve"
      -- Sample data
      -- example: (Sam's, Dinner, Thursday, Expensive)
      -- +ve if person gets allergic reaction
      -- needs known set of attributes

      • Version Space Consists of Overly General and Overly Specific Models
        -- Fig.20.1
        -- version space: between most general and most specific description
        -- Negative examples specialize general descriptions (restrict)
          (It can't include the -ve example)
        -- Negative examples prune the specific descriptions
          (It can't both be and not be a suitable description)
        -- Positive examples generalize specific descriptions (relax)
          (It must be expanded to include the +ve example)
        -- Positive examples prune the general descriptions
          (Remove general models that don't match the +ve example)

      • Generalization and Specialization Leads to Version-Space Convergence
        -- Fig.20.2
        -- Each specialization must be a generalization of some specific model
          i.e., aim for the tree that's growing upwards
        -- No specialization can be a specialization of another general model
        -- Figs. 20.3 to 20.7

    • Version-Space Characteristics
      -- Result: even if you don't get a single model you get something useful
      -- Could this method be used for learning an Arch model?


    21 Learning by Building Identification Trees

    • From Data to Identification Trees
      -- widely used technique for learning
      -- trees can be used to generate rules
      -- Sample data
      -- uses records with values for fixed set of attributes
      -- e.g., (Name, Hair, Height, Weight, Lotion)
      -- plus classification for that record (e.g., sunburned)
      -- e.g. (Sarah, blonde, average, light, no, sunburned)
      -- samples may have noise (meaning?)
        (why is it OK for this approach?)

      • The World Is Supposed to Be Simple
        -- does Name affect sunburn?
        -- learning process prunes attributes that don't affect classification (useful)
        -- build identification tree (type of decision tree)
        -- Fig.21.1 and 21.2
        -- Occam's razor (for identification trees): small is good

      • Tests Should Minimize Disorder
        -- pick which attribute for root node and work down
        -- pick attributes based on "sorting power"
        -- i.e., minimizes disorder
        -- Fig.21.3 and 21.4
        -- i.e., divides data into most homogeneous subsets

      • Information Theory Supplies a Disorder Formula
        -- Fig.21.5
        -- information "entropy" in information theory
          (how much randomness there is in a signal or random event)
        -- Disorder measured down each branch under a node
        -- Average disorder is weighted sum across all those branches
        -- Find average disorder for each unused attribute (at that tree level)
        -- Pick one with least average disorder (i.e., strongest sorting power)
        -- repeat down the tree
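        -- The disorder formula is the usual weighted entropy: average disorder of a test = sum over branches b of (n_b / n_t) × ( - sum over classes c of (n_bc / n_b) log2 (n_bc / n_b) ). A minimal Python sketch on made-up sunburn-style records (not the book's full data set):

            from math import log2
            from collections import Counter

            def disorder(labels):
                """Entropy of the class labels reaching one branch."""
                n = len(labels)
                return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

            def average_disorder(records, attribute):
                """Weighted sum of branch disorders for splitting on one attribute."""
                total = len(records)
                branches = {}
                for r in records:
                    branches.setdefault(r[attribute], []).append(r["result"])
                return sum(len(b) / total * disorder(b) for b in branches.values())

            records = [
                {"hair": "blonde", "lotion": "no",  "result": "sunburned"},
                {"hair": "blonde", "lotion": "yes", "result": "none"},
                {"hair": "brown",  "lotion": "no",  "result": "none"},
                {"hair": "red",    "lotion": "no",  "result": "sunburned"},
            ]
            for attr in ("hair", "lotion"):
                print(attr, round(average_disorder(records, attr), 3))
            # -> hair 0.5, lotion 0.689: test hair first on this toy data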

      • Try the Aussie example

    • From Trees to Rules
      -- trace each path to get a rule
      -- leaf node class is the consequent

      • Unnecessary Rule Antecedents Should Be Eliminated
        -- if blonde and uses-lotion then no-suntan
        -- try to remove antecedent, does it make a difference?
        -- if not, cut it.
        -- e.g. everyone who uses lotion avoids sunburn, so blondness is irrelevant
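        -- A minimal check of that idea (made-up records; the rule and attribute names are illustrative):

            records = [
                {"hair": "blonde", "lotion": "no",  "result": "sunburned"},
                {"hair": "blonde", "lotion": "yes", "result": "none"},
                {"hair": "brown",  "lotion": "no",  "result": "none"},
                {"hair": "red",    "lotion": "yes", "result": "none"},
            ]

            def covered(record, antecedents):
                return all(record[a] == v for a, v in antecedents)

            def misclassifies(antecedents, consequent):
                """True if some record matches the antecedents but has a different class."""
                return any(covered(r, antecedents) and r["result"] != consequent for r in records)

            antecedents, consequent = [("hair", "blonde"), ("lotion", "yes")], "none"
            for a in antecedents:
                trimmed = [x for x in antecedents if x != a]
                if not misclassifies(trimmed, consequent):
                    print("can drop", a)   # -> can drop ('hair', 'blonde'): lotion alone predicts no sunburn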

      • Unnecessary Rules Should Be Eliminated
        -- once rules simplified, try to reduce # of rules
        -- use default rule
          "IF no other rule applies THEN answer"
        -- use default rule that produces simplest rules

    25 Learning by Simulating Evolution

    • Survival of the Fittest
      • Chromosomes Determine Hereditary Traits
        -- genes determine traits
        -- a chromosome has a list of genes
        -- scrambling genes between chromosomes is called crossover
        -- an altered gene is called a mutation

      • The Fittest Survive
        -- evolution through natural selection
        -- traits determine fitness
        -- fitness determines survival
        -- fitness determines breeding
        -- breeding determines survival of traits
        -- traits passed to offspring

    • Genetic Algorithms (GAs)
      • Genetic Algorithms Involve Myriad Analogs
        -- GAs use analogies with individuals, populations, chromosomes, genes, mutation, crossover, fitness, natural selection.
        -- Individuals have fitness, represented by the Quality Score of their chromosome.
        -- Populations of individuals are represented by sets of chromosomes
        -- Fig.25.1
        -- searching in multi-dimensional space of solutions
          (Quality surface)
        -- Fig.25.3
        -- mutation makes random change to gene(s)
        -- Fig.25.4
        -- crossover splits and recombines two chromosomes

      • The Standard Method Equates Fitness with Relative Quality
        -- fitness is the probability that chromosome survives to the next generation
        -- "standard method" is individual fitness relative to sum of fitnesses

      • To Mimic Natural Selection
        • create initial population, determine fitness, then loop until done:
        • mutate genes to produce new chromosomes.
        • produce crossovers to produce new chromosomes.
        • add all new chromosomes to current population.
        • select best of current generation to make new generation.
        • do biased random selection by fitness to complete new generation.
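        -- A minimal sketch of that loop with bit-string chromosomes and a made-up quality function (every parameter choice here is arbitrary; see the next bullet):

            import random

            def quality(chrom):
                """Assumed quality surface: count of 1 bits (+1 keeps every weight positive)."""
                return sum(chrom) + 1

            def mutate(chrom, rate=0.1):
                return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

            def crossover(a, b):
                cut = random.randrange(1, len(a))      # split the two chromosomes and recombine
                return a[:cut] + b[cut:]

            def next_generation(population, size):
                new = [mutate(c) for c in population]
                new += [crossover(*random.sample(population, 2)) for _ in population]
                candidates = population + new
                candidates.sort(key=quality, reverse=True)
                keep = candidates[: size // 2]                      # best of the current generation
                weights = [quality(c) for c in candidates]
                while len(keep) < size:                             # biased random selection by fitness
                    keep.append(random.choices(candidates, weights=weights)[0])
                return keep

            population = [[random.randint(0, 1) for _ in range(8)] for _ in range(6)]
            for _ in range(20):
                population = next_generation(population, 6)
            print(max(population, key=quality))                     # usually all ones after a few generations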

      • Genetic Algorithms Generally Involve Many Choices
        -- size of population?
        -- mutation rate?
        -- how to select pairs for crossover?
        -- crossover point?
        -- chromosome duplication allowed?
        -- fitness calculation method?
        -- initial population?
        -- when to stop?

      • It Is Easy to Climb Bump Mountain Without Crossover
        -- the book's "mutation only" does hill climbing

      • Crossover Enables Genetic Algorithms to Search High-Dimensional Spaces Efficiently
        -- bias selection of pairs for crossover by fitness
        -- tends to combine good traits

      • Crossover Enables Genetic Algorithms to Traverse Obstructing Moats
        -- Fig.25.5
        -- crossover can jump across quality surface
        -- does still tend to get stuck around local maxima

      • The Rank Method Links Fitness to Quality Rank
        -- using rank decouples fitness from the actual quality values
        -- sort individuals by quality
        -- rank sorted individuals (1st, 2nd, ...)
        -- assign 1st a rank fitness of p (e.g., p = 2/3)
        -- remainder is 1 - p = 1/3
        -- assign 2nd a rank fitness of p × remainder = (2/3)(1/3) = 2/9
        -- remainder is now 1/3 - 2/9 = 1/9
        -- assign 3rd a rank fitness of p × remainder, and so on
        -- the last individual gets whatever remainder is left (so the fitnesses sum to 1)
        -- Fig.25.6
        -- Rank method gives nonzero fitness to all
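        -- A minimal rank-fitness calculation (p = 2/3 as above; the quality values are made up):

            def rank_fitness(qualities, p=2/3):
                """Assign fitness by quality rank: p of the remainder to each, last takes what is left."""
                order = sorted(range(len(qualities)), key=lambda i: qualities[i], reverse=True)
                fitness = [0.0] * len(qualities)
                remainder = 1.0
                for rank, i in enumerate(order):
                    if rank == len(order) - 1:
                        fitness[i] = remainder           # the last individual takes the rest
                    else:
                        fitness[i] = p * remainder
                        remainder -= fitness[i]
                return fitness

            print([round(f, 3) for f in rank_fitness([0.9, 0.2, 0.5])])
            # -> [0.667, 0.111, 0.222]: best gets 2/3, middle 2/9, worst the last 1/9 -- all stay selectable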


    • Survival of the Most Diverse
      • The Rank-Space Method Links Fitness to Both Quality Rank and Diversity Rank
        -- keep newly selected chromosomes different from those already selected for population
        -- i.e., reward diversity as well as fitness
        -- use Rank-Space Method to select an individual for new population:
        • sort individuals by quality
        • sort individuals by diversity
          • diversity is sum of inverse squared distances to other already selected candidates (small value is better)
        • sort by sum of quality rank and diversity rank
        • use rank method on result
        • i.e., select best, assign to new population, and repeat.
        • break rank sum ties using diversity
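        -- A minimal sketch of one rank-space selection step (1-D "chromosomes" and made-up qualities, so distance is just the difference between numbers; it takes the best combined rank instead of sampling with the rank method):

            def diversity(candidate, selected):
                """Sum of inverse squared distances to already-selected individuals (smaller = more diverse)."""
                return sum(1.0 / (candidate - s) ** 2 for s in selected)

            def pick_next(candidates, qualities, selected):
                by_quality = sorted(candidates, key=lambda c: qualities[c], reverse=True)
                by_diversity = sorted(candidates, key=lambda c: diversity(c, selected))
                rank_sum = lambda c: by_quality.index(c) + by_diversity.index(c)
                # best combined rank wins; ties break in favor of the more diverse candidate
                return min(candidates, key=lambda c: (rank_sum(c), by_diversity.index(c)))

            candidates = [1.0, 2.0, 8.0, 9.0]                     # 1-D "chromosomes" for simplicity
            qualities = {1.0: 0.6, 2.0: 0.9, 8.0: 0.5, 9.0: 0.4}
            selected = [2.0]                                      # best-quality individual goes in first
            remaining = [c for c in candidates if c not in selected]
            print(pick_next(remaining, qualities, selected))
            # -> 9.0: the rank sums tie, and the tie breaks toward the candidate farthest from 2.0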

      • The Rank-Space Method Does Well on Moat Mountain
        -- Standard method = 155 generations
        -- Quality rank = 75 generations
        -- Rank-Space = 15 generations

      • Local Maxima Are Easier to Handle when Diversity Is Maintained
        -- if some individuals are at local maxima, then diversity pushes new individuals away from those maxima.

      • What's in a population?
        -- John Koza, Genetic Programming: population of programs
        -- David B. Fogel, "Blondie24": population of checkers playing neural networks
          (Excellent paperback book)


    26 Recognizing Objects

    • Linear Image Combinations
      -- recognition by template construction and matching

      • Conventional Wisdom Has Focused on Multilevel Description
        -- conventional wisdom is as follows:
        • process brightness changes to form "primal sketch"
        • find suggested surfaces to get 2.5D sketch
        • these are "viewer centered"
        • find volumes suggested by 2.5D sketch to get "volume description"
        • that is "object centered"
        • match for recognition at the volume description level
        -- but can match at primal sketch level
        -- with the right templates

      • Images Contain Implicit Shape Information
        -- a few views of polyhedral object combine to give info about vertex positions
        -- e.g., plan and elevations

      • One Approach Is Matching Against Templates
        -- "identification model": three images
        -- each image has "feature points"
        -- given an "unknown" image to recognize/classify
        -- using stored images as "templates"
        -- Fig.26.1
        -- simple match is OK if views (rotations) are constrained
        -- but not in general

      • For One Special Case, Two Images Are Sufficient to Generate a Third
        -- Fig.26.2
        -- special case: orthographic projection (along z axis)
        -- rotate around y axis
        -- Fig.26.4
        -- need to match points on unknown and model image
        -- as y values don't change and z values aren't relevant, use the equation for x only
        -- in general:   xIu = AxI1 + BxI2   (Eqn 1)
        -- i.e., matching points in unknown and two templates are related
        -- need to find A and B.

      • Identification Is a Matter of Finding Consistent Coefficients
        -- key idea:
        • Find two points in the unknown that correspond with two points in both template image 1 and template image 2 (I1 & I2).
        • Make two versions of Eqn 1 above, and solve for A and B (alpha and beta).
        • Use all other points in the two templates, and the equation using A and B, to predict all other points in the unknown.
        • If predicted points match actual points on unknown then it is the same type as the templates (e.g., Obelisk).
        -- Tables
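        -- A minimal "consistent coefficients" sketch for Eqn 1 above (all x coordinates are made up; two matched points give A and B, the rest are predicted and checked):

            def solve_ab(x1a, x2a, xua, x1b, x2b, xub):
                """Solve A*x1 + B*x2 = xu for two matched point pairs (Cramer's rule)."""
                det = x1a * x2b - x1b * x2a
                A = (xua * x2b - xub * x2a) / det
                B = (x1a * xub - x1b * xua) / det
                return A, B

            template1 = [1.0, 2.0, 4.0, 7.0]       # x coords of feature points in template image I1
            template2 = [1.5, 2.5, 4.5, 7.5]       # the same feature points in template image I2
            unknown   = [1.25, 2.25, 4.25, 7.25]   # the unknown image to identify

            A, B = solve_ab(template1[0], template2[0], unknown[0],
                            template1[1], template2[1], unknown[1])
            predicted = [A * x1 + B * x2 for x1, x2 in zip(template1, template2)]
            same = all(abs(p - u) < 1e-6 for p, u in zip(predicted, unknown))
            print(A, B, "same object" if same else "different object")   # -> 0.5 0.5 same object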

      • The Template Approach Handles Arbitrary Rotation and Translation
        -- special case: rotation about y axis: 2 points and 2 templates
        -- for arbitrary rotation and translation
        -- use model with 3 image templates rotated and translated
        -- need to match four points

      • The Template Approach Handles Objects with Parts
        -- yup, it can do that too

      • Establishing Point Correspondence
        • Tracking Enables Model Points to Be Kept in Correspondence
          -- Fig.26.12
          -- small object movements allow small image changes
          -- tracking point correspondence is then easier

        • Only Sets of Points Need to Be Matched
          -- you don't need point-to-point correspondence
          -- only set to set correspondence

        • Heuristics Help You to Match Unknown Points to Model Points
          -- Fig.26.13
          -- use set of points at top or bottom



      27 Describing Images

      • Computing Edge Distance
        -- find edges in images

        • Averaged and Differenced Images Highlight Edges
          -- Fig.27.1
          -- sharp changes in brightness
          -- noise in image = spurious edges
          -- remove noise first, then find changes
          -- Fig.27.2
          -- image array to average-brightness array (smoothing)
          -- then find 1st and 2nd derivatives
          -- average-brightness array to average-first-difference array
          -- average-first-difference array to average-second-difference array
          -- high rate of change indicates edge
          -- i.e., a zero crossing
          -- Fig.27.3
          -- combine smoothing and the two differentiations
          -- into a single operator (a point-spread function P)
          -- convolve I with P to give O
          -- for 2D
          • Fig.24.1 (p.493)
          • smoothing: convolve with bell-shaped Gaussian function
          • width of Gaussian affects detail found (narrow, more small edges)
          • combined smoothing+differencing = Mexican Hat shape (sombrero)
          • sombreros can be wide and narrow too
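          -- A 1-D sketch of smooth-then-difference edge finding (the brightness values and threshold are made up, and box smoothing stands in for the Gaussian/sombrero):

              def smooth(signal, width=3):
                  """Average-brightness array: box smoothing (a stand-in for the Gaussian)."""
                  half = width // 2
                  return [sum(signal[max(0, i - half): i + half + 1]) /
                          len(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]

              def difference(signal):
                  return [b - a for a, b in zip(signal, signal[1:])]

              brightness = [10, 10, 11, 10, 10, 20, 45, 50, 49, 50]   # brightness step starting near index 5
              first = difference(smooth(brightness))                   # average-first-difference array
              second = difference(first)                               # average-second-difference array
              edges = [i + 1 for i, (a, b) in enumerate(zip(second, second[1:]))
                       if a * b < 0 and abs(first[i + 1]) > 5]         # zero crossing with a steep slope
              print(edges)                                             # -> [5]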

        • Multiple-Scale Stereo Enables Distance Determination
          -- Stereo Vision: two images with detected edges
          -- Fig.27.4
          -- need to find distance from cameras to objects (i.e. to edges)
          -- distance to a point is inversely proportional to the shift in the point's position between the 2 images (the disparity)
          -- problem: to measure disparity need to find "corresponding features" in the two images
          -- Fig.27.5
          -- to find correspondence:
          • for each horizontal slice through image
          • find nearest neighbors for each zero-crossing fragment in L image
          • find nearest neighbors for each zero-crossing fragment in R image
          • find pairs that are the closest neighbors of each other
          • match found if distance less than threshold tolerance
          -- Fig.27.7
          -- wide sombrero produces fewer lines, hence less ambiguous matching
          -- narrow sombrero gives more precision, but more ambiguity
          -- more precision in disparity gives better distance estimates
          -- use width w/2 sombrero for accuracy
          -- Fig.27.8
          -- confirm matching (disambiguate) using w sombrero results
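          -- A minimal disparity-to-distance sketch using the standard pinhole-stereo relation distance = focal length × baseline / disparity (the camera numbers and matches are made up):

              FOCAL = 0.05       # focal length in meters (made up)
              BASELINE = 0.20    # distance between the two cameras in meters (made up)

              def depth(x_left, x_right):
                  """Distance to a matched point: focal length times baseline over the disparity."""
                  disparity = x_left - x_right              # shift of the point between the two images
                  return FOCAL * BASELINE / disparity

              matches = [(0.0105, 0.0095), (0.0210, 0.0190)]   # matched x positions (L image, R image)
              for xl, xr in matches:
                  print(round(depth(xl, xr), 1), "m")          # -> 10.0 m and 5.0 m: bigger disparity, nearer edge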

      • Computing Surface Direction
        -- shape --> surface --> shading
        -- how to get shape from shading?
        -- first study shading from surface

        • Reflectance Maps Embody Illumination Constraints
          -- Fig.27.9
          -- light to surface: incident angle
          -- surface to eye: emergent angle
          -- Lambertian surface: brightness depends only on direction of light source
          -- E = rho * cos i (rho is the surface "albedo")
          -- "matte" surface (nonspecular)
          -- Fig.27.10
          -- illuminate Lambertian sphere: isobrightness lines
          -- project lines from sphere to plane
          -- Fig.27.11
          -- make reflectance maps (of cos i values)
          -- new map for each incident+emergent angle
          -- FG plane (map) is parallel to the image plane
          -- (f,g) point on map represents a surface orientation
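          -- A minimal E = rho · cos i calculation (the surface normals and light direction are made up):

              import math

              def brightness(normal, light, rho=1.0):
                  """Lambertian brightness: rho times the cosine of the incident angle."""
                  norm = lambda v: math.sqrt(sum(c * c for c in v))
                  cos_i = sum(n * l for n, l in zip(normal, light)) / (norm(normal) * norm(light))
                  return rho * max(cos_i, 0.0)       # a back-facing surface receives no light

              light = (0.0, 0.0, 1.0)                # light along the viewing (z) axis
              for normal in [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 0.0, 0.0)]:
                  print(normal, round(brightness(normal, light), 2))
              # -> 1.0 facing the light, 0.71 at 45 degrees, 0.0 edge-on (cf. the isobrightness lines)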

        • Making Synthetic Images Requires a Reflectance Map
          -- to generate image
          -- have f and g for every point on image
          -- derived from elevation data
          -- have reflectance map for light position
          -- look up every (f,g) in map to get cos i
          -- use appropriate rho
          -- i.e., (f,g) to brightness

        • Surface Shading Determines Surface Direction
          -- brightness to (f,g)?
          -- brightness at point gives curve on FG map
          -- but surfaces vary smoothly
          -- two constraints
          -- need some known values to start with
          -- set unknown f and g values to 0
          -- (f,g) known at occluding boundary of smooth objects
          -- use relaxation procedure to gradually force constraints to apply across surface


      28 Expressing Language Constraints

      • Natural Language is complex
        -- "I saw the girl in the park with the telescope"
        -- complex structure (nesting is possible)
        -- we can spot incorrect examples (so what?)
        -- ambiguity is common
        • lexical ("bank")
        • syntactic ("the smart girls and boys")
        • pragmatic ("I saw the statue of liberty flying over New York")

      • The Search for an Economical Theory
        -- goal: understanding linguistic constraints
        -- seek simple, concise, comprehensive model

        • Could start with Words
          -- Morphemes: pieces of words with meaning
          -- Mary('s), dog(s), (un)do, (dis)inherit

        • You Cannot Say That
          -- unacceptable sentences act as investigative tool
          -- e.g., * The books is red.
          -- constraint is subject/verb agreement

        • Phrases Crystallize on Words
          -- words have categories
            Determiner (the), Adjective (green), Adverb (slowly), Noun (dog), Verb (sing), Preposition (to, in), Pronoun (she), Quantifier (all), Auxiliary Verb (will, have), Complementizer (that, which)
          -- words form groups called phrases
          -- NP: Noun Phrase: "the green book"
          -- PP: Prepositional Phrase: "to the library"
          -- VP: Verb Phrase: "kissed the frog"
          -- can have phrases that use/contain phrases

        • Structure is Syntax
          -- can use formal grammar (e.g., CFG)
            S --> NP VP
            NP --> Art Adj Noun
            VP --> Verb NP
          -- could parse, could generate
          -- CFG OK for NL?
          -- need some context sensitivity
            Mary sat on her chair
            John sat on his chair
          -- is the visible, surface structure in the sentence all there is?
          • "John is easy to please"
          • "John is eager to please"

        • Many Phrase Types Have the Same Structure
          -- NP: specifier, noun, PP
            (the)(library)(in the city)
          -- PP: specifier, preposition, NP
            (precisely)(at)(noon)
          -- VP: specifier, verb, NP
            (all)(return)(their books)

        • The X-Bar Hypothesis Says that All Phrases Have the Same Structure
          -- Specifier, head, one or more Complements
            ( )(return)(her book)(to the library)(in the morning)
          -- IP: Inflection Phrase: adds tense to verb in VP
            has head of "-ed" (returned), or "will" (will return)
          -- CP: Complementizer Phrase: allows embedded phrases
            He said that (Sarah will return her book)

      • The Search for a Universal Theory
        -- claim: the X-bar representation makes important things explicit and exposes natural constraints

        • A Theory of Language Ought to Be a Theory of All Languages
          -- arrangement of specifiers and complements varies by language

        • A Theory of Language Ought to Account for Rapid Language Acquisition
          -- language is learned very quickly
          -- hypothesis is that universal language constraints are "built-in"
          -- people build sentences from meaning using language knowledge
          -- fine tuning of grammar is learned
            "Look at the sheeps"

        • A Noun Phrase's Case Is Determined by Its Governor
          -- Case assignment: how word fits into sentence
          -- Nominative (I, they), Accusative (me, them), Genitive (my, their)
          -- move up X-bar tree to find governing head to determine pronoun's case

        • Subjacency Limits Wh- Movement
          -- X-bar theory shows constraints on forming questions

      • Competence versus Performance
        -- Competence: knowledge of a language and its rules/constraints.
          (idealized capacity to recognize a theoretically infinite number of sentences)
        -- Performance: external view of language competence (actual utterances), limited by memory, social context, etc.
          "When the, uh, ..., I can't think, ..., of it, eeeeer, the operator is placed, for either vertical or horizontal detection. You put it over the image. Hmmmm, here and here. That results."

    • Analysis by Reversing Generation Can Be Silly
      -- language generation involves many context-specific and hearer-specific adjustments
      -- grammars normally express competence

    • Construction of a Language Understanding Program Remains a Tough Row to Hoe
      -- language is very flexible and hence hard to study
      -- language is complex and grammars are likely to be too
      -- we can understand ungrammatical sentences
      -- idiomatic/metaphorical usage makes hoeing harder

    • NLU is very hard
      • The word "spinglesquidge" never appears in a sentence.
      • I ate the cake with the frosting.
        I ate the cake with the girl.
        I ate the cake with the spoon.
      • The sat cat mat on the.
        Colorless green ideas sleep furiously.
        The box is in the pen.
        I cut the cake with the tractor.
        Fruit flies like a banana.
      • Bob gave Mary a book. He was pleased. She was pleased.
        Bob gave Fred a book. He was pleased.
      • The class hurled abuse at the broken teacher.
      • Class end soon. You go then.

    • Engineers Must Take Shortcuts
      -- aim at specific, limited tasks
      -- task-specific grammar and vocabulary


    29 Responding to Questions and Commands

    -- engineering approach
    -- translate questions/command to database commands
    -- use task and domain-specific language
    -- constraints on use make it easier
    -- other possible approaches: template matching, CFGs,...
    -- need grammar, dictionary, knowledge of target repr
    • Syntactic Transition Nets
      -- transition net
      -- recursive transition net (RTN)
      -- augmented RTN (ATN)
      -- top down, goal driven

      • Syntactic Transition Nets Are Like Roadmaps
        -- sentence net & other subnets
        -- nets have start and terminal nodes
        -- can traverse a link by matching a word, word type, or phrase
        -- phrase needs push to subnet
        -- accept input: at end of S net and all words used

      • A Powerful Computer Counted the Long Screwdrivers on the Big Table
        -- some links will fail
        -- if all links fail, subnet fails
        -- nested-box diagram

      • Full ATN -- add tests to arcs
        -- add actions to arcs
        -- use memory (registers)
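        -- A minimal sketch of traversing such nets top-down (the nets and vocabulary are made up, each net here is a single arc sequence with no alternatives, and the ATN tests/registers are omitted):

            NETS = {
                "S":  [("NP", "push"), ("Verb", "word-type"), ("NP", "push")],
                "NP": [("Art", "word-type"), ("Noun", "word-type")],
            }
            CATEGORIES = {"a": "Art", "the": "Art", "computer": "Noun",
                          "screwdrivers": "Noun", "counted": "Verb"}

            def traverse(net, words):
                """Walk the arcs of one net in order; return the remaining words, or None on failure."""
                for label, kind in NETS[net]:
                    if kind == "push":                            # phrase: push into a subnet
                        words = traverse(label, words)
                    elif words and CATEGORIES.get(words[0]) == label:
                        words = words[1:]                         # word type matched, consume the word
                    else:
                        words = None
                    if words is None:
                        return None
                return words

            sentence = "the computer counted the screwdrivers".split()
            print(traverse("S", sentence) == [])    # True: end of the S net reached with all words used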

    • Semantic Transition Trees
      -- ATN usually organized by syntax
      -- STT organized by meaning
      -- i.e., non-terminals are semantic not syntactic
      -- e.g., Action Object "with" Tool
      -- "Cut the paper with the scissors"
      -- could be made to handle non-syntactic input

      • A Relational Database Makes a Good Target
        -- (Class Color Size Weight Location)
        -- answers retrieved from DB

      • Pattern Instantiation Is the Key to Relational-Database Retrieval in English
        -- query schema instantiated using words from sentence
        -- SELECT < object with < values
        -- note use of "> object" and "< object"
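        -- A minimal pattern-instantiation sketch against a toy relational table (the table contents, schema, and count query are made up):

            TABLE = [
                {"Class": "screwdriver", "Color": "red",  "Size": "long",  "Location": "big table"},
                {"Class": "screwdriver", "Color": "blue", "Size": "short", "Location": "big table"},
                {"Class": "hammer",      "Color": "red",  "Size": "long",  "Location": "shelf"},
            ]

            def count_query(object_class, **restrictions):
                """COUNT rows whose Class matches, subject to extra attribute restrictions."""
                rows = [r for r in TABLE if r["Class"] == object_class and
                        all(r[k] == v for k, v in restrictions.items())]
                return len(rows)

            # "Count the long screwdrivers on the big table"
            print(count_query("screwdriver", Size="long", Location="big table"))   # -> 1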

      • Moving from Syntactic Nets to Semantic Trees Simplifies Grammar Construction
        -- 1-1 correspondence between paths and terminal nodes
        -- terminals can be query building points

      • Recursion Replaces Loops
        -- recursion needed in ATNs too
        -- could have Semantic ATN

      • Q&A Translates Questions into Database-Retrieval Commands
        -- could also be used to access "canned" help text