This project has 2 parts:
The network consists of Labeled States connected by Labeled Directed
Arcs.
The labels on the states can be used to note what has been recognized
up to that point in the grammar.
Arcs can be traversed if the next thing in
the sentence is what is indicated by the label of the arc.
A Label can be a
word, a word category (e.g., Noun), or the name of a sub-network (e.g.,
NounPhrase).
Each subnet corresponds to a Non-Terminal in the grammar.
An attempt to traverse a given
arc can succeed or fail.
If a sub-network name appears on an arc then that
network is used to determine whether the arc can be traversed.
Recognition takes place in a net when
a distinguished state
(*END*)
is reached.
Complete recognition of an input
requires recognition by the top-most net, with no more words left in the
input.
One way to accomplish this is to place a special symbol,
such as
END-OF-INPUT
at the end of
the input sentence; the sentence is successfully parsed if and only if this
symbol is recognized while reaching the *END*
state.
For a sample ATN, see page 600 of ``Artificial Intelligence.''
Transition across an arc can depend on the successful traversal of a subnet; you will need a stack to keep track of the parsing situation. As more than one arc can emerge from a state it is necessary to pick one to follow when you reach a state. This is done by considering the arcs as ordered, and selecting them in order. As an attempt to follow an arc may fail, you will have to keep track, for every state, of the arcs that have still to be tried. At failure the ATN will backup to the last choice-point and select the next arc to try.
To be more specific, each arc indicates its source and destination states and a label from the following list. The exact format is up to you.
(CAT [category]) ([post-actions]) (WRD [word]) ([post-actions]) (MEM [list]) ([post-actions]) (TEST [test]) ([post-actions]) (PUSH [state]) ([pre-actions]) ([post-actions])The first element of the label is the arc type. The second element depends on the type. The actions can be any actions that you like. In a full-blown ATN they generally manipulate the contents of a set of registers. If you like you can use some global variables with pre-defined names and/or you can use the slots of one or more frames.
The pre-actions, if any, occur just prior to pushing to a new state. The post-actions occur just prior to reaching the destination state. Here is the list of actions. Again, the exact format is up to you.
(SET-REGISTER [register] [value]) (GET-REGISTER [register]) (LIST [contents])The input sentence is scanned by means of an input pointer, which always points to the next input word to be examined. Arcs that succeed in recognizing an input word advance the input pointer. The
CAT
arc is followed if the current input word belongs to the right category.
A WRD
arc specifies the exact word that must be matched, while the MEM
arc
specifies that the input must be a member of the specified list of words
such as (a, an, the).
All three advance the input pointer.
Note that it is
traditional to refer to the current item, that is, the item just
successfully parsed, in the input as ``*
''.
The TEST
arc is optional.
If you choose to implement it, the arc
can be used to check that a value/marker has been placed in one particular
register (i.e., global variable).
It is a special kind of conditional jump
arc.
You can either put a value explicitly in the arc or use a function
that will obtain a value.
The PUSH
arc starts processing at the network start state named.
The
pre-actions are done prior to transfer to the new net; the post-actions
after successful completion of that net.
to the PUSH
arc that called it.
Because the POP
action does not specify a destination, its
format is a bit different from the other arcs;
POP
does not have any destination state.
It is an error to execute a POP
with no prior PUSH
.
The form in a POP
arc, which can be anything you want it
to be, is returned to the calling PUSH
arc as a result which can then be acted upon
by the post-actions.
If you like you can implement this without this
return mechanism, and return results using global variables.
The distinguished state
that marks the end of a net (*END*
) is
a state with a single POP
leaving it.
Note that as backtracking can occur, the state of the target data-structures must be included in each recorded state situation. These situations are recorded on a stack. If you do not include the data-structures on the stack, then you will have to undo actions! That is, you should build any frames apart from the main frame hierarchy, and only attach them to it after a completely successful parse.
You might want to have a dictionary accessing function that can check if a
certain word has a certain feature.
For example,
(GetFeature Number *)
might
return Plural
.
The predicate (CheckFeature Number * Plural)
might return
T
.
Other functions or predicates can be invented as necessary.
The
exact form of the dictionary is up to you.
It should probably include the
Category (e.g., Noun), the Number (e.g., Plural), and the Root form (e.g.,
Eat, as opposed to Eating).
It could also contain thematic role information for
verbs (e.g., Agent + Object + Instrument).
The ATN is to be specified in some notation and read in, thus setting up a network that the ATN interpreter can act on. For extra credit, use frames to represent the ATN and the dictionary. Clearly demonstrate the action of your interpreter with several sentences. The usual documentation and demonstration standards apply. That is, you should give an overview of your algorithm and enough description so that we can understand what you did. For the dictionary and grammar, you should clearly explain the syntax.
Submit your programs and data files. You should have separate files for the ATN Interpreter, dictionary, grammar, and samples of operation.
A sample ATN from class and its representation in list form (somewhat obscurely!) follow. Your input grammar does not need to use this syntax, it is merely included to be illustrative.
(DEFINE-ARC S1 S2 (PUSH S4) (SET-REGISTER SUBJECT *) (SET-REGISTER TYPE DECLARATION)) (DEFINE-ARC S2 S3 (PUSH S10)) (DEFINE-ARC S3 S3A (WRD *END*)) (DEFINE-NET-POP S3A (LIST SUBJECT (GET-REGISTER SUBJECT) VERB (GET-REGISTER VERB) OBJECT (GET-REGISTER OBJECT))) (DEFINE-ARC S4 S5 (CAT DETERMINER)) (DEFINE-ARC S5 S5 (CAT ADJ)) (DEFINE-ARC S5 S6 (CAT NOUN) (SET-REGISTER NOUN *)) . . .
The system should take natural language input and build a frame-based representation for the input. Any action taken as a result of the input should involve the generation or updating of frames also. The action may also cause rules to fire. Having understood and acted upon the input, the system must then generate a suitable response.
Here are some possibilities to consider:
Your conversation with the system will consist of declarations, commands, and queries. Some things are easy to enter. For example:
== (hello) (I UNDERSTAND) == (a dog is an animal) (I UNDERSTAND) == (what is a dog) (A DOG IS A-KIND-OF ANIMAL) == (rover is a dog) (I UNDERSTAND) == (is rover an animal) (ROVER IS AN ANIMAL) == (is rover alive) (I DO NOT KNOW) == (if a thing is an animal then it is alive) (I HAVE LEARNED A NEW RULE) == (is rover alive) (ROVER IS ALIVE)This example is meant to be illustrative. You do not have to be overly sophisticated; simple conversations work best. New words are extremely difficult to parse and incorporate. So difficult, in fact, that they are almost out of our league (but it has been done).
Keep in mind that the main purpose of this project is to integrate the results of your previous efforts. We are more interested in having you combine rule-based inference, frame-based knowledge representation, and natural language understanding than in getting clever conversation.
The usual documentation and demonstration standards apply. In particular, describe your overall approach with an emphasis on how you have made the previous projects interact. You do not have to re-document unchanged portions of earlier projects, but you must adequately document any changes you have made to them. Submit all your programs and data files, including rules, initial knowledge base, grammar, dictionary, and of course, sample inputs and outputs.
/cs/bin/turnin submit cs4341 proj3 *.*
This project will be graded based on:
Exception: For this and subsequent projects, you are allowed to use
the pattern-matching code in
/cs/cs4341/ai3/match.lsp
, provided that you so state.