CPSA 2.2 User Guide

The Cryptographic Protocol Shapes Analyzer (CPSA) attempts to enumerate all essentially different executions possible for a cryptographic protocol. We call them the shapes of the protocol. Naturally occurring protocols have only finitely many, indeed very few shapes. Authentication and secrecy properties are easy to determine from them, as are attacks and anomalies.

For each input problem, the CPSA program is given some initial behavior, and it discovers what shapes are compatible with it. Normally, the initial behavior is from the point of view of one participant. The analysis reveals what the other participants must have done, given the participant's view. The search is based on a high-level algorithm that was claimed to be complete, i.e. every shape can in fact be found in a finite number of steps. Further theoretical work showed classes of executions that are not found by the algorithm, however it also showed that every omitted execution requires an unnatural interpretation of a protocol's roles. Hence the algorithm is complete relative to natural role semantics.

The analyzer is designed to work well with other tools. S-expressions are used for both input and output. The analyzer reads all the problems in its input, writes out the solution to each problem, and then exits. This release contains seven tools. The cpsagraph program provides a visualization of answers using Scalable Vector Graphics (SVG). The cpsashapes program removes intermediate results from analyzer runs making the shapes easy to identify. The cpsadiff program compares CPSA output files S-expression by S-expression, and prints the first skeleton that differs. The cpsalogic program logical formula that can be used to ensure security goals are achieved. The cpsaannotations program uses protocol annotations to annotate shapes and generate protocol soundness obligations. The cpsaparameters program detects some specification errors by performing a data flow analysis on protocol roles. The cpsapp program pretty prints its input using a CPSA specific algorithm.

The input syntax is essentially the same as the output syntax. A Lisp aware editor will pretty print input, and the output is pretty printed. We use Emacs' Scheme mode to prepare input. This document describes the syntax, but provides little assistance for its interpretation. See the CPSA Tutorial document for this information.

The typical pattern of usage is to enter the set of problems to be analyzed into a file, in this example, prob.scm, analyze the problems, and if something interesting is produced, visualize the answers.

$ cpsa +RTS -M512m -RTS -o prob.txt prob.scm
$ cpsagraph -o prob.xhtml prob.txt
$ firefox -remote "openFile(`pwd`/prob.xhtml)"

Often a summary of the analysis is more enlightening.

$ cpsashapes -o prob_shapes.txt prob.txt
$ cpsagraph -o prob_shapes.xhtml prob_shapes.txt
$ firefox -remote "openFile(`pwd`/prob_shapes.xhtml)"

The distribution comes with the file cpsa.mk for inclusion into your makefile. A sample makefile follows. If you cut-and-paste from a browser window, be sure to convert the leading spaces in the last line to a tab character.

CPSAFLAGS = +RTS -M512m -RTS

SRCS := $(wildcard *.scm) $(wildcard *.lsp)

include cpsa.mk

all:    $(SRCS:%.scm=%_shapes.xhtml) $(SRCS:%.scm=%.xhtml) \
        $(SRCS:%.lsp=%_shapes.xhtml) $(SRCS:%.lsp=%.xhtml)

clean:
        -rm *.txt *.xhtml

For platforms without GNU make, the Haskell program Make.hs can be loaded into a Haskell interpreter and perform a similar function. Users are expected to copy the makefile or the Haskell program into the directory containing a set of analysis problems, and then modify the file so it specifies the command line flags appropriate for the problems in the directory.

Syntax

The syntax is extensible. Association lists in which the key is always a symbol are allowed at various points in the grammar. Key-value pairs with unrecognized keys are ignored, and are available for use by other tools. On output, unrecognized key-value pairs are preserved when printing protocols, but elided when printing skeletons, with the exception of the comment key.

The input is an option herald form followed by a sequence of protocol definitions and problem statements. A herald form allows options normally specified on the command line to be specified within an input file. A problem statement describes the initial behavior as a skeleton. The grammar for a protocol follows.

PROTOCOL   ::= (defprotocol ID ALG ROLE+ PROT-ALIST)
ID         ::= SYMBOL
ALG        ::= SYMBOL
ROLE       ::= (defrole ID VARS TRACE ROLE-ALIST)
VARS       ::= (vars DECL*)
DECL       ::= (ID+ SORT)
TRACE      ::= (trace EVENT+)
EVENT      ::= (send TERM) | (recv TERM)
ROLE-ALIST ::= (non-orig HT-TERM*) ROLE-ALIST
            |  (pen-non-orig HT-TERM*) ROLE-ALIST
            |  (uniq-orig TERM*) ROLE-ALIST | ...
HT-TERM    ::= TERM | (INT TERM)
PROT-ALIST ::= ...

The protocol ID is a symbol that names the protocol, and the role ID is a symbol that names the role. The ALG symbol identifies the algebra used by the protocol. For the Basic Cryptographic Algebra, the symbol is basic. For the Diffie-Hellman Algebra, the symbol is diffie-hellman. The var form contains symbols that declare the sort of the variables used in this role. The set of sort symbols is algebra specific. The protocol association list has no predefined keys, while the role association list has three. The value associated with both of these keys must be atoms in the algebra. Each non-orig term must not be carried by any event in the role's trace, but each of its variables must occur in some term. A role non-origination assumption of the form (3 a) asserts that atom a will not be mapped into an instance unless its height is at least three. For each pen-non-orig term, each of its variables must occur in some term, but unlike a non-origination assumption, the term may be carried. Each uniq-orig term must originate in the role's trace. Every non-atomic variable must be acquired by the role's trace.

The structure of sorts and terms in the Basic Cryptographic Algebra follows.

SORT ::= text | data | name | skey | akey | mesg
TERM ::= ID | STRING | (cat TERM+) | (enc TERM+ TERM) | (hash TERM+)
      |  (pubk ID) | (privk ID) | (invk ID) | (ltk ID ID)
      |  (pubk STRING ID) | (privk STRING ID)

The form (cat a b c d e) is sugar for (cat a (cat b (cat c (cat d e)))).

A term in the Basic Crypto Algebra is an atom if it is variable of a sort other than mesg, or it if formed from one of the following operations: pubk, privk, invk, and ltk.

SKELETON   ::= (defskeleton ID VARS STRAND+ SKEL-ALIST)
STRAND     ::= (defstrand ID INT MAPLET*) | (deflistener TERM)
MAPLET     ::= (TERM TERM)
SKEL-ALIST ::= (non-orig TERM*) SKEL-ALIST
            |  (pen-non-orig TERM*) SKEL-ALIST
            |  (uniq-orig TERM*) SKEL-ALIST
            |  (precedes NODE-PAIR*) SKEL-ALIST | ...
NODE-PAIR  ::= (NODE NODE)
NODE       ::= (INT INT)

The ID in the skeleton form names a protocol. It refers to the last protocol definition of that name which precedes the skeleton form. The ID in the strand form names a role. The integer in the strand form gives the height of the strand. The sequence of pairs of terms in the strand form specify an environment used to construct the messages in a strand from its role's trace. The first term is interpreted using the role's variables and the second term uses the skeleton's variables. The environment used to produce the strand's trace is derived by matching the second term using the first term as a pattern.

The first integer in a node identifies a strand, and the second one specifies the position of the node in the strand. Zero-based indexing is used. Zero identifies the first strand, and the first node in a strand has position zero.

A variable may occur in more then one role within a protocol. The reader performs a renaming so as to ensure these occurrences do not overlap. Furthermore, the maplets used to specify a strand need not specify how to map every role variable. The reader inserts missing mappings, and renames every skeleton variable that also occurs in a role of its protocol. The sort of every skeleton variable that occurs in the non-orig, pen-non-orig, or uniq-orig list or in a maplet must be declared.

On output, key-value pairs are added to all skeleton's association list. Every skeleton in the output is labeled with a unique identifier with (label INT) A skeleton has (parent INT) if it is a member of the cohort of the identified parent. A skeleton has (seen INT+) when members of its cohort are isomorphic to previously seen skeletons. A skeleton lists its unrealized nodes with (unrealized NODE*). The traces associated with each strand is given by the (traces ...) form. Finally, the operation used to derive this skeleton from its parent is recorded with (operation TEST KIND TERM NODE TERM*), where TEST is the authentication test encryption-test or nonce, KIND is (added-strand ID INT), (contracted MAPLET*), or (added-listener TERM), TERM is the critical term, NODE in the test node, and the remaining terms specify the escape set. When the operation kind is augmenting, the instance's role name and height are provided. For kind listening, a term is provided. For kind contracting, the substitution is provided. When a substitution refers to a variable not in the skeleton, its name is unpredictable. For generalization, the operation is recorded as (operation generalization METHOD), where METHOD is one of deleted NODE, weakened NODE-PAIR, separated TERM and forgot TERM. Generalization is used to find shapes from realized skeletons. For shape collapsing, the operation is recorded as (operation collapsed INT INT), where the two INTs identify the strands merged. Shape collapsing is used to find related shapes.

The current Diffie-Hellman support is experimental and known to have flaws. The additional sorts and terms in the Diffie-Hellman Algebra follow.

SORT ::= ... | base | expn
TERM ::= ... | (gen) | (exp TERM TERM)

Given exponent x, (gen) is the group generator constant, and given g and x, (exp g x) is g raised to the x power. Terms of the form (gen) or (exp g x) are of sort base, and x is of sort expn. Terms of sort expn are atoms, but terms of sort base are not.

Reasoning about the commutativity of exponents is supported via following equation.

(exp (exp (gen) x) y) = (exp (exp (gen) y) x)

To avoid known Diffie-Hellman flaws, avoid variables of sort base.

Macros and Includes

After reading the input, cpsa expands macros before in analyzing the results. A macro definition is a top-level form.

(defmacro (NAME ARG*) BODY)

The cpsa program expands all calls to macros in non-macro defining S-expressions using the macros that have been defined previously. A macro definition can be used to expand a call if the first element of a list matches the name of the macro, and the length of the remaining elements in the list matches the length of the macro's argument list. When two macros definitions are applicable, the last definition takes precedence. The cpsa program omits macro definitions from its output.

A source file can be included within another with the top-level include form,

(include FILE)

where FILE is a string. The cpsa program includes files while it does macro expansion.

Usage of CPSA

$ cpsa -h
Usage: cpsa [OPTIONS] [FILE]
  -o FILE    --output=FILE      output FILE
  -l INT     --limit=INT        step count limit (default 2000)
  -b INT     --bound=INT        strand count bound (default 8)
  -m INT     --margin=INT       set output margin (default 72)
  -e         --expand           expand macros only; don't analyze
  -n         --noisochk         disable isomorphism checks
  -c         --check-nonces     check nonces first
  -t         --try-old-strands  try old strands first
  -r         --reverse-nodes    try younger nodes first
  -a STRING  --algebra=STRING   algebra (default basic)
  -s         --show-algebras    show algebras
  -h         --help             show help message
  -v         --version          show version number

This program will abort if too many steps are taken. A skeleton is printed for each step taken by the program. The step count limit option is used to override the default step count limit. It will also abort when it detects a skeleton with too many strands. The strand count bound option is used to override the default strand count bound. Another way to limit resources used by the program is to limit the amount of memory it may use. The command-line option +RTS -M512m -RTS limits memory usage to 512m.

The herald form can be used to specify options within a file. The grammar for a herald form follows, where ALIST is an association list with symbols as keys.

HERALD ::= (herald TITLE ALIST)
TITLE  ::= SYMBOL | STRING

The title is used by cpsagraph when generating XHTML. In the following example, the herald form specifies a strand bound of 12 in a way that is equivalent to the command line option --bound=12.

(herald "Needham-Schroeder Public-Key Protocol" (bound 12))

The herald form is only interpreted by the cpsa program, and is treated as a comment by all other programs. In particular, specifying a margin option effects cpsa only.

For long running problems, SMP parallelism is available. For example, on a quad-core machine, we would probably use +RTS -N4 -RTS.

In addition to the options provided on the command line or a herald form, one can influence the order in which test nodes are sought. When a role includes (reverse-search) in its association list as a comment, the nodes in its instances will be searched in reverse order.

When run with isomorphism checks disabled (--noisochk), CPSA searches for realized skeletons, not shapes. It attempts to speed up analysis by not identifying duplicate skeletons or generalizing realized skeletons, however in many cases, run times increase. It is used when normal analysis takes too much time on the chance that avoiding the checks makes more progress.

In no isomorphism checking mode output, every realized skeleton is labeled a shape even when it is not. This allows the extraction of every realized skeletons from the output using the cpsashapes program.

An error message that begins with "No test for unrealized node" identifies a severe error that should be reported as a bug.

Visualization

The cpsagraph program produces a graphical rendering of the output or input of an analyzer using SVG. It is viewable only with a standards-compliant web browser such as FireFox.

When applied to the output of CPSA, the program groups together all the skeletons related to each problem statement in the input. The program assembles the related skeletons into a directed acyclic graph using the parent identifier and renders it as a tree using SVG. Each vertex in the tree is the label added to the skeleton by the CPSA program. Click the label in the tree to view the skeleton. Hover over a node in a skeleton drawing to see the term associated with it. Hover over a role to see the mapping from role variables to skeleton variables.

A node ordering edge in a skeleton drawing indicates that the message transmission at the source of the edge happened before message reception at the destination edge. The edge is solid if the transmitting term is syntactically identical to the receiving term, otherwise the edge is dashed. Thus in an algebra with a commutative operation *, the graphing program sometimes draws a dashed line between (send (* a b)) and (recv (* b a)).

In the tree drawing, the label of a shape is blue. For dead skeletons, the label is red, unless it has been seen before, in which case it is orange. Otherwise the label is green for skeletons seen before. Seen skeleton labels are rendered in an italic font.

By the default, cpsagraph generates a view of CPSA S-expressions as a compound document that contains SVG within XHTML. This view integrates graphics with the input text.

In compact mode, cpsagraph generates an SVG document. The tree is displayed immediately to the left of skeleton drawings. When there is more than one tree, the left-hand-side of the drawing contains a vertical listing of the trees. Compact mode output should never be used as a replacement for studying the text version of the output. The text version contains strictly more information, and should be displayed next to its graphical rendition.

In LaTeX mode, cpsagraph generates LaTeX source. XY-pic is used for drawings of skeletons. The margin specified in cpsa.mk produces good results.

Usage

$ cpsagraph -h
Usage: cpsagraph [OPTIONS] [FILE]
  -o FILE  --output=FILE  output FILE
  -x       --expanded     use expanded format (default)
  -z       --zoom         enable diagram scaling
  -t       --treeless     use treeless expanded format
  -c       --compact      use compact format
  -l       --latex        use LaTeX format
  -m INT   --margin=INT   set output margin (default 72)
  -i       --infix        output uses infix notation
  -h       --help         show help message
  -v       --version      show version number

Output Comparisons

The cpsadiff program compares CPSA output files S-expression by S-expression, and prints the first skeleton that differs.

Usage

$ cpsadiff -h
Usage: cpsadiff [OPTIONS] OLD-FILE NEW-FILE
  -o FILE  --output=FILE  output FILE
  -m INT   --margin=INT   set output margin (default 72)
  -h       --help         show help message
  -v       --version      show version number

Shape Extraction

The cpsashapes program extracts the original problems and the shapes from the output of a CPSA run. The shapes are linked to their problem so the output can be graphed.

Usage

$ cpsashapes -h
Usage: cpsashapes [OPTIONS] [FILE]
  -o FILE  --output=FILE  output FILE
  -m INT   --margin=INT   set output margin (default 72)
  -h       --help         show help message
  -v       --version      show version number

Formula Extraction

The cpsalogic program extracts a formula in the language of order-sorted first-order logic for each problem and its shapes from a CPSA run. The formula is called a shape analysis sentence. The formula is satisfied in all realized skeletons when CPSA finds all the shapes for the problem.

Usage

$ cpsalogic -h
Usage: cpsalogic [OPTIONS] [FILE]
  -o FILE    --output=FILE     output FILE
  -m INT     --margin=INT      set output margin (default 72)
  -a STRING  --algebra=STRING  algebra (default basic)
  -s         --show-algebras   show algebras
  -h         --help            show help message
  -v         --version         show version number

Annotations

The cpsaannotations program uses protocol annotations to annotate shapes and generate protocol soundness obligations.

Usage

$ cpsaannotations -h
Usage: cpsashapes [OPTIONS] [FILE]
  -o FILE    --output=FILE     output FILE
  -m INT     --margin=INT      set output margin (default 72)
  -a STRING  --algebra=STRING  algebra (default basic)
  -s         --show-algebras   show algebras
  -h         --help            show help message
  -v         --version         show version number

Syntax

To be analyzed, each role in a protocol must include an annotations form. The TERM is a role atom that when instantiated, is the name of a principal in the shape. What follows is sequences of pairs. The integer gives the position of the event in the trace that is annotated by the formula. Zero-based indexing is used.

The language of formulas is order-sorted first-order logic extended with a modal says operator. Formula terms may include function symbols that are not part of a protocol's message signature.

ANNOTATION ::= (annotations TERM (INT FORMULA)*)
FORMULA    ::= (ID FTERM*) | (not FORMULA)
            |  (and FORMULA*) | (or FORMULA*)
            |  (implies FORMULA* FORMULA)
            |  (iff FORMULA FORMULA)
            |  (says TERM FORMULA)
            |  (forall (DECL*) FORMULA)
	    |  (exists (DECL*) FORMULA)
FTERM      ::= TERM | (ID FTERM*)

Use (and) for truth and (or) for falsehood.

Output

On output, each shape contains an annotations form and a obligations form. The annotations form presents every non-trivial formula derived from the protocol. The obligations form presents every non-trivial soundness obligation.

Parameter Analysis

The parameters of a role are the atoms that are not acquired by the role's trace, but must be available before a complete execution of the trace is possible. This program computes sets of sets of parameters consistent with the role. If the expected parameter set is not a member, a specification error is indicated.

Usage

$ cpsaparameters -h
Usage: cpsaparameters [OPTIONS] [FILE]
  -o FILE    --output=FILE     output FILE
  -m INT     --margin=INT      set output margin (default 72)
  -a STRING  --algebra=STRING  algebra (default basic)
  -s         --show-algebras   show algebras
  -h         --help            show help message
  -v         --version         show version number

Output

On output, each role contains a parameters form with a list of sets of atoms.

(defrole resp (vars (b a name) (n2 n1 text))
  (trace
    (recv (enc n1 a (pubk b)))
    (send (enc n1 n2 (pubk a)))
    (recv (enc n2 (pubk b))))
  (parameters
    (n2 n1 a (pubk b) (pubk a))
    (n2 (privk b) (pubk a))))

In this example, the second parameter set is what is expected for a responder role in this version of the Needham-Schroeder protocol.

Macro expansion is not performed by this program. Use the -e option with cpsa to preprocess input that contains macro definitions.

Pretty Printing

The cpsapp program program pretty prints its input using the CPSA specific algorithm.

Usage

$ cpsapp -h
Usage: cpsapp [OPTIONS] [FILE]
  -o FILE  --output=FILE  output FILE
  -m INT   --margin=INT   set output margin (default 72)
  -i       --infix        output uses infix notation
  -h       --help         show help message
  -v       --version      show version number

S-expressions

The S-expressions used are restricted so that most dialects of Lisp can read them, and characters within symbols and strings never need quoting. Every list is proper. An atom is either a symbol, an integer, or a string. The characters that make up a symbol are the letters, the digits, and the special characters in "-*/<=>!?:$%_&~^". A symbol may not begin with a digit or a sign followed by a digit. The characters that make up a string are the printing characters omitting double quote and backslash. Double quotes delimit a string. A comment begins with a semicolon, or is an S-expression list at top-level that starts with the comment symbol.

Copyright (c) 2009 The MITRE Corporation. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, this copyright notice and the title of the publication and its date appear, and notice in given that copying is by permission of The MITRE Corporation.