Artificial Intelligence Markup Language (AIML) Version 1.0.1
A.L.I.C.E. AI Foundation Working Draft 18 February 2005 (rev
007)
This version:
WD-aiml-1.0.1-20050218-007.html
Latest version:
http://aitools.org/aiml/spec
Previous versions:
WD-aiml-1.0.1-20011025-006.html
WD-aiml-1.0.1-20010218-007.html
WD-aiml-1.0.1-20011018-005.html
WD-aiml-1.0-20010926-004.html
WD-aiml-1.0-20010926-003.html
Creator of AIML:
Richard Wallace
Editor of this Document:
Richard Wallace
Original Author of this Document(*):
Noel Bush
Contributors to source material for this document:
Thomas Ringate, Anthony Taylor, Jon Baer, Noel Bush(*)
(*) - Note: As of 2/2005, most of the material in this document
was written by former A. I. Foundation member Noel Bush. His outstanding contribution to this
documentation and standardization effort is greatly appreciated by the A. I. Foundation and
by members of the A.L.I.C.E. AI community.
Copyright © A.L.I.C.E. AI Foundation, All Rights Reserved. A.L.I.C.E. AI
Foundation software licensing rules apply.
Abstract
The Artificial Intelligence Markup Language is a derivative of XML (Extensible Markup
Language) that is completely described in this document. Its goal is to enable pattern-based,
stimulus-response knowledge content to be served, received and processed on the Web and
offline in the manner that is presently possible with HTML and XML. AIML has been designed
for ease of implementation, ease of use by newcomers, and for interoperability with XML and
XML derivatives such as XHTML.
Status of this Document
This document has been drafted for review by theAlicebot and AIML Architecture
Committee of theA.L.I.C.E. AI Foundation, with the intention of providing
a more thorough specification than that laid out in the "AIML 1.0 Tag Set".
It is being released by the Committee as a Working Draft to gather public feedback before its
promotion to a Proposal, and eventually to final release as the AIML 1.0.1 Recommendation.
NB: Contents of this document are subject to change! This document should not be used
as reference material or cited as a normative reference from another document.
This is not a signficantly new version of AIML (first published 3 August 2001 at http://alicebot.org/committees/architecture/resolutions/aiml10.html); rather, it
incorporates small amendments made by the Architecture Committee during the period of
August-October 2001 and provides a more formal statement of the specification than has been
available thus far. According to the Committee's currentspecification release process, the incorporation of these changes requires that the
specification be considered a new specification; the addition of the subminor
".1" to "1.0" reflects this.
Please report errors in this document to drwallace@alicebot.org.
Table of Contents
1. Introduction
1.1. Origin and Goals
1.2. Terminology
2. AIML Objects
2.1. Well-formed AIML Objects
2.2. Characters
2.3. Common Syntactic Constructs
2.4. Character Data and Markup
2.5. Comments
2.6. Processing Instructions
2.7. CDATA Sections
2.8. Prolog and Document Type Declaration
2.9. Standalone Document Declaration
2.10. White Space Handling
2.11. End-of-Line Handling
2.12. Language Identification
3. AIML Object Structure
3.1. AIML Namespace
3.2. AIML Element
3.3. Forward-Compatible Processing
3.4. AIML Pattern Expressions
3.5. AIML Predicates
3.6. Embedding AIML
4. Topic
5. Category
6. Pattern
6.1. Pattern-side That
7. Template
7.1. Atomic template elements
7.1.1. Star
7.1.2. Template-side That
7.1.3. Input
7.1.4. Thatstar
7.1.5. Topicstar
7.1.6. Get
7.1.6.1. Bot
7.1.7. Short-cut elements
7.1.8. System-defined predicates
7.2. Text formatting elements
7.2.1. Uppercase
7.2.2. Lowercase
7.2.3. Formal
7.2.4. Sentence
7.3. Conditional elements
7.3.1. Condition
7.3.2. Random
7.4. Capture elements
7.4.1. Set
7.4.2. Gossip
7.5. Symbolic Reduction elements
7.5.1. SRAI
7.6. Transformational elements
7.6.1. Person2
7.6.2. Person
7.6.3. Gender
7.7. Covert elements
7.7.1. Think
7.7.2. Learn
7.8. External processor elements
7.8.1. System
7.8.2. JavaScript
8. AIML Pattern Matching
8.1. Pattern Expression syntax
8.1.1. Simple pattern expressions
8.1.2. Mixed pattern expressions
8.1.3. Normal words
8.1.4. Normal characters
8.1.5. AIML wildcards
8.1.6. Pattern-side bot elements
8.2. Load-time match path
construction
8.3. Input normalization
8.3.1. Substitution normalizations
8.3.2. Sentence-splitting
normalizations
8.3.3. Pattern-fitting normalizations
8.3.4. Normalization examples
8.4. Pattern Expression matching
behavior
8.4.1. Explanation
via implementation description: Graphmaster
9. AIML Predicate handling
9.1. AIML Predicate name syntax
9.2. AIML Predicate behaviors
9.3. AIML Predicate defaults
10. AIML Schema
11. References
12. Version History
Artificial Intelligence Markup Language, abbreviated AIML, describes a class of data
objects called AIML objects and partially describes the behavior of computer programs that
process them. AIML is a derivative of XML, theExtensible Markup Language. By construction, AIML objects are conforming XML documents,
although AIML objects may also be contained within XML documents. As XML is itself an
application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879],
AIML objects are also conforming SGML documents.
AIML objects are made up of units called topics and categories, which contain either
parsed or unparsed data. Parsed data is made up of characters, some of which form character
data, and some of which form AIML elements. AIML elements encapsulate the stimulus-response
knowledge contained in the document. Character data within these elements is sometimes parsed
by an AIML interpreter, and sometimes left unparsed for later processing by a Responder.
[Definition: A software module called an AIML interpreter is used to read AIML
objects and provide application-level functionality based on their structure. An AIML
interpreter may use the services of an XML processor, or it may take the place of one, but it
must not violate any of the constraints defined for XML processors.] [Definition: It is
assumed that an AIML interpreter is part of a larger application generically termed a
bot, which carries the larger functional set of interaction based on AIML. This
document does not constrain the specific behavior of a bot.] [Definition: A software module
called a responder handles the human-to-bot or bot-to-bot interface work between an
AIML interpreter and its object(s).]
AIML was developed by Dr. Richard S. Wallace and the Alicebot free software community
during 1995-2000. It was originally adapted from a non-XML grammar also called AIML, and
formed the basis for the first Alicebot, A.L.I.C.E., the Artificial Linguistic Internet
Computer Entity. Since its inception, it has been adopted as a standard by the A.L.I.C.E. AI
Foundation, which now holds its copyright, and whose Alicebot and AIML Architecture Committee
is responsible for its maintenance and further elaboration.
The design goals for AIML are:
- AIML shall be easy for people to learn.
- AIML shall encode the minimal concept set necessary to enable a stimulus-response
knowledge system modeled on that of the original A.L.I.C.E.
- AIML shall be compatible with XML.
- It shall be easy to write programs that process AIML documents.
- AIML objects should be human-legible and reasonably clear.
- The design of AIML shall be formal and concise.
- AIML shall not incorporate dependencies upon any other language.
This specification provides all the information necessary to understand AIML Version 1.0.1
and construct computer programs to process it.
This version of the AIML specification may be distributed freely, as long as all text and
legal notices remain intact.
Like much of this document, the terminology used in this document to describe AIML objects
is borrowed freely from the W3C's XML Recommendation (http://www.w3.org/TR/REC-xml). The following terms,
copied almost verbatim from the XML Recommendation, are used in building those definitions
and in describing the actions of an AIML interpreter:
may
[Definition: Conforming objects and AIML interpreters are permitted
to but need not behave as described.]
must
[Definition: Conforming objects and AIML interpreters are required
to behave as described; otherwise they are in error.]
error
[Definition: A violation of the rules of this specification;
results are undefined. Conforming software may detect and report an error and may recover
from it.]
fatal error
[Definition: An error which a conforming AIML interpreter must
detect and report to the bot. After encountering a fatal error, the interpreter may continue
processing the data to search for further errors and may report such errors to the bot. In
order to support correction of errors, the interpreter may make unprocessed data from the
object (with intermingled character data and AIML content) available to the bot. Once a fatal
error is detected, however, the interpreter must not continue normal processing (i.e., it
must not continue to pass character data and information about the object's logical
structure to the bot in the normal way).]
at user option
[Definition: Conforming software may or must (depending on the
modal verb in the sentence) behave as described; if it does, it must provide users a means to
enable or disable the behavior described.]
validity constraint
[Definition: A rule that applies to all valid AIML objects.
Violations of validity constraints are errors; they must, at user option, be reported by
validating AIML interpreters.
well-formedness constraint
[Definition: A rule that applies to all well-formed XML documents,
and hence to all well-formed AIML objects. Violations of well-formedness constraints are
fatal errors, as in the XML specification.]
for compatibility
[Definition: Marks a sentence describing a feature of AIML included
solely to ensure that AIML remains compatible with XML.]
[Definition: A data object is an AIML object if it is well-formed, as defined in
the XML specification, and if it is valid according to the AIML specification--that is, if it
conforms to the data model described in this document.]
Each AIML object has both a logical and a physical structure. Physically, the object is
composed of units called topics and categories. An object begins in a
"root" or object entity. Logically, the object is composed of elements and
character references, all of which are indicated in the object by explicit markup.
An AIML object may also be "overlaid" by comments and processing
instructions as described by the XML specification, as well as by XML content from other
namespaces. Comments and processing instructions are not treated by an AIML interpreter.
Foreign-namespace content may be passed by an AIML interpreter to a responder, but is not
processed by the AIML interpreter itself.
In the following, we reiterate the AIML definitions of various aspects of object
structure, although in the great majority of cases these simply repeat the XML definitions of
the same. In several cases, there are further constraints on AIML interpreters that do not
apply to all XML processors.
[Definition: A textual object is a well-formed AIML object if it meets the criteria
specified in this document and matches the definition of XML well-formedness (http://www.w3.org/TR/2000/REC-xml-20001006#sec-well-formed).]
The AIML definitions for text and characters are adopted verbatim from the
XML specification: http://www.w3.org/TR/2000/REC-xml-20001006#charsets.
This document will use the same symbols used in the XML specification: http://www.w3.org/TR/2000/REC-xml-20001006#sec-common-syn.
AIML uses the same definitions of character data and markup as used in the
XML specification: http://www.w3.org/TR/2000/REC-xml-20001006#syntax.
AIML respects the same definition of comments as in XML: http://www.w3.org/TR/2000/REC-xml-20001006#sec-comments.
However, an AIML interpreter must not make it possible for a bot to retrieve the
text of comments. Comments are regarded as a "wholly XML" feature, not at
all part of the functional set of AIML interpretation.
AIML respects the same definition of processing instructions as in XML: http://www.w3.org/TR/2000/REC-xml-20001006#sec-pi. AIML does not make special use of
the processing instruction function.
As in XML (http://www.w3.org/TR/2000/REC-xml-20001006#sec-cdata-sect), CDATA sections may
occur anywhere character data may occur in AIML, and are used to escape blocks of text
containing characters that would otherwise be recognized as markup.
However, CDATA sections must not be used as a substitute for proper namespace
qualification; that is, CDATA sections must not contain markup from other namespaces that is
meant to be interpreted later by a responder.
AIML adds no further constraints on the XML specification of prologs (http://www.w3.org/TR/2000/REC-xml-20001006#sec-prolog-dtd).
Since AIML objects can appear within other XML documents, which themselves may or may not
have prologs, AIML interpreters cannot require a prolog to be present.
AIML does not use Document Type Declarations. Instead, AIML uses W3C Schemas. This
document may be understood as the description of the AIML schema. AIML objects are not
required to refer to an AIML schema namespace; however, this is strongly advised since the
AIML specification is subject to change and namespace identification will aid AIML
interpreters in deciding upon proper interpretation of AIML objects.
AIML shares the XML definition of an external markup declaration (http://www.w3.org/TR/2000/REC-xml-20001006#sec-rmd). AIML interpreters should make use
of such declarations in interpreting AIML objects, and should pass along appropriate
declaration information to responders.
AIML interpreters must respect the entirety of the XML definition of whitespace handling
(http://www.w3.org/TR/2000/REC-xml-20001006#sec-white-space), including the use of the
special attribute xml:space.
The default behavior of AIML interpreters when handling whitespace must be:
- in cases where elements abut character data or other elements, with no intervening
whitespace, to preserve this arrangement in the output;
- in cases where at least one whitespace character in the AIML object intervenes between
an element and character data or another element, to produce exactly one space character
regardless of how many whitespace characters are present in the document, unless the
xml:space attribute is applied to the parent element;
- in cases where the xml:space is applied to an
element, to respect the precise whitespace content of the element.
AIML interpreters should respect the end-of-line handling requirements of the XML
specification (http://www.w3.org/TR/2000/REC-xml-20001006#sec-line-ends).
AIML objects may make use of the XML attribute xml:lang for identifying the natural or formal language in which the content of an
object is written. (See http://www.w3.org/TR/2000/REC-xml-20001006#sec-lang-tag).
AIML is meant to be independent of human language, and thus the optional XML method of
providing such information is the only means available to AIML authors; no additional
entities support this kind of explicit markup.
The AIML 1.0.1 namespace (herein referred to as "the AIML namespace")
has the URIhttp://alicebot.org/2001/AIML-1.0.1.
AIML interpreters must use the XML namespaces mechanism [XML Names] to recognize
elements and attributes from this namespace. The complete list of AIML-defined elements is
defined in this document. Vendors must not extend the AIML namespace with additional elements
or attributes. Instead, any extension must be in a separate namespace.
This specification uses a prefix of aiml: for
referring to elements in the AIML namespace. However, AIML objects are free to use any
prefix, provided that there is a namespace declaration that binds the prefix to the URI of
the AIML namespace.
An element from the AIML namespace may have any attribute not from the AIML namespace,
provided that the expanded-name of the attribute has a non-null namespace URI. The presence
of such attributes must not change the behavior of AIML units and functions described in this
document. Thus, an AIML interpreter is always free to ignore such attributes, and must ignore
such attributes without giving an error if it does not recognize the namespace URI. Such
attributes can provide, for example, unique identifiers, optimization hints, or
documentation.
It is an error for an element from the AIML namespace to have attributes with
expanded-names that have null namespace URIs (i.e., attributes with unprefixed names) other
than attributes defined for the element in this document.
NOTE: The conventions used for the names of AIML elements and attributes are that
names are all lower-case, use hyphens to separate words, and use abbreviations only in
support of historical usage (as in <li>).
<aiml:aiml
version = number>
<!-Content:
top-level-elements) -->
</aiml:aiml>
An AIML object is represented by an aiml:aiml element
in an XML document.
An AIML object must have a version attribute,
indicating the version of AIML that the object requires. For this version of AIML, the
version should be 1.0.1. When the value is not equal to
1.0.1,forward-compatible processing mode is enabled.
The aiml:aiml element may contain the following types
of elements:
An element occurring as a child of an aiml:aiml
element is called a top-level element.
This example shows the top-level structure of an AIML object. Ellipses (...) indicate where attribute values or content have been
omitted. AIML objects may contain zero or more of each of these elements.
<aiml:aiml version="1.0.1"
xmlns:aiml="http://alicebot.org/2001/AIML">
<aiml:topic
name="...">
...
</aiml:topic>
<aiml:category>
...
</aiml:category>
</aiml:aiml>
That is, an AIML object must contain zero or more topic elements, and one or more category
elements.
The order in which the children of the aiml:aiml
element occur is unconstrained (at user option). Users are free to order the elements as they
prefer.
In addition, the aiml:aiml element may contain any
element not from the AIML namespace, provided that the expanded-name of the element has a
non-null namespace URI. The presence of such top-level elements must not change the behavior
of AIML elements defined in this document. An AIML interpreter must always skip
interpretation of such top-level elements, but must also pass these elements along to the
responder. Such elements might be, for example, formatting elements from a namespace such
asXHTML,SMIL, orXSL-FO.
An element enables forward-compatible mode for itself, its attributes, its descendants and
their attributes if it is an aiml:aiml element whose
version attribute is not equal to 1.0.1.
If an element is processed in forward-compatible mode, then:
- if it is a top-level element and AIML 1.0.1 does not allow such elements as top-level
elements, then the element must be ignored along with its content;
- if it is an element in a category and AIML 1.0.1
does not allow such elements to occur in categories, then the element should be ignored;
- if the element has an attribute that AIML 1.0.1 does not allow the element to have or if
the element has an optional attribute with a value that the AIML 1.0.1 does not allow the
attribute to have, then the attribute must be ignored.
Thus, any AIML 1.0.1 interpreter must be able to process the following AIML object without
error (but also without producing any behavior in response to the non-1.0.1 element):
<aiml:aiml version="1.1"
xmlns:aiml="http://alicebot.org/2001/AIML">
<aiml:topic name="demonstration
*">
<aiml:category>
<aiml:pattern>* EXAMPLE</aiml:pattern>
<aiml:template>
This is just an example, <get name="username"/>.
<aiml:exciting-new-1.1-feature/>
</aiml:template>
</aiml:category>
</aiml:topic>
Several AIML elements and element attributes require a restricted form of mixed character
data and optional restricted element content that is called an AIML Pattern
Expression. There are two types of pattern expressions: simple pattern expressions
and mixed pattern expressions. Mixed pattern expressions cannot be fully described by
either Document Type Declaration syntax or W3C Schema 1.0.1 syntax. The structure of AIML
pattern expressions is given in [8.1].
Many AIML elements attributes deal with predicates. The definition of an AIML
predicate is found in [9], and the restrictions on
predicate name are given in [9.1].
A special class of AIML predicates are bot predicates, which have the same name
restrictions as all AIML predicates and whose values can only be retrieved, not set, at
runtime. The values of bot predicates are set at load time.
An AIML object may be a standalone XML document with the aiml:aiml element as the document element, or it may be embedded in another resource.
Two forms of embedding are possible:
- the AIML object may be textually embedded in a non-AIML resource, or
- the aiml:aiml element may occur in an XML document
other than as the document element
AIML 1.0.1 does not currently permit an id
attribute on the aiml:aiml element to facilitate the
second form of embedding, such as XSLT provides (see http://www.w3.org/TR/xslt#section-Embedding-Stylesheets).
The following example shows how an XHTML document can contain an AIML object, such that an
AIML interpreter could read the AIML object (including the "overlaid" XHTML
content with which it coincides):
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0
Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns=http://www.w3.org/1999/xhtml
xmlns:aiml=http://alicebot.org/2001/AIML xml:lang="en-US">
<head>
<title>Our
Company</title>
</head>
<body>
<aiml:aiml>
<aiml:category>
<aiml:pattern>
<h1>About
Our Company</h1>
</aiml:pattern>
<aiml:template>
<p>
We
sell genetically engineered worms that clean the sidewalk and communicate via microwave
transmissions.
</p>
</aiml:template>
</aiml:category>
</aiml:aiml>
</body>
</html>
(Note that indentation in this example is provided purely for clarity, but white space
should be handled as described in [2.10.].)
A topic is an optional top-level element that contains
category elements. A topic element has a required name attribute that
must contain a simple pattern expression. A topic element
may contain one or more category elements.
The contents of the topic element's name attribute are appended to the full match path that is
constructed by the AIML interpreter at load time, as described in [8.2].
<!-- Category: top-level-element -->
<aiml:topic name =
aiml-simple-pattern-expression >
<!-- Content: aiml:category+
-->
</aiml:topic>
A category is a top-level (or second-level, if
contained within a topic) element that contains exactly
one pattern and exactly one template. A category does not have any
attributes.
All category elements that do not occur as children of
an explicit topic element must be assumed by the AIML
interpreter to occur as children of an "implied" topic whose name attribute
has the value * (single asterisk wildcard).
<!-- Category: top-level-element -->
<aiml:category>
<!-- Content:
aiml-category-elements -->
</aiml:category>
A pattern is an element whose content is a mixed
pattern expression. Exactly one pattern must appear in
each category. The pattern must always be the first child element of the category. A pattern does not have any attributes.
The contents of the pattern are appended to the full
match path that is constructed by the AIML interpreter at load time, as described in [8.2].
<!-- Category: aiml-category-elements -->
<aiml:pattern>
<!-- Content:
aiml-pattern-expression -->
</aiml:pattern>
The pattern-side that element is a special type of
pattern element used for context matching. The pattern-side that is optional in a category, but if it occurs it must occur no more than once, and
must immediately follow the pattern and immediately
precede the template. A pattern-side that element contains a simple pattern expression.
The contents of the pattern-side that are appended to
the full match path that is constructed by the AIML interpreter at load time, as described in
[8.2].
If a category does not contain a pattern-side that,
the AIML interpreter must assume an "implied" pattern-side that containing the pattern expression * (single asterisk wildcard).
A pattern-side that element has no attributes.
<!-- Category: aiml-category-elements -->
<aiml:that>
<!-- Content:
aiml-pattern-expression -->
</aiml:that>
A template is an element that appears within category elements. The template must follow the pattern-side that
element, if it exists; otherwise, it follows the pattern
element. A template does not have any attributes.
<!-- Category: aiml-category-elements -->
<aiml:template>
<!-- Content:
aiml-template-elements -->
</aiml:template>
The majority of AIML content is within the template.
The template may contain zero or more AIML template
elements mixed with character data. The elements described below are grouped for convenience.
An atomic template element in AIML indicates to an AIML interpreter that it must return a
value according to the functional meaning of the element. Atomic elements do not have any
content.
The star element indicates that an AIML interpreter
should substitute the value "captured" by a particular wildcard from the
pattern-specified portion of the match path when
returning the template.
The star element has an optional integer index attribute that indicates which wildcard to use. The
minimum acceptable value for the index is "1" (the first wildcard), and the maximum
acceptable value is equal to the number of wildcards in the pattern.
An AIML interpreter should raise an error if the index
attribute of a star specifies a wildcard that does not
exist in the category element's pattern. Not specifying the index is the same as specifying an index of "1".
The star element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:star index =
single-integer-index />
The template-side that element indicates that an AIML
interpreter should substitute the contents of a previous bot output.
The template-side that has an optional index attribute that may contain either a single integer or a
comma-separated pair of integers. The minimum value for either of the integers in the index is "1". The index tells the AIML
interpreter which previous bot output should be returned (first dimension), and
optionally which "sentence" (see [8.3.2.]) of the previous bot output
(second dimension).
The AIML interpreter should raise an error if either of the specified index dimensions is invalid at run-time.
An unspecified index is the equivalent of "1,1". An unspecified second dimension of
the index is the equivalent of specifying a "1" for the second dimension.
The template-side that element does not have any
content.
<!-- Category: aiml-template-elements -->
<aiml:that index =
(single-integer-index | comma-separated-integer-pair) />
The input element tells the AIML interpreter that it
should substitute the contents of a previous user input.
The template-side input has an optional index attribute that may contain either a single integer or a
comma-separated pair of integers. The minimum value for either of the integers in the index is "1". The index tells the AIML
interpreter which previous user input should be returned (first dimension), and
optionally which "sentence" (see [8.3.2.]) of the previous user input.
The AIML interpreter should raise an error if either of the specified index dimensions is invalid at run-time.
An unspecified index is the equivalent of "1,1". An unspecified second dimension of
the index is the equivalent of specifying a "1" for the second dimension.
The input element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:input index =
(single-integer-index | comma-separated-integer-pair) />
The thatstar element tells the AIML interpreter that
it should substitute the contents of a wildcard from a pattern-side that element.
The thatstar element has an optional integer index attribute that indicates which wildcard to use; the
minimum acceptable value for the index is "1" (the first wildcard).
An AIML interpreter should raise an error if the index
attribute of a star specifies a wildcard that does not
exist in the that element's pattern content. Not
specifying the index is the same as specifying an index of "1".
The thatstar element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:thatstar index =
single-integer-index />
The topicstar element tells the AIML interpreter that
it should substitute the contents of a wildcard from the current topic (if the topic contains any wildcards).
The topicstar element has an optional integer index attribute that indicates which wildcard to use; the
minimum acceptable value for the index is "1" (the first wildcard). Not specifying the
index is the same as specifying an index of "1".
The topicstar element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:topicstar index =
single-integer-index />
The get element tells the AIML interpreter that it
should substitute the contents of a predicate, if that predicate has a value defined. If the
predicate has no value defined, the AIML interpreter should substitute the empty string "".
The AIML interpreter implementation may optionally provide a mechanism that allows the
AIML author to designate default values for certain predicates (see [9.3.]).
The get element must not perform any text formatting
or other "normalization" on the predicate contents when returning them.
The get element has a required name attribute that identifies the predicate with an AIML
predicate name.
The get element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:get name =
aiml-predicate-name />
An element called bot, which may be considered a
restricted version of get, is used to tell the AIML
interpreter that it should substitute the contents of a "bot predicate".
The value of a bot predicate is set at load-time, and cannot be changed at run-time. The AIML
interpreter may decide how to set the values of bot predicate at load-time. If the bot
predicate has no value defined, the AIML interpreter should substitute an empty string.
The bot element has a required name attribute that identifies the bot predicate.
The bot element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:bot name =
aiml-predicate-name />
Several atomic AIML elements are "short-cuts" for combinations of other
AIML elements. They are listed here without further explanation; the reader should refer to
the descriptions of the "long form" of each element for which the following
elements are short-cuts.
The sr element is a shortcut for:
<srai><star/></srai>
The atomic sr does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:sr/>
7.1.7.2. Person2
The atomic version of the person2 element is a
shortcut for:
<person2><star/></person2>
The atomic person2 does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:person2/>
7.1.7.3. Person
The atomic version of the person element is a shortcut
for:
<person><star/></person>
The atomic person does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:person/>
7.1.7.4. Gender
The atomic version of the gender element is a shortcut
for:
<gender><star/></gender>
The atomic gender does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:gender/>
NB: Previous versions of AIML (dubbed "0.9") used
<aiml:gender/> to indicate that the AIML interpreter should return the value of
the bot predicate "gender". Vendors who wish to implement an AIML
interpreter that is compatible with old AIML sets written with this usage should take note.
Several atomic AIML elements require the AIML interpreter to substitute a value that is
determined from the system, independently of the AIML content.
7.1.8.1. Date
The date element tells the AIML interpreter that it
should substitute the system local date and time. No formatting constraints on the output are
specified.
The date element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:date/>
7.1.8.2. ID
The id element tells the AIML interpreter that it
should substitute the user ID. The determination of the user ID is not specified, since it
will vary by application. A suggested default return value is "localhost".
The id element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:id/>
7.1.8.3. Size
The size element tells the AIML interpreter that it
should substitute the number of categories currently loaded.
The size element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:size/>
7.1.8.4. Version
The version element tells the AIML interpreter that it
should substitute the version number of the AIML interpreter.
The version element does not have any content.
<!-- Category: aiml-template-elements -->
<aiml:version/>
AIML contains several text-formatting elements, which instruct an AIML interpreter to
perform locale-specific post-processing of the textual results of the processing of their
contents.
The uppercase element tells the AIML interpreter to
render the contents of the element in uppercase, as defined (if defined) by the locale
indicated by the specified language (if specified).
<!-- Category: aiml-template-elements -->
<aiml:uppercase>
<!-- Contents:
aiml-template-elements -->
</aiml:uppercase>
If no character in this string has a different uppercase version, based on the Unicode
standard, then the original string is returned.
SeeUnicode Case
Mapping for implementation suggestions.
The lowercase element tells the AIML interpreter to
render the contents of the element in lowercase, as defined (if defined) by the locale
indicated by the specified language (if specified).
<!-- Category: aiml-template-elements -->
<aiml:lowercase>
<!-- Content:
aiml-template-elements -->
</aiml:lowercase>
If no character in this string has a different lowercase version, based on the Unicode
standard, then the original string is returned.
SeeUnicode Case
Mapping for implementation suggestions.
The formal element tells the AIML interpreter to
render the contents of the element such that the first letter of each word is in uppercase,
as defined (if defined) by the locale indicated by the specified language (if specified).
This is similar to methods that are sometimes called "Title Case".
<!-- Category: aiml-template-elements -->
<aiml:formal>
<!-- Content:
aiml-template-elements -->
</aiml:formal>
If no character in this string has a different uppercase version, based on the Unicode
standard, then the original string is returned.
SeeUnicode Case
Mapping for implementation suggestions.
The sentence element tells the AIML interpreter to
render the contents of the element such that the first letter of each sentence is in
uppercase, as defined (if defined) by the locale indicated by the specified language (if
specified). Sentences are interpreted as strings whose last character is the period or
full-stop character .. If the string does not contain a
., then the entire string is treated as a sentence.
<!-- Category: aiml-template-elements -->
<aiml:sentence>
<!-- Content:
aiml-template-elements -->
</aiml:sentence>
If no character in this string has a different uppercase version, based on the Unicode
standard, then the original string is returned.
SeeUnicode Case
Mapping for implementation suggestions.
AIML contains several conditional elements, which return values based on specified
conditions.
The condition element instructs the AIML interpreter
to return specified contents depending upon the results of matching a predicate against a
pattern.
NB: The condition element has three different
types. The three different types specified here are distinguished by an xsi:type attribute, which permits a validating XML Schema
processor to validate them. Two of the types may contain li elements, of which there are three different types, whose validity is determined
by the type of enclosing condition. In practice, an AIML
interpreter may allow the omission of the xsi:type
attribute and may instead heuristically determine which type of condition (and hence li) is in use.
7.3.1.1. Block Condition
The blockCondition type of condition has a required attribute name, which specifies an AIML predicate, and a required
attribute value, which contains a simple pattern
expression.
<!-- Category: aiml-template-elements -->
<aiml:condition xsi:type =
"blockCondition" name =
aiml-predicate-name
value = aiml-simple-pattern-expression >
<!-- Contents:
aiml-template-elements -->
</aiml:condition>
If the contents of the value attribute match the value
of the predicate specified by name, then the AIML
interpreter should return the contents of the condition.
If not, the empty string "" should be
returned.
7.3.1.2. Single-predicate Condition
The singlePredicateCondition type of condition has a required attribute name, which specifies an AIML predicate. This form of condition must contain at least one li element. Zero or more of these li elements may be of the valueOnlyListItem type.
Zero or one of these li elements may be of the defaultListItem type.
<!-- Category: aiml-template-elements -->
<aiml:condition xsi:type =
"singlePredicateCondition" name =
aiml-predicate-name >
<!-- Contents:
value-only-list-item*, default-list-item{0,1} -->
</aiml:condition>
The singlePredicateCondition type of condition is processed as follows:
Reading each contained li in order:
- If the li is a valueOnlyListItem type, then compare the contents of the value attribute of the li
with the value of the predicate specified by the name
attribute of the enclosing condition.
- If they match, then return the contents of the li
and stop processing this condition.
- If they do not match, continue processing the condition.
- If the li is a defaultListItem type, then return the contents of the li and stop processing this condition.
7.3.1.3. Multi-predicate Condition
The multiPredicateCondition type of condition has no attributes. This form of condition must contain at least one li element. Zero or more of these li elements may be of the nameValueListItem type.
Zero or one of these li elements may be of the defaultListItem type.
<!-- Category: aiml-template-elements -->
<aiml:condition xsi:type =
"multiPredicateCondition">
<!-- Contents:
name-value-list-item*, default-list-item{0,1} -->
</aiml:condition>
The multiPredicateCondition type of condition is processed as follows:
Reading each contained li in order:
- If the li is a nameValueListItem type, then compare the contents of the value attribute of the li
with the value of the predicate specified by the name
attribute of the li.
- If they match, then return the contents of the li
and stop processing this condition.
- If they do not match, continue processing the condition.
- If the li is a defaultListItem type, then return the contents of the li and stop processing this condition.
7.3.1.4. Condition List Items
As described above, two types of condition may contain
li elements. There are three types of li elements. The type of li element allowed in a given condition depends
upon the type of that condition, as described above.
7.3.1.4.1. Default List Items
An li element of the type defaultListItem has no attributes. It may contain any AIML template elements.
<!-- Category: condition-list-item -->
<aiml:li xsi:type =
"defaultListItem">
<!-- Contents:
aiml-template-elements -->
</aiml:li>
7.3.1.4.2. Value-only List Items
An li element of the type valueOnlyListItem has a required attribute value,
which must contain a simple pattern expression. The element may contain any AIML template
elements.
<!-- Category: condition-list-item -->
<aiml:li xsi:type =
"valueOnlyListItem" value =
aiml-simple-pattern-expression >
<!-- Contents:
aiml-template-elements -->
</aiml:li>
7.3.1.4.3. Name and Value List Items
An li element of the type nameValueListItem has a required attribute name,
which specifies an AIML predicate, and a required attribute value, which contains a simple pattern expression. The element may contain any AIML
template elements.
<!-- Category: condition-list-item -->
<aiml:li xsi:type =
"nameValueListItem" name =
aiml-predicate-name
value = aiml-simple-pattern-expression >
<!-- Contents:
aiml-template-elements -->
</aiml:li>
The random element instructs the AIML interpreter to
return exactly one of its contained li elements randomly.
The random element must contain one or more li elements of type defaultListItem, and cannot contain any other elements.
<!-- Category: aiml-template-elements -->
<aiml:random>
<!-- Contents:
default-list-item+ -->
</aiml:random>
AIML defines two content-capturing elements, which tell the AIML interpreter to capture
their processed contents and perform some storage operation with them.
The set element instructs the AIML interpreter to set
the value of a predicate to the result of processing the contents of the set element. The set
element has a required attribute name, which must be a
valid AIML predicate name. If the predicate has not yet been defined, the AIML interpreter
should define it in memory.
The AIML interpreter should, generically, return the result of processing the contents of
the set element. The set element must not perform any text formatting or other
"normalization" on the predicate contents when returning them.
The AIML interpreter implementation may optionally provide a mechanism that allows the
AIML author to designate certain predicates as "return-name-when-set",
which means that a set operation using such a predicate
will return the name of the predicate, rather than its captured value. (See [9.2].)
A set element may contain any AIML template elements.
<!-- Category: aiml-template-elements -->
<aiml:set name =
aiml-predicate-name >
<!-- Contents:
aiml-template-elements -->
</aiml:set>
The gossip element instructs the AIML interpreter to
capture the result of processing the contents of the gossip elements and to store these contents in a manner left up to the
implementation. Most common uses of gossip have been to
store captured contents in a separate file.
The gossip element does not have any attributes. It
may contain any AIML template elements.
<!-- Category: aiml-template-elements -->
<aiml:gossip>
<!-- Contents:
aiml-template-elements -->
</aiml:gossip>
AIML defines one symbolic reduction element, called srai, as well as a short-cut element called sr.
The sr element is described in [7.1.7.1].
The srai element instructs the AIML interpreter to
pass the result of processing the contents of the srai
element to the AIML matching loop, as if the input had been produced by the user (this
includes stepping through the entireinput
normalization process). The srai element does not
have any attributes. It may contain any AIML template elements.
As with all AIML elements, nested forms should be parsed from inside out, so embedded
srais are perfectly acceptable.
<!-- Category: aiml-template-elements -->
<aiml:srai>
<!-- Contents:
aiml-template-elements -->
</aiml:srai>
AIML defines several transformational elements, which instruct the AIML interpreter to
transform the result of processing the contents of the transformational element into another
value according to a lookup table. The implementation of transformational elements is left up
to the implementation.
The person2 element instructs the AIML interpreter to:
- replace words with first-person aspect in the result of processing the contents of the
person2 element with words with the
grammatically-corresponding third-person aspect; and
- replace words with third-person aspect in the result of processing the contents of the
person2 element with words with the
grammatically-corresponding first-person aspect.
The definition of "grammatically-corresponding" is left up to the
implementation.
<!-- Category: aiml-template-elements -->
<aiml:person2>
<!-- Contents:
aiml-template-elements -->
</aiml:person2>
Historically, implementations of person2 have dealt
with pronouns, likely due to the fact that most AIML has been written in English. However,
the decision about whether to transform the person2 aspect of other words is left up to the
implementation.
The person element instructs the AIML interpreter to:
- replace words with first-person aspect in the result of processing the contents of the
person element with words with the
grammatically-corresponding second-person aspect; and
- replace words with second-person aspect in the result of processing the contents of the
person element with words with the
grammatically-corresponding first-person aspect.
The definition of "grammatically-corresponding" is left up to the
implementation.
<!-- Category: aiml-template-elements -->
<aiml:person>
<!-- Contents:
aiml-template-elements -->
</aiml:person>
Historically, implementations of person have dealt
with pronouns, likely due to the fact that most AIML has been written in English. However,
the decision about whether to transform the person aspect of other words is left up to the
implementation.
The gender element instructs the AIML interpreter to:
- replace male-gendered words in the result of processing the contents of the gender element with the grammatically-corresponding
female-gendered words; and
- replace female-gendered words in the result of processing the contents of the gender element with the grammatically-corresponding
male-gendered words.
The definition of "grammatically-corresponding" is left up to the
implementation.
<!-- Category: aiml-template-elements -->
<aiml:gender>
<!-- Contents:
aiml-template-elements -->
</aiml:gender>
Historically, implementations of gender have
exclusively dealt with pronouns, likely due to the fact that most AIML has been written in
English. However, the decision about whether to transform gender of other words is left up to
the implementation.
AIML defines two "covert" elements that instruct the AIML interpreter to
perform some processing on their contents, but to not return any value.
The think element instructs the AIML interpreter to
perform all usual processing of its contents, but to not return any value, regardless of
whether the contents produce output.
The think element has no attributes. It may contain
any AIML template elements.
<!-- Category: aiml-template-elements -->
<aiml:think>
<!-- Contents:
aiml-template-elements -->
</aiml:think>
The learn element instructs the AIML interpreter to
retrieve a resource specified by a URI, and to process its AIML object contents.
<!-- Category: aiml-template-elements -->
<aiml:learn>
<!-- Contents:
uri-reference -->
</aiml:learn>
AIML defines two external processor elements, which instruct the AIML interpreter to pass
the contents of the elements to an external processor. External processor elements may return
a value, but are not required to do so.
Contents of external processor elements may consist of character data as well as AIML
template elements. If AIML template elements in the contents of an external processor element
are not enclosed as CDATA, then the AIML interpreter is required to substitute the results of
processing those elements before passing the contents to the external processor. As a trivial
example, consider:
<system>
echo '<get
name="name"/>'
</system>
Before passing the contents of this system element to
the appropriate external processor, the AIML interpreter is required to substitute the
results of processing the get element.
AIML 1.0.1 does not require that any contents of an external processor element are
enclosed as CDATA. An AIML interpreter should assume that any unrecognized content is
character data, and simply pass it to the appropriate external processor as-is, following any
processing of AIML template elements not enclosed as CDATA.
If an external processor is not available to process the contents of an external processor
element, the AIML interpreter may return an error, but this is not required.
The system element instructs the AIML interpreter to
pass its content (with any appropriate preprocessing, as noted above) to the system command
interpreter of the local machine on which the AIML interpreter is running. The system element does not have any attributes.
<!-- Category: aiml-template-elements -->
<aiml:system>
<!-- Contents: character
data, aiml-template-elements -->
</aiml:system>
The system element may return a value.
The javascript element instructs the AIML interpreter
to pass its content (with any appropriate preprocessing, as noted above) to a server-side
JavaScript interpreter on the local machine on which the AIML interpreter is running. The
javascript element does not have any attributes.
<!-- Category: aiml-template-elements -->
<aiml:javascript>
<!-- Contents: character
data, aiml-template-elements -->
</aiml:javascript>
AIML 1.0.1 does not require that an AIML interpreter include a server-side JavaScript
interpreter, and does not require any particular behavior from the server-side JavaScript
interpreter if it exists.
The javascript element may return a value.
An AIML Pattern Expression is a restricted form of mixed character data and optional
restricted element content.
There are two forms of AIML pattern expressions: simple pattern expressions, and
mixed pattern expressions.
[Definition: A simple pattern expression is composed from one or moresimple pattern expression
constituents, separated by XML spaces (#x20).]
|
For all simple pattern expression constituents C, and
for all simple pattern expressions E, valid simple pattern expressions V
are:
|
Denoting the set of strings
L(V) containing:
|
|
C
|
all strings in L(C)
|
|
E
|
all strings in L(E)
|
|
C E
|
all strings in L(C) and all strings in L(E)
|
|
E C
|
all strings in L(E) and all strings in L(C)
|
[Definition: A simple pattern
expression constituent is either anormal word or
anAIML wildcard.
[Definition: A mixed pattern expression is composed from one or moremixed pattern expression
constituents, separated by XML spaces (#x20).]
|
For all mixed pattern expression constituents C, and
for all mixed pattern expressions E, valid mixed pattern expressions V
are:
|
Denoting the set of strings
L(V) containing:
|
|
C
|
all strings in L(C)
|
|
E
|
all strings in L(E)
|
|
C E
|
all strings in L(C) and all strings in L(E)
|
|
E C
|
all strings in L(E) and all strings in L(C)
|
[Definition: A mixed pattern
expression constituent is either anormal word, anAIML wildcard, or apattern-side bot element.]
[Definition: A normal word consists of one or morenormal characters, concatenated together.]
[Definition: A normal character is any "uppercase
letter"*, any"caseless letter"**, or anydigit.]
* for "bicameral" alphabets, in which "uppercase" has
meaning (Latin, Greek, Cyrillic, Armenian, and archaic Georgian). See [Unicode Case
Mapping] for more information.
** for "unicameral" alphabets, in which "case" has no
meaning (seereference above).
A normal character can be represented either as itself, or with a character reference.
[Definition: An AIML wildcard is one of * or
_, which have the meanings defined in the following
table.]
|
For all normal words L(N), valid AIML wildcards
W are:
|
Denoting the set of strings
L(W) containing:
|
|
*
|
any string in L(N)
|
|
_
|
any string in L(N)
|
[Definition: A pattern-side bot element consists of any AIMLbot element.]
An AIML interpreter must construct a match path from each category at load-time. It is recommended, but not required,
that all match paths be compressed into an efficient trie structure such as the Graphmaster
(see [8.4.1]).
A match path has three components, whose order is mandatory:
-
pattern: the contents of the category element's
pattern element
-
that: the contents of the category element's
pattern-side that element
NB: If the category does not contain a
pattern-side that, then the AIML interpreter must
insert an "implied" that match path
component with value * (single asterisk wildcard).
-
topic: the contents of the name attribute of the
topic element parent of the category element.
NB: If the category element does not have an
explicit topic parent, then the AIML interpreter must
insert an "implied" topic match path
component with value with value * (single asterisk
wildcard).
A conventional way to construct the match path is as a simple string, in which the markers
<that> and <topic> are used to separate the pattern, that and topic components from one another. For example, given this topic/category:
<topic name="* A">
<category>
<pattern>_ C
*</pattern>
<that>B
*</that>
<template>
...
</template>
</category>
</topic>
the match path string might look like:
_ C * <that> B * <topic> * A
(The convention of using angle-bracketed markers such as <that> and
<topic> is not to be confused with XML markup.)
Following the same convention, and by way of illustrating the requirement of inserting
"implied" that and/or topic components where they are not explicit, here are three
more examples.
Given this topic/category:
<topic name="* A">
<category>
<pattern>_ C
*</pattern>
<template>
...
</template>
</category>
</topic>
the match path string would look like ("implied" that):
_ C * <that> * <topic> * A
Given this category:
<category>
<pattern>_ C
*</pattern>
<that>B *</that>
<template>
...
</template>
</category>
the match path string would look like ("implied" topic):
_ C * <that> B * <topic> *
And given this category:
<category>
<pattern>_ C
*</pattern>
<template>
...
</template>
</category>
the match path string would look like ("implied" that
and
topic):
_ C * <that> * <topic> *
At run-time, an AIML interpreter must match normalized inputs against match paths that are
ordered as described in [8.4].
An AIML interpreter must perform a "normalization" function on all
inputs before attempting to match. The minimum set of normalizations is called
pattern-fitting normalizations. Additional normalizations performed at user option
are called sentence-splitting normalizations and substitution normalizations
(or just "substitutions").
If an AIML interpreter performs substitution normalizations on the input, then these must
be performed first.
If an AIML interpreter performs sentence-splitting normalizations on the input, then these
must be performed on the output of the substitution normalization process.
The pattern-fitting normalization process receives the output of the sentence-splitting
normalization process (if any), or the output of the substitution normalization process (if
any, and if no sentence-splitting normalization is performed), or the direct input (if no
sentence-splitting or substitution normalization is performed).
Substitution normalizations are heuristics applied to an input that attempt to retain
information in the input that would otherwise be lost during the sentence-splitting or
pattern-fitting normalizations. For example:
- Abbreviations such as "Mr." may be
"spelled out" as "Mister" to avoid sentence-splitting at the period in the
abbreviated form
- Web addresses such as "http://alicebot.org" may be "sounded out" as
"http alicebot dot org" to assist the
AIML author in writing patterns that match Web addresses
- Filename extensions may be separated from their file names to avoid sentence-splitting
(".zip" to " zip")
Sentence-splitting normalizations are heuristics applied to an input that attempt to break
it into "sentences". The notion of "sentence", however,
is ill-defined for many languages, so the heuristics for division into sentences are left up
to the developer.
Commonly, sentence-splitting heuristics use simple rules like "break sentences at
periods", which in turn rely upon substitutions performed in the substitution
normalization phase, such as those which substitute full words for their abbreviations.
[Definition: Pattern-fitting normalizations are normalizations that remove from the
input characters that are notnormal characters.]
Pattern-fitting normalizations on an input must remove all characters that are notnormal characters. For each non-normal character in the input,
Since input normalization can have any of 1, 2 or 3 stages in a given AIML interpreter,
results of normalization may vary widely among different AIML interpreters. Here we show
examples of input normalization in a typical English-language-oriented AIML interpreter that
includes the full complement of normalizations.
|
input
|
substitution normalized form
|
sentence-splitting normalized form
|
pattern-fitting normalized form
|
|
"What time is it?"
|
"What time is it?"
|
"What time is it"
|
"WHAT TIME IS IT"
|
|
"Quickly, go to
http://alicebot.org!"
|
"Quickly, go to http://alicebot dot
org!"
|
"Quickly, go to http://alicebot dot
org"
|
"QUICKLY GO TO HTTP ALICEBOT DOT
ORG"
|
|
":-) That's funny."
|
"That is funny."
|
"That is funny"
|
"THAT IS FUNNY"
|
|
"I don't know. Do you, or will you, have a
robots.txt file?"
|
"I do not know. Do you, or will you, have a
robots dot txt file?"
|
"I do not know"
"Do you, or will you, have a robots dot txt
file"
|
"I DO NOT KNOW"
"DO YOU OR WILL YOU HAVE A ROBOTS DOT TXT
FILE"
|
Each input to the AIML interpreter must pass through theinput normalization process described above, in
which (at the very minimum) the input will be processed according to the description ofpattern-fitting normalizations.
In the case thatsentence-splitting
normalization is used by the AIML interpreter, a single input may be subdivided into
several "sentences". The AIML interpreter must process each sentence of the
input by producing an input path from it.
An input match path has three components, whose order is mandatory, and which correspond
to the three components of aload-time
match path:
-
pattern: the normalized input
-
that: the previous bot output, normalized according to the same rules as ininput normalization.
NB: If there was no previous bot output, or the previous bot output was
unavailable, the value of the |