CHAPTER I
An Initial Problem
1.1 The Micro-graphics 'World'
A small personal microcomputer is equipped with a simple graphics facility and an appropriately simple graphics language. We wish to construct a user-friendly interface which will accept a user's commands in natural English and translate these commands into the equivalent statements in the graphics language. The natural language allowed will be restricted, so that the interface will not be too difficult to construct, but it will be sufficient to illustrate a number of points and lead us towards more complex examples.

The problem is therefore characterised as follows. The input text is a very limited subset of English, constrained to conform to a strict and limited grammar and the 'meaning' of the input text statements is represented in the form of statements in an internal language (the internal text).The internal text is the graphics language supplied with the machine. There is therefore no complication about what we mean by the 'meaning' of an input statement. If a statement cannot be translated into a statement in the graphics language then it must be considered to have no meaning at all. That is why we refer to the system as a 'world'. It has a definite boundary beyond which it cannot see and beyond which (so far as the system is concerned) nothing else exists. The problem is illustrative of a number of interesting and successful natural language processing systems which have been constructed for limited worlds such as this.
The problem for our system is to convert English into graphics language. But recognising that the text which is input cannot be unrestricted English, and that the output need not only be graphics commands, we can generalise the requirement to that of achieving the conversion: input text -> internal text
1.2 The Internal Text
Points on the screen (or pixels) are defined by X,Y co-ordinates (1000 X 1000). The user may provide names for specific points thus:
point fred 123 456
which attaches the user-defined name fred to the point 123,456. Such a statement is always preceded by the keyword point.
Lines may be defined by the user thus:
line fred tom
which defines a straight line drawn from the point with name fred to, the point with name tom. Fred and tom must be predefined. Alternatively the user may refer directly to co-ordinates thus:
line 123 456 678 901
which defines a line drawn from the point 123, 456 to the point 678, 901. A typical program in this limited graphics language would therefore consist of a series of point definitions followed by a series of line definitions. For example:
point p 123 456 .
point q 678 901
point r 543 654
line p q
line q r
liner p
which would draw a triangle. This, or any similar program, is an example of the internal text.
1.3 The Input Text
Suppose now that a user typed the following: '"
'Let p be the point 234 456 and q be the point 678 901. Draw a line fromp to q
and then draw a line from q to r which is the point 513 654. Join p to r.'
.
Clearly there are very many ways in which a user could express the same meaning. For the purposes of this exercise we wish to limit the variety of forms which the input text could take, and we deal with this by defining:
(a) an allowed vocabulary oflexical items (or words}and
(b) a grammar, which is a set of rules indicating the allowed patterns in which the words may appear.
The reader should beware of using the word 'word' carelessly in the presence of linguists. We need to distinguish between the 'lexical item' (or string of characters) and the 'sense' of a word (or its meaning). Even the term 'meaning' is fraught with difficulties and misunderstandings because linguists like to distinguish between 'intentions' and 'implications' and 'sense'. Consider for example:
'John means well'
'You did not bring a pen so that means that I will have to lend you mine.'
"'To cry" means "to produce tears".'
We do not intend to be pedantic about this and when we use the term 'word' or the term 'meaning' it will usually be obvious from the context which interpretation is required. If not, we shall qualify the term.
It will be seen that the internal text consists of two kinds of statement: definitions, or assignment statements, which associate names or labels with actual co-ordinate points; and action statements which cause some graphic images to appear on the screen, usually drawing lines between the points which have been defined. We shall consider these two kinds of statement separately.
1.4 Assignment Statement Analysis
Consider first the statement which the user might type in order to define point p. We shall call this an 'assignment statement' because it assigns coordinates to a label (p).The assignment statement might take the form:
let p be x y
or let x y be p
or let the point p have coordinates x y
[NOTE (added in 2004) Angle brackets which are required by Backus-Naur notation are also used to denote control statements in HTML. In what follows, I have therefore replaced the angle brackets of BNF with curly brackets - or turned the text into an image of the text.]
where p is a user-defined name or label for a point and x and y are numeric values. To cover
such possibilities we classify words and phrases so that our description of the allowed statement structure can be more general. Let us use the classifiction determiner to stand for the set of words a, an, the. Therefore when we want to save space we simply write {determiner> and the reader should understand that the expression (including the angle brackets) should be replaced by any of the words in the set. In any such definition there is a danger of ambiguity between the use of words (such as 'a', 'an' and 'the') within the definition and the use of words (such as 'means', 'or', 'the set') as a part of the definition (the 'meta-words'). To avoid such confusion we shall adopt BackusNaur Form notation (BNF), so that the definition of the classification 'determiner' becomes:
determiner:= a/an/the
The symbol:= is taken to mean 'is defined as' and the slash '/' means 'or'. Any word which appears as itself on the right-hand side of such an expression is taken to be the literal version of itself. On the left-hand side of each expression there should be only one term, the entity being defined. In our grammar defined below the expression:

means that the expression 'assignment' is defined as consisting of either the word 'let' followed by a 'pLdefn' expression, or it is just a 'pl_defn' on its own.
A 'pl_defn' (or 'point definition') expression is defined thus:

That is, it has two possible patterns:

The verb is in some cases prefixed by (let). This indicates that the verb which follows (the infinitive form) is acceptable only if the whole statement is prefixed by the word 'let'. In the other cases the verb is acceptable only if the word 'let' does not appear. It is still necessary to define ptname, verb and coorddefn. This definition will fail if the user employs names such as 'a' as the name of a point because 'a' is also a word in the English language. We shall return to this point later. In the meantime we shall assume that the user avoids these ambiguities.
Without further ado we can write down the remainder of the complete grammar of the assignment expression in Backus-Naur form:

It is not necessary in BNF to number or label the clauses. We have done so here (a,b,c,...j) to make it easier to refer to individual clauses.
1.5 The Function 'ptname'
We now have a grammar for our input text assignment statement and the next requirement is to analyse this in terms which are easily converted into a program. A good strategy is to subdivide the problem into smaller problems, and so we shall consider first the ptname and develop an analysis and a program for that. (We shall assume the existence of a function isname(w) which takes as its argument a single 'word' (in POP I I terminology, or a textual atom in a list) and yields true or false depending upon whether or not it is a user-defined name.
We assume that the input text is presented to this system in the form of a list of atoms (words and numbers) e.g.:
[the point fred has coordinates 123, 456]
The program will scan the input text from left to right and test for the existence of a ptname at the start of the list. If a ptname is found the program (a POP11 function) must return three results:
(1) true or false
(2) The internal text sub-list, e.g. [point Ired] (i.e. a sub-part of the graphics language statement) (or nil if false)
(3) The remainder of the input text.
The performance of the required function is as shown below.
input-text = [the point fred has co-ordinates 123, 456]
result = true, [point fred], [has co-ordinates 123, 456]
input-text = [to be or not to be]
result = false, nil, [to be or not to be]
In this way the required function can test the input text. If it is successful it will extract the part of the input text it has processed, prepare a sub-part of the final internal text that is appropriate, and hand on the remainder of the input text for further processing. If it is unsuccessful in finding a ptname phrase it will report failure, deliver up a NIL list instead of the sub-part of the output text, and hand back the complete input text for an alternative form of processing.
If the function is successful and the sub-list [point fred] has been formed, the processing of the remainder of the input text may construct the second half of the internal text list [123 456]. This will be concatenated with the first part giving the correct internal text form: [point fred 123 456]
Now consider the finite state diagram (FSD) shown below.

This illustrates a system with three states (st, S2 and S3). Each represents a 'state of expectancy' of the system. the system is, of course, our required function, which is processing the input text for a ptname phrase. Initially it is 'expecting' the determiner the. If it gets it it switches to state S2 and expects the classifier point. If this is found it switches to state S3, in which it expects a userdefined name. If this is found it exits, leaving the results indicated above.
If, on the other hand, the input text omits the determiner the, or both the determiner and the classifier point, then the system still switches to states S2 and S3 but takes different action as it does so. If the expected item is encountered, the system removes that item from the input text and goes on to test the next item in the list. If the expected item is not encountered, the input text is left unchanged and the system goes on to the next state to make a different test on the same first item as before.
Switching to a new state without making any alteration to the input text or taking any other action is shown in the FSD by the un-named arcs. These are often called 'jumps', and the introduction of jumps augments the FSD to become a new type of system.
It is very easy to turn a finite state diagram of this kind into a short program or function.

The reader will recognise that there is a small amount of redundancy in this program. This is deliberate in the interests of clarity. GOTOs have been used in spite of the tenets of structured programming to emphasise the parallelism with the finite state diagram. In this particular program definition, all the GOTO statements can be omitted because the normal linear sequencing of the code will take control to the correct point. This is somewhat fortuitous, however, and it would not in general be possible to omit the GOTOs without a considerable redesign of the function. We shall leave the consideration of a better program structure until later (Chapter 8) and concentrate for the time being on the structure of the grammar.
1.6 Word Classes and Determiners
We have 'cheated' slightly in writing this program in that we have tested directly for the words' the' rather than for the classes of words. The program would therefore succeed if the user typed 'let the pointfred have coord...' but would fail if 'let a point fred have coord. . .' was typed. To correct this and allow the program to accept any determiner, we construct a list of all determiners, thus:
[the a] -> dets;
This in effect places the words a and the in the same grammatical category 'determiners'. Now instead of writing
'if hd(intext)= "the" then..,',
we can write
'if ismemb(hd(intext), dets) then. . .'
which is equivalent to saying' if the head of intext is a member of the set dets then'. Although this is valid and will suffice for our simple problem, it ignores the fact that the meanings of these determiners ('the' and 'a') are subtly different. Later we shall need to take cognizance of this difference.
The whole idea of placing words in categories is fundamental to the notion of syntax. We cannot define a language by writing down every possible sentence, because the number of possible sentences is infinite. The fact that the number of sentences is infinite can be seen from a simple example. Consider the sentence: John is very tired. The sentence is grammatical and remains grammatical even if we repeat the word 'very' several times: John is very very very tired. In fact there is no limit to the number of 'very's which we can insert without the sentence becoming ungrammatical. Therefore this simple example is actually an infinite set all on its own. Consider also: John is very tired (after his run). We can add to this a qualifying clause - (which was arranged by the sports committee), (which was formed in 1984), (which is the year mentioned in George Orwell's novel) and so on and on.
Even if we ignore the possibility of infinitely long sentences and place an arbitrary limit on the length of a sentence (say 100 words), we are still confronted by an astronomically large set of possible sentences. To define the allowed structures we must therefore classify and generalise so that one pattern can represent a whole class of sentences.
The traditional grammatical categories are in fact curiously cyclic in their definition. When children begin to learn grammar they are usually taught that verbs are 'doing' words and nouns are 'naming words for things or sets of things'. These notions serve to establish the categories in our minds but there are obvious exceptions to these rules, for example 'The shouting was very loud'. Here the word 'shouting' is being used as a noun, and yet it describes an action and the word is usually described as a verb. (The term 'gerund' is sometimes used to describe a verb used in this way as a noun.) Consider also the sentence 'His embarrassment was acute'. What kind of thing is 'embarrassment'? It is a state of mind, a feeling with all kinds of social implications. The initial concrete notion of a 'thing' needs to be modified. What kind of doing is 'was'? Again it indicates a state of affairs. The naive notions given to a child are clearly oversimplifications.
It seems that having established a category, we note that it usually occurs in sentences in a particular positional relationship to other word categories. We label that position as the location associated with the category. The category has now become a label for a position within a pattern. Any word which is now found in that position acquires the label. But the position would not have been recognisable without the notion of the word categories for the other words in the pattern. Thus category defines position and position defines category. Fortunately there are a few words which are not ambiguous in this respect and these can serve as markers for the remainder of the sentence. The determiners ('a', 'an' and 'the') are such words.
We can write down the list of all words in a language (because the list is not infinite) and label them as nouns, adjectives, verbs etc. Even so we have problems about ambiguous words which can have more than one category. This is an example of the context-dependent nature of natural language. In constructing a formal grammar for our input text we are to some extent destroying this property, and for that reason we cannot consider the input text as a genuinely 'natural language' .
1.7 The Coordinate Definition Phrase
Now we consider the development of a program for processing coordinate definition phrases {coorddefn>. Again we represent this first as a finite state diagram.

Note the loop arc in the diagram which corresponds to the possible presence of a 'comma'. In this form, without some form of counting mechanism, the FSD would pass as correct an infinite number of commas. A counting mechanism is easily introduced by allowing the traversal of this arc to check and update a register storing the number of times this arc has been traversed. Alternatively a new state can be introduced to which the FS-automaton would advance from S4 and from which state the occurrence of another comma would be the selector for a 'failure' arc.
The equivalent POP11 function is as follows:

1.8 Verb Phrase Analysis
Consider now the verb phrase analysis. We are prepared to allow two forms of verb:
(1) let... be ...
let... be called ...
let... have ...
(2) ... is .. .
... is called. ..
... has...
We shall call (1) the 'infinitive' form and (2) the 'present tense' form. In the case of the infinitive form the verb is split up. The program analysing a phrase in the infinitive form will encounter let and must 'remember' to expect the infinitive form be or have at a later time. To cope with this we allow 'side-effects' to be created by the processing function as it traces an arc in the finite state diagram. That is, it updates a global register. We denote this side-effect by (let). The FSD for the verb analysis is shown below:

From S1 the system goes straight to S2 or S5 depending upon whether the (let) side-effect is true. A global switch can be used to indicate (let) say letsw=1. We leave to the reader the exercise of turning this FSD into a POP11 program.
1.9 The Full Assignment Statement
We are now in a position to consider the full assignment statement

The FSD is shown below.

The POP11 program for this is as follows:

At Sl we detect the let and set the global switch letsw. The switch can be
detected by the verb function.
At S2 the program tries a possible analysis with ptname(intext) and if this fails it tries the alternative analysis with coorddefn(intext).Only if this fails does the function exit with a failed result.
In this section we have been concentrating upon the problem of backing out of an unsuccessful attempt to match ptname or coorddefn. At the same time we have done some violence to the structure of the FSD by introducing new types of arc. These changes are the subject of the next section.
1.10 Transition Networks
We introduced the idea of the finite state network or diagram (FSD) in section 1.5 without much formality. The idea is simple but powerful, and the reader may already be familiar with the use of FSDs from the study of formal languages or [mite state automata. For those unfamiliar with the basic ideas and the capabilities of the various types of transition network we will provide some clarification here.
Each node in a transition network represents a 'state of expectancy' of a system, and each arc leading from a node represents the transition of the system from one state to another after the occurrence of an expected event. A loop which returns to the node from which it started represents an event which does not change the state of the system. Such a loop can therefore be used to represent the (possible) multiple occurrence of some event (e.g. the 'comma' arc in Fig 1.3).

Finite State Machines (FSM).
When we use an FSM diagram to represent a grammar we can think of each arc as being the occurrence of a particular type of word in the input text as it is being processed. We can denote each class of word by a single character (a,b,c,d,...) and place in brackets each category which may be repeated an indefinite number of times. For example if 'n' denotes a noun, 'a' denotes an adjective and 'd' denotes a determiner, we might write down the pattern corresponding to a noun phrase as d(a)n, to indicate that it consists of one determiner, possibly several adjectives and a noun. With this notation, and not bothering about what word categories the letters stand for, we can illustrate the various patterns which an FSD can process correctly.
The limitations of FSM.
With the formalism of the FSM we can represent simple sequences, optional items, repeated items and repeated sequences. We have difficulty, however, in representing a pattern in which an optional item (if it does occur) may occur just once (or a fixed number of times), and we cannot represent palindromic sequences such as abcd-dcba, where the sequence of the second half is determined by the sequence (reversed) of the first half. Palindromes are not of any particular importance in natural language but they do occur frequently in formal languages such as programming languages (e.g. brackets are always palindromic because the close brackets ')' must match the open-brackets '(').
Recursive Transition Networks (RTN).
The mechanism which makes recognition of a palindrome possible is very powerful and has other uses. A palindrome P which is made up of the characters a,b,c, has the structure aPa, bPb or cPc, where P stands for another palindrome with exactly the same structure (hence recursive). At some stage the sequence must stop, and so we need to provide for any of the palindromes to have a value which does not involve any other palindromes within it:
P :- a / b / c / aa / bb / cc / aPa / bPb / cPc
This can be represented by a modified FSM,

Here the middle arc is labelled not by a simple word category but by a complete network label. In this case the middle arc is labelled by the name of the network we are actually defining. Such a structure is called a 'Recursive Transition Network' (RTN).
It is conventional also to include within the definition of an RTN the possibility of a null or 'jump' arc. A jump arc is one which does not process any words in the input text but merely advances the system to a new state so that the input text can be subjected to a new batch of tests. The jump arc can be regarded as an 'else' condition, an arc which is traversed after all other possibilities have been tried and found wanting.
The limitations of FSMs and RTNs.
The simple FSM has no 'memory'. If the system is currently at state S it has no way of 'knowing' that it has been in that state before, nor does it know by what pathway it reached its current state. The RTN does have a memory, but only the (recursive) memory of arcs started but not completed (which are then completed in a 'last started, first finished' sequence). An RTN cannot therefore be used to represent the circumstances in which the tracing of a particular arc at an early stage can influence the choice of arc at a later stage (except for recursion). Neither can it be used to represent a pattern containing an element which is repeated a fixed number of times (greater than one) without having an explicit arc for each occurrence.
Augmented Transition Networks (ATN).
A network can be provided with a memory in the form of registers and counters which are updated while an arc is being traversed. We used these mechanisms to count the number of commas, and to make known the presence of the word LET during the verb-phrases processing. The updating of a global register leaves behind evidence of the pathway followed, and this evidence can be accessed during the traversal of subsequent arcs (and used as a further condition). In this way the traversal of one arc can influence the traversal of later arcs. To restrict the number of times the system may traverse a given arc the system can increment a counter which can then be used to control or prevent subsequent traversals of that arc. Such features are generally known as side effects and the addition of side effects turns an RTN into an Augmented Transition Network. It has been shown that an RTN has the power of a context-free grammar, and an ATN has the power of a Universal Turing Machine.
ATNs (or their equivalent) are required for natural language processing because natural languages are recursive, and because of the agreements which hold across quite widely separated parts of a sentence structure.
Consider, for example, the two sentences:
Sl: 'John bought the book'
S2: 'Mary was glad that John bought the book'.
Sentence S2 contains S1 within it. It is obvious that embedding of this kind can continue indefinitely, and so it is necessary, in the definition of the syntactical pattern of a sentence, to include the possibility of it containing a sentence. If we denote a sentence by 'S', then one possible pattern for S is given by
S := nvT
where n is a proper noun
v is a verb
T is a 'that' clause. and T:= 'that' followed by S.
S is therefore embedded recursively in the definition of S.
Consider also the sentence:
'I gave a bat and a ball * to Tom and Bill respectively'.
Here the two items bat and ball must agree exactly in number with the two
recipients Tom and Bill and the two parts of the sentence which must agree could be widely separated by qualifying phrases inserted at the point marked '*'.
1.11 Action (or Line) Definitions
The statements of the graphics language are of two types: assignments (e.g. Let fred be the point 123,345) and action definitions (e.g. Draw the linefredjoe). We have dealt with the former and given POPII programs which handle these. Rather than repeat the process for the action definitions we leave that problem to the reader, and show instead how the problem of action definitions might be handled in Prolog. The Prolog programs hide their mechanism to some extent, but it will be found that the underlying mechanism is very similar to that employed in the POPII programs.
First we note the variety of statement which might be accepted as valid input text.
draw p q
draw a line from p to q join p to q
join p q
draw a line to q from p
where p and q have been defined. In each case the output text is to be
linepq
We note that the input text can be analysed into the following sub-statements:
a source
from p
from the point p p
the point p
That is, a ptname, optionally preceded by from.
a destination
to q
to the point q q
the point q
That is, a ptname optionally preceded by to.
a line segment
BNF-1.5
an object

where {obj> is a member of the set [line, . . .j We may add other objects to this list later.
an act

an action definition

These definitions translate directly into Prolog. Before giving the program we
wish first to explain the structure of the Prolog program using the definition of a determiner as an example.
det([WlR] .R):-ismemb(W,[a,the]).
The predicate 'det' takes two arguments - [W|R] and R. The first of these represents the input text and the second represents the remainder of the text left behind for further processing. The notion is exactly the same as used before in the POPll functions. Each sub-part of the program will nibble off a part of the input text and spit out the remainder for further processing. If the part nibbled off is not as expected then the sub-part (predicate) will fail and the Prolog will automatically backtrack to try another form of analysis. The program is therefore simpler than the equivalent POPll program because the backtracking and the detection of failure is automatic. For this reason the program is also harder to understand.
The test of success or failure is shown on the right-hand side of the expression above.
ismemb(W,[a,the]).
this succeeds if W, the first element in the input text [W|R]is a member" of the set [a,the].
R is the tail of the input text [W|R]. Put colloquially, then, if the first element in the input text is a member of the set [a,the] the predicate 'det' bites it off and returns the remainder of the input text for further processing.
Now we provide the remainder of the Prolog program:

The reader should now try writing the line definition processor in Prolog.