CHAPTER 16
Outstanding Problems
16.1 From Optimism to Pessimism
The various techniques which we have examined in the first two parts of this book can, at best, be regarded as only partial solutions to the problem of natural language processing. A feeling of optimism which was prevalent among research workers some years ago has given way to disillusion, and in some quarters to frank pessimism. It is ironic that this should occur just when there is unprecedented publicity for all things related to artificial intelligence, and the goal of the 'Fifth Generation Computer' is attracting significant financial backing. A large proportion of the effort which is at present going into natural language processing systems is being directed at producing practical implementations of well-established techniques. Those engaged in this work know well enough that the results produced fall short of true natural language processing, but those funding the work seem satisfied with what they are getting: commercially viable products and useful user-friendly systems. They are, perhaps understandably, not too concerned with the philosophical unease which pervades the theoretical side of the work.
Alongside the flourishing exploitation of established techniques, a fundamental rethink of the approach to natural language processing is currently in progress among research workers. It is recognised that in order to progress we must backtrack to a more elementary level, and develop a theory of semantic representation which is not oversimplified by being restricted to a limited context or 'micro-world'.
In this book the intention is to examine both aspects, and in this third part we shall look at some of the problems for which no solutions yet exist, and try to develop the outlines of possible solutions.
16.2 Micro-worlds and the Real World
In the early 1970s some excitement was generated by the relative success of systems which deal with a simplified environment ('micro-worlds'). The micrographics example which we described in Part 1 is an example of a micro-world. In it, every object can be associated with a particular identifier, every allowable process is predefined, every concept is concrete. Processes which could be carried out on entities were defined in procedural terms and included the procedures which had to be carried out beforehand. In one well-known system which used a 'blocks world' (a simulated table top covered by simulated wooden blocks of different shapes and colours) the process corresponding to the placement of one block on top of another included the program code to clear the top surface of the receiving block if that was necessary.
At that time many research workers felt that all we needed to do was to analyse and produce systems for an increasing number of specialised micro-worlds, until these gradually accounted for most aspects of the real world. Sadly this optimism has proved unfounded, and it is now realised that micro-worlds of this kind omit many important aspects of natural language understanding. The reader will recall Winograd's discussion of the problem when dealing with a sentence such as 'I want to own the fastest car in the world'. A simple-minded extension of the micro-world approach would find the referent for the phrase 'fastest car in the world' by searching a database of all cars in the world and selecting the one which was the fastest. Humans usually interpret this sentence by assuming that the speaker wishes to create a new car which is faster than the fastest existing car. That is because they know about how people behave, about how records are set, about the technicalities of using the Bonneville Salt Flats in Utah for a record-breaking attempt, and about the kudos which comes from building the fastest car as opposed to mere ownership. Note how the idea of human motivation has crept into the discussion, and how vacuous the interpretation of the sentence becomes if we leave it out.
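The simple-minded resolution of 'the fastest car in the world' can be sketched in a few lines of Python. This is only an illustration: the `Car` record, the `CARS` table and the function `resolve_fastest_car` are hypothetical names, and the speeds are invented figures, not data about real cars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    name: str             # hypothetical identifier
    top_speed_kmh: float  # invented figure, for illustration only


# A toy stand-in for "a database of all cars in the world".
CARS = [
    Car("car_a", 180.0),
    Car("car_b", 410.0),
    Car("car_c", 265.0),
]


def resolve_fastest_car(cars):
    """Return the EXISTING car with the highest recorded top speed."""
    return max(cars, key=lambda car: car.top_speed_kmh)


referent = resolve_fastest_car(CARS)
print(referent.name)
```

The search can only ever return a car which is already in the table. Nothing in it represents the speaker's motive, so the reading which a human would prefer, that of a new car not yet built, is simply unavailable.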
In many a micro-world system, the relationship between objects - such as one object 'owning' another - is represented by a simple predicate 'owns(X,Y)', meaning 'X owns Y'. Such knowledge can be stored within the system, and when asked the question 'Who owns Y?' the system will correctly respond 'X'. This does not constitute evidence that the concept of ownership is understood. Ownership is actually a very complex concept indeed, which many people have difficulty in understanding fully. It involves a social contract between the person owning the object and the rest of humanity. Sometimes that contract is made concrete in terms of a codified law, which is enforced by government, and sometimes it is an informal contract which is respected by all who wish to conform to the social norms of society. Consider, for example, the different kinds of ownership involved in owning a country, owning a child, owning a house (outright), owning a house (under mortgage), owning a house (rented to others), owning a book, owning a chair (at a public meeting - e.g. someone saying 'That's my chair' when someone else tries to sit in it), and owning your own hand. Consider the different processes required to transfer ownership, and the impossibility of doing so if the object in question is 'my hand'. Consider how those who do not own something recognise and respect the ownership of objects by others (in all the different cases listed above). Consider also the sanctions available to punish those who do not respect ownership rights. Can we say that a system (human or computer) understands 'ownership' if that system is not aware of all these implications and nuances?
This argument does not prevent us from representing ownership formally by means of the predicate owns(X,Y). But it should help us to realise that we cannot leave it at that. We must be able to relate that simple representation to other facts in an extraordinarily complex way. The complications usually involve issues of human motivation.
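How little the bare predicate captures can be seen from a minimal sketch of such a fact store, written here in Python. The names `OWNS`, `who_owns` and `transfer`, and the facts themselves, are invented for illustration and are not taken from any particular system.

```python
# Facts of the form owns(X, Y), stored as (owner, thing) pairs.
# All of these entries are invented for illustration.
OWNS = {
    ("jack", "book"),
    ("jill", "house"),
    ("jack", "hand_of_jack"),
}


def who_owns(thing):
    """Answer 'Who owns Y?' by matching against the stored facts."""
    return sorted(owner for owner, owned in OWNS if owned == thing)


def transfer(thing, old_owner, new_owner):
    """Transfer ownership by rewriting one fact; nothing is checked."""
    OWNS.discard((old_owner, thing))
    OWNS.add((new_owner, thing))


print(who_owns("house"))
transfer("hand_of_jack", "jack", "jill")
print(who_owns("hand_of_jack"))
```

The store answers 'Who owns the house?' correctly, yet it is equally content to transfer Jack's hand to Jill. Mortgages, tenancies, social norms and sanctions have no place in it at all, which is precisely the objection raised above.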
Thus far, the micro-world approach has tended to avoid this issue. Many who develop NLP systems concede the point made above, but continue to use the simplified representation while waving a hand at the problem. Somehow or other the additional information is supposed to be added on later, like some trivial detail. That is not good enough.
Another aspect of the micro-world approach which gives some disquiet is the tendency to define objects in terms of their properties in a rather inflexible way. A man does not cease to be a man if he loses his legs. The identity of an entity is not simply a function of its properties.
16.3 Naive Physics
'Naive physics' is the term used to describe everyday knowledge of the physical behaviour of the world. We all learn at an early age that things fall down if not supported, that fluids have no shape of their own, that some objects (like a piece of string) can be used to pull things along but not to push them, that when a moving object collides with another object the second tends to move away. We discover the heft of objects in our hand. We relate the physical appearance of an object to the tactile sensation of texture. The term 'naive physics' distinguishes this type of knowledge from the more formal knowledge of physics which some of us acquire at school or college. Natural language processing depends upon knowledge of naive physics, since this is knowledge which can be assumed common to almost all humans. Such an assumption is implicit in the meanings of words. For example, the verb 'to fall' does not mean 'to reduce in altitude'. When we use the word 'fall' we expect our fellows to understand the spontaneous aspect of falling, and the direction of the movement, and that (eventually, at least) the falling object will make contact with something.
In most existing NLP systems the entities involved have nice regular geometrical shapes. It is particularly difficult to find an adequate way of representing the shape of things which do not have a regular or geometrical form. Even worse is the problem of representing shapeless entities like a quantity of water. Should we distinguish between physical objects which have a size, shape and location, and 'stuff', like wood, water and plastic, which does not have these properties but instead contributes to the properties of objects like 'a tree', 'a lake' and 'a watch'?
Recently some efforts have been made to represent the behaviour of liquids by means of a kind of finite element analysis, in which a body of water is subdivided into regular particulate units and the behaviour of an aggregation of these is computed as the result of the interaction of each element with its fellows and with their surroundings. On conventional computers such systems produce entertaining graphics of water pouring etc., but they require the services of non-conventional parallel machines to produce these results in real time. Perhaps, in time, we will have the services of such equipment for the development of NLP systems, but it is unrealistic to think that we will be able to make use of such techniques in the short term. Humans, however, probably do have facilities for visualising the behaviour of fluids which are not unlike this kind of simulation.
The problem of representing fluids and irregular shapes is a severe obstacle for NLP systems.
16.4 Real Dialogues and Bad Grammar
Scene: In a telephone box. Jack is phoning his girlfriend.
Jack: 'What would you like to do tonight then? Film?'
Jill: 'Mm...'
Jack: 'Good at the Odeon.'
Jill: 'Who is it?'
Jack: 'Humphrey Bogart.'
Jill: 'Oh no!'
Jack: 'Well - how about a walk?'
Jill: 'It's raining!'
Jack: 'Dancing?'
Jill: 'We danced on Tuesday.'
Jack: 'Well what, then?'
Jill: ...
Jack: 'Come on then. Eh?'
Jill: 'My hair is a mess.'
Jack: 'What's the matter?'
The reader will recognise the authenticity of this dialogue, particularly in comparison with most of the dialogues in the current literature concerning NLP systems. The reader will also have no difficulty in following the drift of the conversation, and will already have formed the opinion that Jack is not doing too well.
Consider now how many of the statements in the dialogue are not grammatical sentences. Several of the statements have no verb; two have a dangling conjunction ('then'); several consist of exactly one word ('Film?' and 'Dancing?') and two contain words which have no dictionary entry ('Mm...' and 'Eh?'). Consider also the amount of background knowledge of people and the world which is necessary to see a relevant connection between 'Come on then. Eh?' and 'My hair is a mess'. Without the knowledge that people may grow weary of an activity which they would normally find entertaining, if they do it too frequently, there is no way to understand the connection between 'Dancing?' and 'We danced on Tuesday'.
The moral of this is:
(1) Real people do not always converse in well-formed grammatical sentences.
(2) The connection between statements in a real dialogue is often not explicit, but relies heavily upon people sharing common background knowledge which carries the hidden connection.
(3) The hidden connection often has a good deal to do with human psychology, and less to do with the physical world.
(4) An NLP system which relies upon a strict syntactical analysis is doomed to failure before it starts if an attempt is made to process real dialogues.
16.5 Repetitive Operations
It is not too difficult to see how we could represent a sequence of events extending over a period of time. We discussed this in Chapter 4 when we dealt with timestamps and object histories. There is a problem, however, in dealing with sequences of events which occur repetitively, or which occur several times at unpredictable times. If we say of someone 'He plays tennis' we do not necessarily mean that he plays the game continuously, or even that he is engaged in a game at that precise moment. We usually mean that he is able to play the game, that he has played several games in the past, and that he continues to play from time to time.
How exactly can we represent the notion behind the words 'from time to time'?
In programming we are accustomed to using a looping mechanism to represent repetitive or iterative operations, but that technique clearly will not do in this case. Each game of tennis is a separate event and might be referenced separately in order, say, to provide information about different opponents. There is also the implication that the person concerned has the appropriate equipment, can be expected to play the game again in the future, and might be a suitable person to ask to make up a pair for a doubles match.
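To make the difficulty concrete, here is a minimal sketch in Python in which each game is stored as a separate event record, so that it can be referenced individually. The `Game` record, the players, the dates and the threshold in `plays_tennis` are all assumptions made for the sake of illustration. It is not offered as a solution to the problem.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Game:
    player: str
    opponent: str
    played_on: date


# Each game is its own event (names and dates are invented).
GAMES = [
    Game("tom", "ann", date(1985, 5, 4)),
    Game("tom", "bob", date(1985, 6, 15)),
    Game("tom", "ann", date(1985, 8, 2)),
]


def opponents_of(player, games):
    """List the distinct opponents recorded for a player."""
    return sorted({g.opponent for g in games if g.player == player})


def plays_tennis(player, games, minimum=2):
    """Crude stand-in for 'He plays tennis': several past games on record."""
    return sum(1 for g in games if g.player == player) >= minimum


print(opponents_of("tom", GAMES))
print(plays_tennis("tom", GAMES))
```

Separate records allow us to ask about different opponents, but the habitual sense of 'plays tennis' has been reduced to an arbitrary count of past events. Nothing here expresses 'from time to time', the expectation of future games, or the possession of a racket.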
There is also the problem of how we can represent a single game, with each rally consisting of one serve and zero to N returns. Again each stroke is a separate event and requires a separate representation. Again looping mechanisms will not do. The number of strokes in a rally is potentially infinite, however, and so we cannot suppose that we can provide an actual representation of each. These are difficult issues which have not been tackled by any NLP system to date. We will discuss possible solutions later, in Chapter 22.
16.6 Metaphor
In elementary school lessons on language we learn about 'figures of speech'. The sentence 'He was a lion in the fight' is described as an example of a 'metaphor', while the sentence 'He was like a lion in the fight' is described as being a 'simile'. 'Hyperbole' is the deliberate use of exaggeration, as in 'there were millions of people with red rosettes at the football match yesterday'.
Here, by the term 'metaphor', we mean any figurative use of language which does not rely upon the literal meaning of the words for its interpretation. Used in this way the term embraces simile.
In the early days of NLP, metaphor was viewed as a complication we could do without. It was put on one side until the 'main' problems of NLP were solved. The remarkable thing about it, however, is that humans find metaphor so easy to deal with. More recently the realisation has dawned that metaphor is just the most spectacular and visible characteristic of language, among many which give a clue to some of its hidden mechanisms. The use of adjectives which do not quite fit the known properties of the entity in question is another. 'Millions of people' is usually taken to mean 'a very large number of people' not literally 'millions'.
The common characteristic of this use of language is that some of the properties normally associated with the use of certain words, in certain contexts, are inappropriate and are simply ignored. There is a relationship between this aspect of language usage and the point made earlier about a man not ceasing to be a man just because he has lost his legs. The deletion or ignoring of properties normally associated with something, when the occasion demands, is a basic element of the way we think and organise our view of the world, and the way we use language is a direct consequence of the way we think.
Metaphor is also fundamental to the way language develops and changes. When circumstances change, and a novel situation demands a novel word to describe it, we often adapt a word from a more familiar situation on the basis that there is some shared characteristic. When such a use of language is first introduced it is considered a 'metaphor' and may attract literary acclaim. Later, when the same words are used in the same way by many people, it becomes a 'cliche'. Later still, the new meaning may be formally adopted and find its way into established dictionaries. For example, when electric storage cells were first invented the term 'battery' meant a collection of large guns which fired together. When applied to the new-fangled electric storage cells it drew upon that aspect of the conventional meaning which was suggestive of a source of great power. Words behave like an amoeba. They can stretch into new shapes and develop strange lobes to embrace new usages. And they can also split in two.
An understanding of metaphor, then, is fundamental to an understanding of language, and an NLP system (in the fullest sense of the term) which cannot explain the mechanism of metaphor has failed.
16.7 Spatial Relationships
It may seem strange to include spatial relationships in a list of outstanding problems, because there have been a number of systems developed which appear to solve this problem quite successfully. It is relatively simple to define a number of predicates such as above(X,Y) (which is interpreted as meaning that the object 'X' is above the object 'Y') and ontop(X,Y), in_contact(X,Y), below(X,Y) and behind(X,Y) which all have similar meanings. These can be related by rules stated in the form:
above(X,Y) and in_contact(X,Y) implies ontop(X,Y)
or any similar logical notation. Many rules of this kind could be constructed so that the implications of any expression of spatial relationship can be re-expressed in terms of other spatial relationships. Thus the implications of a statement about spatial relationships can be inferred. This appears to be a neat solution to the problem of representing spatial relationships until we dig deeper into the nature of the problem. Consider the set of predicates and rules shown below:
right_of(X,Y) implies left_of(Y,X)
right_of(X,Y) and right_of(Y,Z) implies right_of(X,Z)
If we used these to represent the positions of persons sitting at a dinner table we would find some anomalies if the table was round. At such a table, if we keep placing people 'to the right' we will eventually place someone 'to the left' of the first person who sat down. This appears to infringe the rules stated above. The rules above simply do not deal properly with the notion of 'to the right of' when it is applied in the context of a circular table. Does this mean that we must create a separate logic suitable for each shape of table?
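The anomaly can be demonstrated mechanically. The following Python sketch applies the two rules above to a seating plan until no new facts appear. The function names `right_of_closure` and `left_of`, and the four diners, are assumptions introduced for illustration only.

```python
from itertools import product


def right_of_closure(seating, circular):
    """Derive every right_of(X, Y) fact implied by the transitivity rule.

    `seating` lists the diners so that each sits to the right of the one
    before. If `circular` is true, the first diner also sits to the right
    of the last, as at a round table.
    """
    n = len(seating)
    pairs = n if circular else n - 1
    facts = {(seating[(i + 1) % n], seating[i]) for i in range(pairs)}
    changed = True
    while changed:
        changed = False
        # right_of(X,Y) and right_of(Y,Z) implies right_of(X,Z)
        for (x, y), (y2, z) in product(list(facts), repeat=2):
            if y == y2 and (x, z) not in facts:
                facts.add((x, z))
                changed = True
    return facts


def left_of(right_facts):
    """right_of(X,Y) implies left_of(Y,X)."""
    return {(y, x) for (x, y) in right_facts}


diners = ["a", "b", "c", "d"]
straight = right_of_closure(diners, circular=False)
round_table = right_of_closure(diners, circular=True)
print(("a", "a") in straight)
print(("a", "a") in round_table)
```

Along a straight bench the rules behave sensibly. At the round table they conclude that every diner is to the right of every other diner, and also to the left of every other diner, and even that each diner is to the right of himself.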
There is also the problem of correctly interpreting statements such as 'on the side of the picture'. Does this mean within the picture itself but not in the centre, or does it mean adhering to the outside of the picture frame? Does 'behind X' mean on the far side of X so that it is occluded by X, or does it mean to the rear of X?
What is obvious is that the simple-minded approach outlined above is inadequate.