CHAPTER 23
Representing Objects
23.1 Physical Characteristics
In Chapter 20 we introduced a technique for the representation of an entity based on 'states'.
We noted the need for a state representing the identity of an entity, independent of that entity's properties.
From that starting point our problem is to find good ways of representing the various properties which an entity might have.
We based the representation of the properties on the perceptions which a human might have of them.
Each state was provided with a unique identifier and a time-stamp. The structure which results is illustrated in Figure 23.1.
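A minimal sketch of such a state is given here in Python, purely for illustration. The field names 'id' and 'owner' echo the representation shown later in this chapter; the integer clock, the counter used to generate identifiers and the example contents are our own assumptions.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_id = count(1)  # assumed source of unique identifiers


@dataclass
class State:
    """One state: some content plus a unique identifier and a time-stamp."""
    content: str
    timestamp: int                 # assumed: a simple integer clock
    owner: Optional[int] = None    # id of the identity state this state describes
    id: int = field(default_factory=lambda: next(_next_id))


# The identity state stands for the entity itself, independent of its properties.
identity = State(content="entity", timestamp=0)

# Property states point back to the identity state through 'owner'.
location = State(content="location: on the table", timestamp=1, owner=identity.id)
colour = State(content="colour: red", timestamp=1, owner=identity.id)
```

Because the properties are held in separate states, any of them can change (and acquire a new time-stamp) without disturbing the identity of the entity.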

An entity has several very obvious properties. For example it has a location in space, it may consist of several sub-parts,
and it may have certain functional roles to play. All of these present us with particular problems for representation, and we
shall consider them in turn.
23.2 Spatial Relationships
We are concerned here with the relationships between objects such as being 'near' to one
another (or to the speaker), one object being 'above' another, or 'inside' another, or 'beyond' etc.
We discussed in Chapter 16 the problems which can arise when we try to represent such relationships, and in
Chapter 20 we introduced a method of representing 'states' based on perceptions. Specifically we introduced the
notion of a 'framework' or frame of reference for a perception, which identified the sensory channel involved, the aspect
of perception, the model (or coordinate system) being used, and the axis concerned.
The essence of the approach which we suggest is that a relationship such as one object being 'behind' another
can be represented with reference to several frameworks at the same time. Thus 'behind' (in the sense of being
'behind a tree') is perceived visually by noting that one object is occluded from view by the other. In another framework,
which uses a model with one of the objects concerned as origin, we have the notion of 'behind' meaning 'to the rear of'.
In another framework, which uses the speaker as origin, 'behind' would be represented as meaning 'to the rear of the
speaker' (and therefore out of sight). 'Distance' is perceived in terms of frameworks associated with texture gradients,
blueness, reduced apparent size, reduced apparent sound, the possibility/impossibility of touching an object, its scent,
and so on. A physical object would be endowed with a shape, size and location, whereas a fluid would have no shape,
invisible gases would have no visible location, and so on. A heavy object would be associated with a significant 'heft' in the hand.
A hard object would be associated with an unyielding 'feel'. These ideas could be implemented using predicates or networks of
data structures, or in many other ways. The important point is that the physical properties are represented in terms of perceptual
primitives rather than being represented in terms of arbitrary predicates.
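As one possible illustration (not a definitive implementation), the three readings of 'behind' might be held side by side as follows. The four fields of a framework follow the description given above; the particular values placed in them are hypothetical labels of our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Framework:
    channel: str   # sensory channel involved
    aspect: str    # aspect of perception
    model: str     # model (or coordinate system) being used
    axis: str      # axis concerned


@dataclass(frozen=True)
class Percept:
    framework: Framework
    description: str


# 'behind' represented with reference to several frameworks at the same time.
BEHIND = (
    Percept(Framework("vision", "occlusion", "viewer-centred", "line of sight"),
            "one object is occluded from view by the other"),
    Percept(Framework("vision", "position", "object-centred", "front-rear"),
            "to the rear of the reference object"),
    Percept(Framework("vision", "position", "speaker-centred", "front-rear"),
            "to the rear of the speaker, and therefore out of sight"),
)


def models_used(relation):
    """List the model named by each framework in a relation."""
    return [p.framework.model for p in relation]
```

Here models_used(BEHIND) lists the three coordinate systems in which the one word is simultaneously represented.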
Another aspect of the suggested approach is that each physical property should be represented in terms of all
the associated perceptions at the same time.
This will make serious demands upon storage space and is likely to produce very cumbersome representations,
but it can be argued that if distance was represented in terms of the perceptions listed above (all at the same time),
then it would be possible for a natural language system to detect (in a suitable context) that the sentences
'The aeroplane flew away' and 'The aeroplane shrank and disappeared' could be construed as meaning the same thing.
When contextual information is necessary to make a choice between alternative interpretations, all interpretations should
be created and sustained until further information makes it possible to choose. In such a system 'ambiguity' comes to be
regarded simply as a lack of information.
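The idea can be sketched in a few lines of Python. The cue labels and the two candidate interpretations below are hypothetical, chosen only to echo the perceptions of distance listed earlier; the point is that every interpretation consistent with the cues is retained until a further cue removes it.

```python
# Which perceptual cues each interpretation can account for.
# The labels are illustrative assumptions, not a fixed vocabulary.
INTERPRETATIONS = {
    "distance increasing": frozenset({
        "apparent size reduces", "apparent sound reduces",
        "becomes bluer", "no longer visible",
    }),
    "physical size decreasing": frozenset({
        "apparent size reduces", "no longer visible",
    }),
}


def viable(observed, interpretations=INTERPRETATIONS):
    """Return every interpretation that accounts for all of the observed cues."""
    return sorted(name for name, explains in interpretations.items()
                  if observed <= explains)


# 'The aeroplane shrank and disappeared', taken on its own.
shrank = {"apparent size reduces", "no longer visible"}
both = viable(shrank)   # both readings are sustained

# Context supplies a further cue (the engine noise fades), leaving one reading.
resolved = viable(shrank | {"apparent sound reduces"})
```

Since 'distance increasing' is among the readings sustained for the second sentence, it can be matched with 'The aeroplane flew away'. The ambiguity is never treated as an error; it simply persists until the information arrives which resolves it.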
In Chapter 16 we gave an example of the kind of difficulty which can arise if spatial relationships are represented in a
simple-minded way: the relationship 'to-the-right-of' when people are sitting at a circular table. We have to ask ourselves
what is the meaning of 'circular table'. It is a table which has a shape such that people sitting progressively to the right
end up on the left of the first person to sit down. 'To-the-right-of' means 'to the right with reference to some line',
and the line in this case is the edge of the table (implied by 'sitting at the table'). The ideal way to deal with a problem like
this is to develop some form of imagery representing the scene, and to read the positions of people directly from the image.
We do not at present have the kind of computer power which makes this practical, but we might get some way to a solution
by representing 'to-the-right-of' in terms of the direction in which a person must turn their head to bring the next person into view.
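A simple sketch, under the assumption that the edge of the table can be modelled as a closed ring of seats, shows how 'to-the-right-of' behaves when the reference line is circular. The names are hypothetical, and the list order is taken to be the order in which each person is brought into view by turning the head to the right.

```python
def right_of(seats, person, steps=1):
    """Return whoever sits `steps` places to the right along the table edge."""
    i = seats.index(person)
    return seats[(i + steps) % len(seats)]   # the edge wraps round on itself


def left_of(seats, person, steps=1):
    """Return whoever sits `steps` places to the left along the table edge."""
    return right_of(seats, person, -steps)


seats = ["Ann", "Bob", "Cy", "Di"]   # hypothetical diners
```

With four people, right_of(seats, "Ann", 3) and left_of(seats, "Ann") name the same person: moving progressively to the right eventually arrives at the neighbour on the left, which is exactly the property that defeats a simple-minded linear representation.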
23.3 Constituent Parts: the Anatomy of Objects
Objects usually have recognisable parts from which they are constructed. Knowledge about the anatomy of an object
is necessary to understand many statements about it. For example, the statement 'She whispered in his ear' immediately
informs the human reader that:
(a) she is close to him
(b) she is speaking quietly
(c) few people will hear what she says
(d) 'he' will hear what she says
This knowledge may be necessary for an adequate understanding of a narrative, and such knowledge is assumed by any storyteller.
The necessary knowledge must include the knowledge that an 'ear' is part of the human anatomy, and is the part responsible for enabling hearing.
Semantic networks have frequently been used to indicate the structural relationships between objects.
We could construct a tree structure such as that illustrated below:
human body :- (head, trunk, arms, legs)
head :- (face, hair, neck)
trunk :- (chest, back, waist, pelvis)
arms :- (left arm, right arm)
arm :- (hand, forearm, elbow)
legs :- (left leg, right leg)
We could go on to break down 'hand' into fingers and thumbs. The face would break down into eyes, ears, nose, mouth and so on.
Structural information would then be needed to indicate the relative positions and roles of these constituent parts - support(legs,trunk),
attached(arms,trunk), attached(head,trunk), support(trunk,head), and so on. The approach seems plausible at first sight,
but it is fraught with difficulties.
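The tree and the structural relations can be written down directly, for example as Python dictionaries and tuples. This is only a sketch of one possible encoding; the function names are our own. Note that, as in the tree given above, nothing links 'left arm' to the general entry for 'arm', so the expansion stops there.

```python
# The part tree exactly as listed in the text.
PARTS = {
    "human body": ["head", "trunk", "arms", "legs"],
    "head": ["face", "hair", "neck"],
    "trunk": ["chest", "back", "waist", "pelvis"],
    "arms": ["left arm", "right arm"],
    "arm": ["hand", "forearm", "elbow"],
    "legs": ["left leg", "right leg"],
}

# Structural relations between constituent parts, as (relation, a, b) triples.
RELATIONS = {
    ("support", "legs", "trunk"),
    ("attached", "arms", "trunk"),
    ("attached", "head", "trunk"),
    ("support", "trunk", "head"),
}


def expand(part, depth=0):
    """Yield (depth, part) for a part and everything beneath it in the tree."""
    yield depth, part
    for child in PARTS.get(part, []):
        yield from expand(child, depth + 1)


def count_nodes(part):
    """Count the nodes which a full expansion of `part` would create."""
    return sum(1 for _ in expand(part))
```

Even this shallow tree produces sixteen nodes for a single human body, and each further level of detail multiplies the total.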
The problem is - where do we stop? Every time an entity is represented and that entity is human, do we really want to create a
structure like this - expanded in detail right down to fingernails, cuticle, nostril hairs, pores, blackheads, and many other parts
which we will leave the reader to imagine? This seems an excessive amount of representational junk to carry around on the
remote chance that it may be needed. Even if we do so, there remains the chance that the understanding
of a statement will require knowledge of internal anatomy. We often refer to 'stomach', 'appendix', 'kidney', 'throat', 'lungs', 'nerves' etc.
If carried to its logical conclusion this approach will require a representation to rival a medical textbook on anatomy.
To avoid being overwhelmed by the sheer volume of information which appears to be required, a more economical
method of representation is needed. A possible solution would represent an object by only the top level of the tree structure.
For example, we might represent a human as we have done above, and resist the attempt to break the parts down any further.
We might add the component 'internal-anatomy'. The structure which would result is illustrated in Figure 23.2.

The advantage of the additional record structure 'consists of' is that this state can be given a time-stamp of its own.
We could therefore represent the situation which would arise if someone lost an arm. The arm would not cease to exist,
but it would cease to be a constituent part of the person-entity. We could then have two 'consists of' states with time-stamps
indicating the time at which the arm loss took place. Each component such as 'hand', 'arm' etc., would then be regarded as a
'macro', or the label for a database which could be expanded when required.
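The following sketch shows how two time-stamped 'consists of' states might record the loss of an arm. It assumes integer time-stamps and, for the sake of the example, lists 'left arm' and 'right arm' separately at top level; the entity name 'john' and the helper parts_at are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistsOf:
    owner: str          # the entity whose constitution is described
    parts: frozenset    # top-level component labels only
    start: int          # time-stamp from which this state holds


def parts_at(states, owner, t):
    """Return the parts given by the latest 'consists of' state not later than t."""
    current = [s for s in states if s.owner == owner and s.start <= t]
    if not current:
        return frozenset()
    return max(current, key=lambda s: s.start).parts


TOP_LEVEL = frozenset({"head", "trunk", "left arm", "right arm",
                       "legs", "internal-anatomy"})

states = [
    ConsistsOf("john", TOP_LEVEL, start=0),
    ConsistsOf("john", TOP_LEVEL - {"left arm"}, start=10),  # the arm is lost at t=10
]
```

The earlier state is not deleted, so a question about the person before the loss can still be answered. The arm itself would continue to exist as an entity in its own right; only its membership of the person-entity changes.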
This suggestion is not a complete solution to the problem, however. There is
still a difficulty in deciding when the expansion should take place. If we were dealing with the sentence 'John broke his finger'
it would be necessary to note that 'finger' was an element of human anatomy and a part of 'hand'. The word 'hand' would have
as part of its definition (or one of its many possible definitions) the information that it was often used (by humans) for picking things up.
The system would then be able to deduce that John's ability to pick things up had been impaired (from the definition of 'broke').
This arrangement works well, and an appropriate expansion can be triggered if the two entities concerned are sufficiently close to
one another that the overlap is detectable at top level. If, however, the overlap is not obvious at top level, we have the problem of deciding
whether or not it is going to be worthwhile expanding. Consider the sentence 'John broke his barometer'. Here there is no overlap which
will be detectable between the short tree structures tagged on to the definition of each entity. Both are physical objects, but there the
relationship stops. The real point of connection lies in the 'role' which a barometer plays (which we will discuss in the next section).
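The contrast between the two sentences can be sketched as follows. The part-of links and the role attached to 'hand' are taken from the discussion above, although the link from 'arm' straight to 'human' is a simplification; the function and the flat dictionaries are our own assumptions about how such an expansion might be organised.

```python
# Fragments of anatomical knowledge, expanded only when a link is followed.
PART_OF = {"finger": "hand", "hand": "arm", "arm": "human"}

# Roles recorded in the definitions of individual parts.
ROLE = {"hand": "picking things up"}


def impaired_roles(broken, owner_kind):
    """Climb part-of links from the broken thing towards the owner.

    Returns the roles met on the way if the owner is reached, otherwise an
    empty list (no anatomical connection was found).
    """
    roles = []
    node = broken
    while node is not None:
        if node in ROLE:
            roles.append(ROLE[node])
        if node == owner_kind:
            return roles
        node = PART_OF.get(node)
    return []
```

For 'John broke his finger' the climb from 'finger' reaches 'human' by way of 'hand', and the role of the hand is returned as impaired. For 'John broke his barometer' there is no chain to climb, and the function returns nothing: the connection must be sought elsewhere.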
In the tree we described above, each node represents a constituent part of the entity represented by its superior node. It is also possible
to construct a tree in which each node is a 'specialisation' (or specialised example of) its superior node, and at the same time each node
is a 'generalisation' of its subordinate nodes. We might call it a 'consists-of-and-is-part-of' tree. Many efforts to produce a system for the
handling of semantic information have been based upon tree structures of this kind. Complications abound, however. A 'leg' is a part of a
person, and it is also part of a table, a chair and so on. It appears that we need several different types of 'leg' at different points in our tree
structure, and the same can be true of almost every other concept.
23.4 The Classification of Objects
In addition to a 'consists-of-and-is-part-of' tree, we can also construct an 'is-a' relationship network. We might for example classify
a 'spaniel' as a 'dog' and a 'dog' as an 'animal'.
In such a tree we say that each level is a 'specialisation' of its superior node, and each is a 'generalisation' of its subordinate nodes.
Furthermore, it is often not necessary to repeat all of the properties which an entity may have at each node. If all animals are physical
objects and therefore have shape, size, location etc., it can be assumed that all subordinate nodes (or specialisations) of the node
corresponding to 'physical object' will also have these properties. This idea is known as inheritance and it is an important property
of semantic nets of this kind. In the previous section we noted the need for several different kinds of 'leg'. Each of these could be
considered a specialisation of a general concept 'leg' which is a support for something (unspecified). Each type of 'leg' specialises the
concept by specifying the thing supported. Obviously our two types of network must intersect.
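Inheritance is easily sketched. In the fragment below the 'is-a' links follow the spaniel example, with 'animal' placed under 'physical object' as the text supposes; the property 'barks' is a hypothetical addition used to show a property introduced part-way down the tree.

```python
# Each concept names its immediate generalisation.
IS_A = {"spaniel": "dog", "dog": "animal", "animal": "physical object"}

# Properties are stored once, at the most general node to which they apply.
PROPERTIES = {
    "physical object": {"shape", "size", "location"},
    "dog": {"barks"},   # hypothetical example property
}


def inherited_properties(concept):
    """Collect the properties of a concept and of all its generalisations."""
    props = set()
    while concept is not None:
        props |= PROPERTIES.get(concept, set())
        concept = IS_A.get(concept)
    return props
```

Nothing is stored against 'spaniel' itself, yet inherited_properties("spaniel") yields shape, size, location and the property added at 'dog'.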
But cutting across such a structure are other possible classification structures.
We might classify objects into 'edible' and 'inedible' objects, or as 'solids', 'liquids' and 'gases'.
There is indeed, as we noted in section 17.6, some evidence that humans do in fact classify objects
according to properties such as 'edible' and 'inedible'.
For all types of classification structure we have the same dominant problem.
We have potentially a very large data structure to store, almost all of which will be redundant in any given
set of circumstances, and some way must be found to minimise the amount of information brought to bear
on a given problem. At the same time enough information must be provided to enable the decision to be made
about whether or not it would be worthwhile to expand the structure to include more information. The top level
of information tagged to each entity explicitly should be regarded as 'heuristic signposts', which indicate whether
or not a search through the semantic structure is likely to be fruitful.
This remains a very significant problem.
23.5 The Functional Roles of Entities
What is the meaning of the phrase 'tennis ball'? One type of definition would stress its shape, its size, its bounciness,
its being made of rubber with a felted surface, its hollowness, the curiously curved pattern of lines on its surface, and so on.
On learning this definition a person would presumably be able to recognise a tennis ball. We could give a tennis ball to the same
person and they would be able to store a much more accurate set of information about its characteristics based on its appearance,
weight and feel. All these we might be able to represent by means of the perception-based representation. But would a person whose knowledge was confined to these
facts about its physical characteristics really know what a tennis ball is?
To complete their knowledge we would have to tell them about (or better still let them see) a game of tennis.
That is, we would need to inform them about the role for which a tennis ball is intended.
A tennis ball can, of course, be used in a variety of ways which have nothing to do with tennis,
but an understanding of its intended role is part of knowing what a tennis ball is.
It is therefore necessary to include, within the representation of an object, a representation of a scenario which describes its role.
This scenario must include the representation of a number of other entities. To distinguish these from the one which is the object
being represented, it is necessary to mark or otherwise indicate the 'salient' element of the scenario representation.
The phrase 'tennis ball' is quite a hard one to represent, and so we will leave it until the next section where we shall deal
with the representation of repetitive events. Instead we will consider the representation of the word 'fertiliser'. This is usually a
brown, earth-like substance which would not easily be distinguished from many other substances if its role was not known.
The representation might be something like this:
id=
owner =
{S1 (a plant)}
{S2 (soil)}
* {S3 (fertiliser)}
{S4 (agent)}
{S5 S3 in S2 }
{S6 S4->S5 }
{S7 S1 in S2 }
{S8 size of S1 increase}
{S9 S5->S8 }
That is, there exists a plant, some soil, some fertiliser and an agent. The plant is in the soil.
The agent places the fertiliser in the soil, and this causes the plant to grow.
The state marked '*' is the entity being defined (the salient).
Of course we have glossed over a lot of detail in this illustration. Each state
would contain many elements, including time-stamps which would be in chronological sequence.
The plant and the soil would have their physical characteristics. We might indicate that the plant would grow in any case,
but that the fertiliser would make it grow faster and bigger. But these are details which do not affect the point being made.
Note that the same scenario could be used to define the verb 'to fertilise', except that in that case the state S6 would be
identified as the salient (i.e. the act of causing the fertiliser to be in the soil). Note also that 'fertiliser' is a noun,
while 'to fertilise' is a verb, and recall the comments made (in section 11.5) about the way case grammar mistakenly
places too much emphasis on the case structure of a verb and ignores the information content which a noun carries. The causal connectivity of a scenario representation is crucial to an understanding of roles.
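One way of holding such a scenario, offered only as a sketch, is shown below. The state labels S1 to S9 follow the fertiliser representation; S7 is written to agree with the prose gloss (the plant is in the soil). The tuple encoding of relations and the function causal_chain are our own assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    states: dict    # state label -> content
    salient: str    # label of the state being defined


STATES = {
    "S1": "a plant",
    "S2": "soil",
    "S3": "fertiliser",
    "S4": "agent",
    "S5": ("in", "S3", "S2"),        # the fertiliser is in the soil
    "S6": ("causes", "S4", "S5"),    # the agent brings S5 about
    "S7": ("in", "S1", "S2"),        # the plant is in the soil
    "S8": ("increase", "size of S1"),
    "S9": ("causes", "S5", "S8"),    # S5 causes the plant to grow
}

# The same scenario defines the noun and the verb; only the salient differs.
fertiliser = Scenario(STATES, salient="S3")
to_fertilise = Scenario(STATES, salient="S6")


def causal_chain(scenario, start):
    """Follow 'causes' links forward from a state label."""
    causes = {s[1]: s[2] for s in scenario.states.values()
              if isinstance(s, tuple) and s[0] == "causes"}
    chain = [start]
    while chain[-1] in causes and causes[chain[-1]] not in chain:
        chain.append(causes[chain[-1]])
    return chain
```

Following the causal links from the agent (S4) leads through the presence of the fertiliser in the soil (S5) to the growth of the plant (S8), and it is this chain, rather than the appearance of the substance, which carries the role.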
23.6 Representing Stuff
In Chapter 16 we gave an outline of some of the major stumbling blocks in natural language research.
One of them was the representation of substances (or 'stuff').
Most current NL systems make the convenient assumption that the things which populate our world are solid,
well-formed objects. Locations can be assigned to them, they have shape and size and a unique identity.
Computers handle data in the form of discrete chunks or structures, and it is therefore natural that we should
attempt to model things in terms of these discrete structural units. 'Water', however, is not a discrete object;
it is a material or substance of which other things may be made. The same is true for 'wood', 'steel', 'air', etc.
Trees are made of wood, so are tables and chairs. The atmosphere is made of air, and the Atlantic Ocean is
made of water, as is the River Nile.
It is first necessary to note that it is impossible to think about a substance of any kind without thinking of
something made of that substance. It may not be an identifiable object with a well-known shape and a name,
but any mass of substance is an entity of a kind. Furthermore, the properties possessed by a substance are
the properties which it bestows on the entities which are made of it.
We have the means to represent such entities. We can represent their appearance, their feel, their shape
(or lack of it), their colour, their rigidity and so on. Fluids bestow on any entity made of them a shapelessness.
Entities made of gas have no fixed size (or volume). Fluid objects try to escape, and are retained by means of a
container. Gaseous objects expand and sometimes float upwards.
We can begin our representation of a stuff X by inventing an entity which is made of X. It need not have a name.
It will normally be represented by an anonymous identifier.
{S1 entity X }
* {S2 = S3,S4,S5,S6,...}
{S3 ... of X }
{S4 ... of X }
{S5 ... of X }
{S6 ... of X }
etc.
where S3, S4, S5, S6 represent properties of X
Structural state S2 would be identified as the salient in this representation (the substance itself).
That is, the substance is that which provides the properties of X. By representing the properties of this entity X
we are thereby representing the properties of the substance.
If we have available the computational means to process large arrays of similar elements efficiently,
we could represent the object as a conjunction of a multitude of discrete elements (the state S2).
Each element would have certain properties which would mimic the behaviour of the substance.
Elements of water would always fall if they were not supported by either a solid object or another element of water.
Thus water would always adopt the shape of its container. In a solid substance the individual elements would
adhere to one another and preserve a basic shape.
In addition to these characteristics we could represent the colour, texture, etc.
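The rule for elements of water can be tried out on a small grid. The sketch below is an assumption about how such elements might be processed, not a description of any existing system: '#' marks a solid element, '~' an element of water and '.' empty space. Only the falling rule stated above is implemented; sideways flow, which would be needed for the water to level itself, would require a further rule.

```python
def step(grid):
    """Move every unsupported water element down one cell.

    Rows are scanned from the bottom upwards so that each element moves at
    most once per step. Returns True if anything moved.
    """
    moved = False
    for r in range(len(grid) - 2, -1, -1):
        for c in range(len(grid[r])):
            if grid[r][c] == "~" and grid[r + 1][c] == ".":
                grid[r][c], grid[r + 1][c] = ".", "~"
                moved = True
    return moved


def settle(rows):
    """Apply the falling rule until no element of water can move."""
    grid = [list(row) for row in rows]
    while step(grid):
        pass
    return ["".join(row) for row in grid]


container = [
    "#.~~.#",
    "#.~~.#",
    "#....#",
    "#..#.#",   # an uneven floor: one solid element stands proud
    "######",
]
settled = settle(container)
```

After settling, each element of water rests either on a solid element or on another element of water, so that the underside of the water follows the uneven floor of the container.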
Within the same representation we might describe the role of the substance. Air is breathed, water is drunk,
wood is burned. These roles would be represented as scenarios in which things are consumed by fire, for example. We
do not want a simple predicate 'can_burn(wood)'. We should also represent where some of these substances come from.
Wood comes from trees, for example. We need, then, a scenario describing the growing of trees.
All this information is an essential part of our understanding of a statement such as 'I need air!' But the complexity
of the information required, and the volume of the corresponding representational structures, are such that some means
is required of ensuring that the full expansion of the structure is only carried out when necessary.