CHAPTER 23

Representing Objects

23.1 Physical Characteristics

In Chapter 20 we introduced a technique for the representation of an entity based on 'states'. We noted the need for a state which represented the identity of an entity which was independent of its properties. From that starting point our problem is to find good ways of representing the various properties which an entity might have. We based the representation of the properties on the perceptions which a human might have of them. Each state was provided with a unique identifier and a time-stamp. The structure which results is illustrated in Figure 23.1.




An entity has several very obvious properties. For example it has a location in space, it may consist of several sub-parts, and it may have certain functional roles to play. All of these present us with particular problems for representation, and we shall consider them in turn.

23.2 Spatial Relationships

We are concerned here with the relationships between objects such as being 'near' to one another (or to the speaker), one object being 'above' another, or 'inside' another, or 'beyond' etc. We discussed in Chapter 16 the problems which can arise when we try to represent such relationships, and in Chapter 20 we introduced a method of representing 'states' based on perceptions. Specifically we introduced the notion of a 'framework' or frame of reference for a perception, which identified the sensory channel involved, the aspect of perception, the model (or coordinate system) being used, and the axis concerned.

The essence of the approach which we suggest is that a relationship such as one object being 'behind' another can be represented with reference to several frameworks at the same time. Thus 'behind' (in the sense of being 'behind a tree') is perceived visually by noting that one object is occluded from view by the other. In another framework, which uses a model with one of the objects concerned as origin, we have the notion of 'behind' meaning 'to the rear of'. In another framework, which uses the speaker as origin, 'behind' would be represented as meaning 'to the rear of the speaker' (and therefore out of sight). 'Distance' is perceived in terms of frameworks associated with texture gradients, blueness, reduced apparent size, reduced apparent sound, the possibility/impossibility of touching an object, its scent, and so on. A physical object would be endowed with a shape, size and location, whereas a fluid would have no shape, invisible gases would have no visible location, and so on. A heavy object would be associated with a significant 'heft' in the hand. A hard object would be associated with an unyielding 'feel'. These ideas could be implemented using predicates or networks of data structures, or in many other ways. The important point is that the physical properties are represented in terms of perceptual primitives rather than being represented in terms of arbitrary predicates.

Another aspect of the suggested approach is that each physical property should be represented in terms of all the associated perceptions at the same time.

This will make serious demands upon storage space and is likely to produce very cumbersome representations, but it can be argued that if distance were represented in terms of the perceptions listed above (all at the same time), then it would be possible for a natural language system to detect (in a suitable context) that the sentences 'The aeroplane flew away' and 'The aeroplane shrank and disappeared' could be construed as meaning the same thing. When contextual information is necessary to make a choice between alternative interpretations, all interpretations should be created and sustained until further information makes it possible to choose. In such a system 'ambiguity' comes to be regarded simply as a lack of information.
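The idea of sustaining every interpretation until further information arrives can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples (sensory channel, aspect, change) and the two concept names are hypothetical stand-ins for the perception-based states, not a notation taken from the earlier chapters.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical perceptual primitive: (sensory channel, aspect, change).
Percept = Tuple[str, str, str]

# Each concept is represented by ALL of its associated perceptions at once.
# The concept names and percept lists are illustrative assumptions.
CONCEPTS: Dict[str, FrozenSet[Percept]] = {
    "distance increasing": frozenset({
        ("vision", "apparent size", "decreasing"),
        ("vision", "blueness", "increasing"),
        ("hearing", "apparent loudness", "decreasing"),
        ("touch", "reachability", "lost"),
    }),
    "physical size decreasing": frozenset({
        ("vision", "apparent size", "decreasing"),
        ("touch", "felt size", "decreasing"),
    }),
}


def interpretations(observed: FrozenSet[Percept]) -> List[str]:
    """Return every concept whose percepts include all the observed ones."""
    return sorted(name for name, percepts in CONCEPTS.items()
                  if observed <= percepts)


# 'The aeroplane shrank': on its own, both readings are sustained.
shrank = frozenset({("vision", "apparent size", "decreasing")})
print(interpretations(shrank))

# Further information (the sound also fades) removes the ambiguity.
quieter = shrank | {("hearing", "apparent loudness", "decreasing")}
print(interpretations(quieter))
```

With only the visual percept both readings survive; once the fading sound is added, only 'distance increasing' remains, which is the sense in which ambiguity is treated here as a lack of information.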

In Chapter 16 we gave an example of the kind of difficulty which can arise if spatial relationships are represented in a simple-minded way: the relationship 'to-the-right-of' when people are sitting at a circular table. We have to ask ourselves what is the meaning of 'circular table'. It is a table which has a shape such that people sitting progressively to the right end up on the left of the first person to sit down. 'To-the-right-of' means 'to the right with reference to some line', and the line in this case is the edge of the table (implied by 'sitting at the table'). The ideal way to deal with a problem like this is to develop some form of imagery representing the scene, and to read the positions of people directly from the image. We do not at present have the kind of computer power which makes this practical, but we might get some way to a solution by representing 'to-the-right-of' in terms of the direction in which a person must turn their head to bring the next person into view.
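A minimal Python sketch of this 'head-turn' reading of 'to-the-right-of' is given below. It assumes, purely for illustration, that the seating is held as a list in which each person's right-hand neighbour is the next element, wrapping round at the end; the names and function names are hypothetical.

```python
from typing import List


def neighbour_to_right(seats: List[str], person: str) -> str:
    """Person brought into view by one turn of the head to the right.

    Assumes the list order follows successive right-hand neighbours
    and wraps round, as the edge of a circular table does.
    """
    i = seats.index(person)
    return seats[(i + 1) % len(seats)]


def turns_to_right(seats: List[str], a: str, b: str) -> int:
    """Number of successive right-hand steps needed to get from a to b."""
    return (seats.index(b) - seats.index(a)) % len(seats)


seats = ["Ann", "Bob", "Cy", "Di"]  # hypothetical seating

print(neighbour_to_right(seats, "Ann"))    # Bob
print(neighbour_to_right(seats, "Di"))     # Ann (the circle closes)
print(turns_to_right(seats, "Ann", "Di"))  # 3
print(turns_to_right(seats, "Di", "Ann"))  # 1
```

Di is reached by moving progressively to the right from Ann, yet Ann is Di's immediate right-hand neighbour. Treating the relation as a local step along the table edge, rather than as a global ordering, is what avoids the contradiction.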

23.3 Constituent Parts: the Anatomy of Objects

Objects usually have recognisable parts from which they are constructed. Knowledge about the anatomy of an object is necessary to understand many statements about it. For example, the statement 'She whispered in his ear' immediately informs the human reader that:

(a) she is close to him
(b) she is speaking quietly
(c) few people will hear what she says
(d) 'he' will hear what she says



This knowledge may be necessary for an adequate understanding of a narrative, and such knowledge is assumed by any storyteller. The necessary knowledge must include the knowledge that an 'ear' is part of the human anatomy, and is the part responsible for enabling hearing.

Semantic networks have frequently been used to indicate the structural relationships between objects. We could construct a tree structure such as that illustrated below:

human body :- (head, trunk, arms, legs)
head :- (face, hair, neck)
trunk :- (chest, back, waist, pelvis)
arms :- (left arm, right arm)
arm :- (hand, forearm, elbow)
legs :- (left leg, right leg)



We could go on to break down 'hand' into fingers and thumbs. The face would break down into eyes, ears, nose, mouth and so on. Structural information would then be needed to indicate the relative positions and roles of these constituent parts - support(legs,trunk), attached(arms,trunk), attached(head,trunk), support(trunk,head), and so on. The approach seems plausible at first sight, but it is fraught with difficulties.

The problem is - where do we stop? Every time an entity is represented and that entity is human, do we really want to create a structure like this - expanded in detail right down to fingernails, cuticle, nostril hairs, pores, blackheads, and many other parts which we will leave the reader to imagine? This seems an excessive amount of representational junk to carry around on the remote chance that it may be needed. Even if we do so there is the chance that the understanding of a statement will require knowledge of internal anatomy. We often refer to 'stomach', 'appendix', 'kidney', 'throat', 'lungs', 'nerves' etc. If carried to its logical conclusion this approach will require a representation to rival a medical textbook on anatomy.

To avoid being overwhelmed by the sheer volume of information which appears to be required, a more economical method of representation is needed. A possible solution would represent an object by only the top level of the tree structure. For example, we might represent a human as we have done above, and resist the attempt to break the parts down any further. We might add the component 'internal-anatomy'. The structure which would result is illustrated in Figure 23.2.




The advantage of the additional record structure 'consists of' is that this state can be given a time-stamp of its own. We could therefore represent the situation which would arise if someone lost an arm. The arm would not cease to exist, but it would cease to be a constituent part of the person-entity. We could then have two 'consists of' states with time-stamps indicating the time at which the arm loss took place. Each component such as 'hand', 'arm' etc., would then be regarded as a 'macro', or the label for a database which could be expanded when required.
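The time-stamped 'consists of' state can be sketched as follows. This is a minimal illustration in Python, not the record structure of Figure 23.2: the field names, the integer time-stamps and the particular part labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ConsistsOf:
    """One 'consists of' state: an identifier, a time-stamp, top-level parts."""
    state_id: str
    timestamp: int           # hypothetical integer clock
    parts: Tuple[str, ...]   # labels only; each could be expanded on demand


def parts_at(states: List[ConsistsOf], t: int) -> Optional[Tuple[str, ...]]:
    """Return the parts given by the latest state not later than time t."""
    earlier = [s for s in states if s.timestamp <= t]
    if not earlier:
        return None
    return max(earlier, key=lambda s: s.timestamp).parts


# Two states for the same person-entity: before and after the loss of an arm.
person_states = [
    ConsistsOf("S10", 0, ("head", "trunk", "left arm", "right arm",
                          "legs", "internal-anatomy")),
    ConsistsOf("S11", 7, ("head", "trunk", "left arm",
                          "legs", "internal-anatomy")),
]

print(parts_at(person_states, 5))   # still includes 'right arm'
print(parts_at(person_states, 10))  # 'right arm' is no longer a part
```

The arm itself is not deleted anywhere; it simply stops appearing in the later state, which matches the point that the arm does not cease to exist but ceases to be a constituent of the person-entity.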

This suggestion is not a complete solution to the problem, however. There is still a difficulty in deciding when the expansion should take place. If we were dealing with the sentence 'John broke his finger' it would be necessary to note that 'finger' was an element of human anatomy and a part of 'hand'. The word 'hand' would have as part of its definition (or one of its many possible definitions) the information that it was often used (by humans) for picking things up. The system would then be able to deduce that John's ability to pick things up had been impaired (from the definition of 'broke').

This arrangement works well, and an appropriate expansion can be triggered if the two entities concerned are sufficiently close to one another that the overlap is detectable at top level. If, however, the overlap is not obvious at top level, we have the problem of deciding whether or not it is going to be worthwhile expanding. Consider the sentence 'John broke his barometer'. Here there is no overlap which will be detectable between the short tree structures tagged on to the definition of each entity. Both are physical objects, but there the relationship stops. The real point of connection lies in the 'role' which a barometer plays (which we will discuss in the next section).
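The contrast between 'John broke his finger' and 'John broke his barometer' can be illustrated with a small Python sketch. The 'part of' links and the table of roles below are hypothetical and heavily simplified; they stand in for whatever definitions the system would actually hold.

```python
from typing import Dict, List

# Hypothetical 'is part of' links, expanded only as far as this example needs.
PART_OF: Dict[str, str] = {
    "finger": "hand",
    "thumb": "hand",
    "hand": "arm",
    "arm": "human body",
}

# Hypothetical roles attached to the definitions of parts.
ROLES: Dict[str, List[str]] = {
    "hand": ["picking things up"],
}


def impaired_roles(broken_part: str) -> List[str]:
    """Climb the 'part of' chain from the broken part, collecting roles."""
    roles: List[str] = []
    node = broken_part
    while node is not None:
        roles.extend(ROLES.get(node, []))
        node = PART_OF.get(node)  # None once the top of the chain is reached
    return roles


print(impaired_roles("finger"))     # ['picking things up']
print(impaired_roles("barometer"))  # []
```

For 'finger' the climb reaches 'hand' and its role, so an impairment can be deduced. For 'barometer' nothing is found in the anatomical structure at all, which is the case where the connection must come from the role of the object instead.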

In the tree we described above, each node represents a constituent part of the entity represented by its superior node. It is also possible to construct a tree in which each node is a 'specialisation' (or specialised example) of its superior node, and at the same time each node is a 'generalisation' of its subordinate nodes. We might call it a 'consists-of-and-is-part-of' tree. Many efforts to produce a system for the handling of semantic information have been based upon tree structures of this kind. Complications abound, however. A 'leg' is a part of a person, and it is also part of a table, a chair and so on. It appears that we need several different types of 'leg' at different points in our tree structure, and the same can be true of almost every other concept.

23.4 The Classification of Objects

In addition to a 'consists-of-and-is-part-of' tree, we can also construct an 'is-a' relationship network. We might for example classify a 'spaniel' as a 'dog' and a 'dog' as an 'animal'.

In such a tree we say that each level is a 'specialisation' of its superior node, and each is a 'generalisation' of its subordinate nodes. Furthermore, it is often not necessary to repeat all of the properties which an entity may have at each node. If all animals are physical objects and therefore have shape, size, location etc., it can be assumed that all subordinate nodes (or specialisations) of the node corresponding to 'physical object' will also have these properties. This idea is known as inheritance and it is an important property of semantic nets of this kind. In the previous section we noted the need for several different kinds of 'leg'. Each of these could be considered a specialisation of a general concept 'leg' which is a support for something (unspecified). Each type of 'leg' specialises the concept by specifying the thing supported. Obviously our two types of network must intersect.
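Inheritance through an 'is-a' network can be sketched in Python as below. The links for 'spaniel', 'dog' and 'animal' follow the text; the 'leg' entries and the wording of the properties are hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Set

# 'is-a' links: each node points to its generalisation.
IS_A: Dict[str, str] = {
    "spaniel": "dog",
    "dog": "animal",
    "animal": "physical object",
    "table leg": "leg",       # hypothetical specialisations of 'leg'
    "human leg": "leg",
    "leg": "physical object",
}

# Properties are stored once, at the most general node to which they apply.
PROPERTIES: Dict[str, Set[str]] = {
    "physical object": {"shape", "size", "location"},
    "leg": {"supports something"},
    "table leg": {"supports a table"},
    "human leg": {"supports a person"},
}


def inherited_properties(node: str) -> Set[str]:
    """Collect the properties of a node and of all its generalisations."""
    props: Set[str] = set()
    current: Optional[str] = node
    while current is not None:
        props |= PROPERTIES.get(current, set())
        current = IS_A.get(current)
    return props


print(sorted(inherited_properties("spaniel")))
print(sorted(inherited_properties("table leg")))
```

Nothing is stored against 'spaniel' itself, yet it acquires shape, size and location from 'physical object'. Each kind of 'leg' adds only the thing supported and inherits the rest.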

But cutting across such a structure are other possible classification structures. We might classify objects into 'edible' and 'inedible' objects, or as 'solids', 'liquids' and 'gases'. There is indeed, as we noted in section 17.6, some evidence that humans do in fact classify objects according to properties such as 'edible' and 'inedible'.

For all types of classification structure we have the same dominant problem. We have potentially a very large data structure to store, almost all of which will be redundant in any given set of circumstances, and some way must be found to minimise the amount of information brought to bear on a given problem. At the same time enough information must be provided to enable the decision to be made about whether or not it would be worthwhile to expand the structure to include more information. The top level of information tagged to each entity explicitly should be regarded as 'heuristic signposts', which indicate whether or not a search through the semantic structure is likely to be fruitful.

This remains a very significant problem.

23.5 The Functional Roles of Entities

What is the meaning of the phrase 'tennis ball'? One type of definition would stress its shape, its size, its bounciness, its being made of rubber with a felted surface, its hollowness, the curiously curved pattern of lines on its surface, and so on. On learning this definition a person would presumably be able to recognise a tennis ball. We could give a tennis ball to the same person and they would be able to store a much more accurate set of information about its characteristics based on its appearance, weight and feel. All these we might be able to represent by means of the perception-based representation. But would a person whose knowledge was confined to these facts about its physical characteristics really know what a tennis ball is?

To complete their knowledge we would have to tell them about (or better still let them see) a game of tennis. That is, we would need to inform them about the role for which a tennis ball is intended. A tennis ball can, of course, be used in a variety of ways which have nothing to do with tennis, but an understanding of its intended role is part of knowing what a tennis ball is. It is therefore necessary to include, within the representation of an object, a representation of a scenario which describes its role. This scenario must include the representation of a number of other entities. To distinguish these from the one which is the object being represented, it is necessary to mark or otherwise indicate the 'salient' element of the scenario representation. The phrase 'tennis ball' is quite a hard one to represent, and so we will leave it until the next section where we shall deal with the representation of repetitive events. Instead we will consider the representation of the word 'fertiliser'. This is usually a brown, earth-like substance which would not easily be distinguished from many other substances if its role was not known. The representation might be something like this:

id =
owner =
{S1 (a plant)}
{S2 (soil)}
* {S3 (fertiliser)}
{S4 (agent)}
{S5 S3 in S2}
{S6 S4 -> S5}
{S7 S1 in S2}
{S8 size of S1 increase}
{S9 S5 -> S8}



That is, there exists a plant, some soil, some fertiliser and an agent. The plant is in the soil. The agent places the fertiliser in the soil, and this causes the plant to grow.



The state marked '*' is the entity being defined (the salient).



Of course we have glossed over a lot of detail in this illustration. Each state would contain many elements, including time-stamps which would be in chronological sequence. The plant and the soil would have their physical characteristics. We might indicate that the plant would grow in any case, but that the fertiliser would make it grow faster and bigger. But these are details which do not affect the point being made.



Note that the same scenario could be used to define the verb 'to fertilise', except that in that case the state S6 would be identified as the salient (i.e. the act of causing the fertiliser to be in the soil). Note also that 'fertiliser' is a noun, while 'to fertilise' is a verb, and recall the comments made (in section 11.5) about the way case grammar mistakenly places too much emphasis on the case structure of a verb and ignores the information content which a noun carries. The causal connectivity of a scenario representation is crucial to an understanding of roles.
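The way one scenario can serve both the noun and the verb, differing only in which state is marked as salient, can be sketched in Python. The class and function names are hypothetical, the states are held as plain strings for brevity, and S7 is written as 'S1 in S2' to follow the prose gloss 'The plant is in the soil'. Time-stamps, identifiers and the physical characteristics of each entity are omitted.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Scenario:
    """A role scenario: a set of labelled states, one of them salient."""
    states: Dict[str, str]
    salient: str

    def defines(self) -> str:
        """Return the content of the state being defined."""
        return self.states[self.salient]


def causal_links(scenario: Scenario) -> List[Tuple[str, str]]:
    """Extract (cause, effect) pairs from states written as 'A -> B'."""
    links = []
    for text in scenario.states.values():
        if "->" in text:
            cause, effect = (part.strip() for part in text.split("->"))
            links.append((cause, effect))
    return links


fertiliser = Scenario(
    states={
        "S1": "a plant",
        "S2": "soil",
        "S3": "fertiliser",
        "S4": "agent",
        "S5": "S3 in S2",
        "S6": "S4 -> S5",
        "S7": "S1 in S2",
        "S8": "size of S1 increase",
        "S9": "S5 -> S8",
    },
    salient="S3",
)

# The verb 'to fertilise' reuses the same states with a different salient.
to_fertilise = replace(fertiliser, salient="S6")

print(fertiliser.defines())      # fertiliser
print(to_fertilise.defines())    # S4 -> S5
print(causal_links(fertiliser))  # [('S4', 'S5'), ('S5', 'S8')]
```

Only the salient marker changes between the two definitions; the causal links, which carry the understanding of the role, are shared.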



23.6 Representing Stuff



In Chapter 16 we gave an outline of some of the major stumbling blocks in natural language research. One of them was the representation of substances (or 'stuff').



Most current NL systems make the convenient assumption that the things which populate our world are solid, well-formed objects. Locations can be assigned to them, they have shape and size and a unique identity. Computers handle data in the form of discrete chunks or structures, and it is therefore natural that we should attempt to model things in terms of these discrete structural units. 'Water', however, is not a discrete object; it is a material or substance of which other things may be made. The same is true for 'wood', 'steel', 'air', etc. Trees are made of wood, so are tables and chairs. The atmosphere is made of air, and the Atlantic Ocean is made of water, as is the River Nile.



It is first necessary to note that it is impossible to think about a substance of any kind without thinking of something made of that substance. It may not be an identifiable object with a well-known shape and a name, but any mass of substance is an entity of a kind. Furthermore, the properties possessed by a substance are the properties which it bestows on the entities which are made of it.



We have the means to represent such entities. We can represent their appearance, their feel, their shape (or lack of it), their colour, their rigidity and so on. Fluids bestow on any entity made of them a shapelessness. Entities made of gas have no fixed size (or volume). Fluid objects try to escape, and are retained by means of a container. Gaseous objects expand and sometimes float upwards.



We can begin our representation of a stuff X by inventing an entity which is made of X. It need not have a name. It will normally be represented by an anonymous identifier.



{S1 entity X}
* {S2 = S3, S4, S5, S6, ...}
{S3 ... of X}
{S4 ... of X}
{S5 ... of X}
{S6 ... of X}
etc.



where S3, S4, S5, S6 represent properties of X.

Structural state S2 would be identified as the salient in this representation (the substance itself). That is, the substance is that which provides the properties of X. By representing the properties of this entity X we are thereby representing the properties of the substance.

If we have available the computational means to process large arrays of similar elements efficiently, we could represent the object as a conjunction of a multitude of discrete elements (the state S2). Each element would have certain properties which would mimic the behaviour of the substance. Elements of water would always fall if they were not supported by either a solid object or another element of water. Thus water would always adopt the shape of its container. In a solid substance the individual elements would adhere to one another and preserve a basic shape.
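The behaviour of such elements can be illustrated with a very small grid simulation in Python. This is a sketch under stated assumptions, not a proposal for the representation itself: the character grid, the symbols 'w' (water element), '#' (solid) and '.' (empty), and the function names are all hypothetical. Only the falling rule is modelled; sideways flow, which would be needed for water to take up the full shape of its container, and the adhesion of solid elements are left out.

```python
from typing import List


def step(grid: List[str]) -> List[str]:
    """One update: a water element falls one row if the cell below is empty.

    Rows are scanned from the bottom upwards, so each element moves
    at most one row per step. All rows are assumed to have equal length.
    """
    rows = [list(r) for r in grid]
    for y in range(len(rows) - 2, -1, -1):
        for x in range(len(rows[y])):
            if rows[y][x] == "w" and rows[y + 1][x] == ".":
                rows[y][x], rows[y + 1][x] = ".", "w"
    return ["".join(r) for r in rows]


def settle(grid: List[str]) -> List[str]:
    """Repeat the update until no element moves."""
    while True:
        nxt = step(grid)
        if nxt == grid:
            return grid
        grid = nxt


container = [
    "#w.w#",
    "#...#",
    "#.#.#",
    "#####",
]

for row in settle(container):
    print(row)
```

Each element comes to rest on the first solid cell (or other water element) beneath it, so the two elements end up in the two wells at the bottom of the container.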

In addition to these characteristics we could represent the colour, texture, etc. Within the same representation we might describe the role of the substance. Air is breathed, water is drunk, wood is burned. These roles would be represented as scenarios in which things are consumed by fire, for example. We do not want a simple predicate 'can_burn(wood)'. We should also represent where some of these substances come from. Wood comes from trees, for example. We need, then, a scenario describing the growing of trees.

All this information is an essential part of our understanding of a statement such as 'I need air!' But the complexity of the information required, and the volume of the corresponding representational structures, are such that some means is required of ensuring that the full expansion of the structure is only carried out when necessary.