AN EXPEDIENT MIND
The development of a mechanism for concept formation

Part 1 : TITLE PAGE | Preface | What is Consciousness? | Outline of the system
Part 2 : Building bricks | Layer-1 | Layer-2 | Layer-3 | Layer-4 | Layer-5
Part 3 : Discussion | Arguments | Conclusions | Addenda
Tartan Hen Publications : Home | more books | Contact : feedback@tartanhen.co.uk


Layer-3: Concept Formation

Storage problems
          Layer-3 has available to it, all the information provided by layer-1 and 2. In particular it has access to the episodic memory store.
          By selecting for storage only those #episodes which have a high priority, the storage requirement for episodic memory is much reduced. But the storage problem is not eliminated completely. There is still a problem with undue repetition. If the system has some important experience and has that experience twice, it will store two #episodes corresponding to those two experiences. That is a wasteful use of storage space.

Compression
          Why not store the information once only and replace the second occurrence with a bookmark, which tells the system where to find the information which previously occupied that location within the memory store? That technique is called compression. Anyone who has ever used a digital camera will understand the advantage of being able to store digital images which have been compressed.
          There are many different algorithms used for compression. Some methods enable the original data to be re-constructed without loss of detail. Other methods, known as "lossy" techniques, result in a loss of some detail. But all methods share a basic strategy. Eliminate wasteful duplication. Find chunks of data which are repeated, extract these chunks, store a representative version of the chunk once, and insert bookmarks or indicators of some kind, in the locations where the chunks came from.
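          The basic strategy can be sketched in a few lines of Python. This is only an illustration of the bookmark idea, not the algorithm of any real compression format. The function names and the use of short strings to stand for chunks are assumptions made for the example.

```python
def compress(chunks):
    """Store each distinct chunk once; replace every occurrence with a bookmark."""
    store = []       # one representative copy of each distinct chunk
    index_of = {}    # chunk -> its position in the store
    bookmarks = []   # one bookmark per original chunk, in the original order
    for chunk in chunks:
        if chunk not in index_of:
            index_of[chunk] = len(store)
            store.append(chunk)
        bookmarks.append(index_of[chunk])
    return store, bookmarks


def decompress(store, bookmarks):
    """Rebuild the original sequence by following each bookmark."""
    return [store[b] for b in bookmarks]


# Hypothetical memory contents: the same experience occurs three times.
memory = ["dog enters", "phone rings", "dog enters", "dog enters"]
store, bookmarks = compress(memory)
```

          Here the repeated experience is held once in `store`, and `decompress(store, bookmarks)` returns the original sequence, so this particular sketch is a lossless method.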

De-compression
          The downside of compression is the time which is required to re-assemble the data in its original form. The more highly compressed, the longer the reconstruction procedure will take. Lossy methods are usually faster and result in a higher degree of compression. But they take longer to re-construct. That suggests a compromise strategy. Recent #episodes, which are the most likely to be required for processing, should not be compressed to any great extent. Older #episodes, which are unlikely to be required in a hurry, should be progressively compressed to save more and more space, at the cost of reconstruction taking longer and longer. Compression, therefore, should take place progressively in phases, with every phase resulting in a higher degree of compression. If it is subsequently found that certain #episodes need to be re-constructed more often, the process can be reversed. Those #episodes can be de-compressed, to make them available more rapidly. If lossy compression has been used, however, the missing data cannot be recovered.
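          As a rough illustration of compression in phases, the sketch below uses Python's standard zlib module, whose compression levels run from 0 (stored without compression) to 9 (smallest output). The age thresholds and function names are assumptions chosen for the example. Note also that zlib is lossless, so this shows only the phased part of the strategy, not the lossy part.

```python
import zlib


def level_for_age(age_in_days):
    """Pick a zlib level: recent episodes stay loose, old ones are squeezed hard."""
    if age_in_days < 1:
        return 0   # today's episodes: no compression, fastest access
    if age_in_days < 30:
        return 3   # recent episodes: light compression
    return 9       # old episodes: maximum compression


def store_episode(episode_bytes, age_in_days):
    """Compress an episode to the degree appropriate to its age."""
    return zlib.compress(episode_bytes, level_for_age(age_in_days))


def recall_episode(stored_bytes):
    """Reconstruct the episode exactly (possible because zlib is lossless)."""
    return zlib.decompress(stored_bytes)


# A hypothetical, highly repetitive episode.
episode = b"dog enters room; " * 200
fresh = store_episode(episode, age_in_days=0)
old = store_episode(episode, age_in_days=365)
```

          Re-storing an episode at a lower level corresponds to the reversal described above: the episode takes more space again but is quicker to recall.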

Similarity with Human Memory
          In trying to develop a memory system, I did not set out to reproduce human memory procedures. I find it remarkable, however, how the technical requirements of storage and retrieval have forced my system design to reproduce many of the features which I experience with my own memory. Older memories are harder to recall. Often when they are recalled I find that the detail is missing. Often when I try to remember an individual incident, all I can remember is a compendium of several similar previous experiences. Was it last week or the week before, that I bought carrots when I visited the supermarket? All I can remember is the generalised experience of supermarket shopping (a compressed chunk). I cannot recall the individual items purchased on each occasion (lossy compression).
          At night, when I sleep (go off-line), I dream. And when I dream, recent experiences, especially traumatic experiences, get mixed up with other memories from long ago, while my brain (presumably) sorts out the storage problem and re-indexes some of those memories, to enable rapid recall. And if I don't get enough dream-sleep? I become confused and start to forget. Even more than usual.

Concepts
          And now we come to the crucial issue. Evolution scientists call it "exaptation". Natural selection leads evolution along in small steps and each step brings with it, some small survival advantage. And then there occurs a happy accident. After some feature has evolved because of one kind of survival advantage, it turns out that there is a second, and perhaps even more significant advantage.
          It can, of course, go the other way. Those small steps and small advantages lead the evolutionary path to the edge of a cliff and the species is eliminated. Sabre toothed tigers developed larger and larger teeth because that enabled them to penetrate the thicker and thicker skin of their prey. When the prey died out, however, those huge teeth became an encumbrance when the tiger tried to catch smaller and fleeter prey.
          But unexpected good fortune also occurs, and then we find a species with a very useful characteristic, which was developed under pressure of natural selection, for quite different reasons. And that, I suggest, is what could explain the development of concepts. Consider those chunks of data, which have been extracted from the episodic memory, to compress the store. Think about what each chunk represents, and how it could arise. Every time my dog walks into the room, I experience a set of repeated experiences. I see the dog's coat, his four legs, his floppy ears and his lolling tongue. I also smell him and sometimes I hear him bark. So if my brain did this compression trick, those experiences would certainly be candidates for extraction. And if I put all those experiences together into one chunk, that chunk is a compendium of everything I associate with my dog. It becomes the foundation, for the concept in my brain, which I call "Fido".

Concept Structure
          Consider again, how those repeating chunks of experience are formed and where they come from. Episodic memory is a collection of #episodes. Each #episode is a recorded #trace leading up to some event of importance. A #trace is a chronological sequence of #states. Each #state is a set of #perceptions all of which occurred during the same time interval. A #perception is a unit of recognition - a feature within the in-coming sensory #signals, which can be identified by matching them to a standard set of relationships. To identify a repeating chunk, the compression algorithm must compare two #episodes and try to find within them, a set of similar #perceptions. The criteria for a similarity to be declared, can vary. Every #perception has a priority level (set by layer-1). The system could, for example, impose a lower limit on the priority level which was considered significant. Every #perception with a priority less than that level would be ignored. Only those above would be compared. The way in which #perceptions could be compared, is also subject to variation. Just how similar do they have to be, before a match is declared? These are difficult questions to answer.
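          One possible way to make those criteria concrete is sketched below. It is an assumption-laden illustration, not a description of the actual system: a #perception is reduced to a label and a priority, a #state is a list of #perceptions, and similarity is measured as the proportion of significant labels which the two #states share. The names `Perception`, `states_match`, `min_priority` and `min_overlap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perception:
    label: str       # hypothetical name for the recognised feature
    priority: float  # assumed to have been set by layer-1


def significant(state, min_priority):
    """Labels of the perceptions whose priority reaches the lower limit."""
    return {p.label for p in state if p.priority >= min_priority}


def states_match(a, b, min_priority=0.5, min_overlap=0.75):
    """Declare a match when enough of the significant perceptions are shared."""
    sa, sb = significant(a, min_priority), significant(b, min_priority)
    if not sa or not sb:
        return False
    overlap = len(sa & sb) / len(sa | sb)  # shared labels / all labels seen
    return overlap >= min_overlap


dog_monday = [Perception("coat", 0.9), Perception("four legs", 0.8),
              Perception("floppy ears", 0.7), Perception("bark", 0.6),
              Perception("wallpaper", 0.1)]
dog_tuesday = [Perception("coat", 0.9), Perception("four legs", 0.8),
               Perception("floppy ears", 0.7), Perception("bark", 0.6),
               Perception("rain", 0.2)]
car_trip = [Perception("coat", 0.9), Perception("steering wheel", 0.8)]
```

          With these values `states_match(dog_monday, dog_tuesday)` is true, because the low-priority wallpaper and rain are ignored, while `states_match(dog_monday, car_trip)` is false. Relaxing `min_overlap` or raising `min_priority` changes what counts as a repeat, which is exactly the variation discussed above.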
          We also have to consider how many #states are involved in each comparison. A comparison could be made #state by #state. That is, the system could compare two #states and look for a common set of #perceptions. Alternatively, the system could compare two pairs of #states. If the first pair is denoted (A,B) and the second pair (X,Y), then the system would look first to see if A matched X and then to see if B matched Y. If both comparisons were found to produce a match, the pair of #states (A,B) would be declared to be a repeating chunk. The same kind of comparison could be made using short sequences of #states. The diagram below illustrates these various alternatives.

FIG: The diagram is in three parts. Each part shows a section of episodic memory with five #episodes in each. Part (1) shows a number of single #states which match and can therefore be identified as repeating chunks. Part (2) shows a number of repeating pairs. Part (3) shows a number of repeating multi-state chunks.
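          The three alternatives differ only in how many consecutive #states are compared at once, so a single function can sketch all of them. This is a minimal illustration which assumes that each #state has already been reduced to a label that can be tested for exact equality. The function name `repeating_chunks` is hypothetical.

```python
def repeating_chunks(episode_a, episode_b, length):
    """Return every run of `length` consecutive states found in both episodes."""
    def windows(episode):
        # All runs of `length` adjacent states, kept in chronological order.
        return {tuple(episode[i:i + length])
                for i in range(len(episode) - length + 1)}
    return windows(episode_a) & windows(episode_b)


# Two hypothetical episodes, each a chronological list of state labels.
first = ["A", "B", "C", "D"]
second = ["X", "A", "B", "C"]

singles = repeating_chunks(first, second, 1)   # single-state chunks
pairs = repeating_chunks(first, second, 2)     # adjacent pairs
triples = repeating_chunks(first, second, 3)   # multi-state chunks
```

          In this example the single #states A, B and C repeat, the pairs (A,B) and (B,C) repeat, and the only repeating three-state chunk is (A,B,C).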

Entities, Causal-Links and Scenarios
          This gives rise to a three-way classification of concepts.

(1) Those which are derived from single #states can be regarded as "entities". In the simplest case these will be physical objects, of which "FIDO" is a prototypical example.

(2) Those which have been extracted from repeating pairs of chronologically adjacent #states. I will call these "causal-links".

(3) Those which have been extracted from a chronological chain of #states. These will be called "scenarios".

          I will consider these separately.

ENTITIES
          An entity is a concept which has a set of properties, all of which can be perceived at a single moment in time. We shall see shortly, that that definition holds only for the initial phase of concept development. Later, the notion of an entity will be extended to include properties which can be observed only from different viewpoints, so that the complete collection of properties cannot be observed at a single moment. But initially, as the concept is being formed, that simplistic definition holds. A typical entity is a physical object (like "FIDO") which is either present or not.

Turning Chunks of Experience into a Concept
          A simple chunk of experience is the foundation on which a concept can be constructed, but it is not, in that form a concept. A concept has its own identity. A chunk of experience occurs at a particular moment in time and then it is gone. A concept is timeless. It persists. It can persist, long after the physical object which it represents no longer exists. So to convert a chunk of experience into a concept, we must encapsulate that experience within a new structure. The diagram below illustrates.

FIG: The diagram shows the structure of an entity. Note that it encapsulates a set of properties (that is, #perceptions P1, P2, P3, etc). Initially these will be confined to a single time frame - the time interval associated with a single #state. But because we have a new structure with its own identity, we can then extend the properties of the entity, to include other properties which go well beyond a single time frame.

          Initially entities will be associated with particular observations. But if the compression algorithm is applied and re-applied to repeating chunks already identified, and if each application of the algorithm relaxes the criteria for a match, the result will be successive generalisations of the entity. If we again take "FIDO" as the prototype, we can see that initially there will be several occurrences of FIDO. FIDO wagging his tail. FIDO asleep. FIDO playing with a ball. FIDO from the rear. FIDO from the front. When these are compared, the common element FIDO can be extracted. Other compression objects may also have been formed corresponding to ROVER, PLUTO, etc. When the compression algorithm is applied to these as a group, a common set of properties will be discovered. The result will be a common chunk which we can call "DOG". In this way, a hierarchy of entities can be constructed with each successive layer representing a more highly generalised version of the entities in the layer below it. In computer science that is called an inheritance hierarchy. Each item in the hierarchy is said to "inherit" some of its properties from the layer above. Thus all of the individual entities FIDO, ROVER and PLUTO would inherit the ownership of four legs from the entity DOG.
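          A minimal sketch of that generalisation step is given below. It assumes, purely for illustration, that each entity has been reduced to a set of property labels, and that generalising means keeping whatever the group has in common. The property names are invented for the example.

```python
def generalise(entities):
    """Extract the properties which every entity in the group shares."""
    property_sets = [set(props) for props in entities.values()]
    return set.intersection(*property_sets)


# Hypothetical entities already formed from individual observations.
observed = {
    "FIDO":  {"four legs", "tail", "barks", "floppy ears", "brown coat"},
    "ROVER": {"four legs", "tail", "barks", "pointed ears", "black coat"},
    "PLUTO": {"four legs", "tail", "barks", "floppy ears", "yellow coat"},
}

# The common chunk becomes the more general entity DOG.
dog = generalise(observed)

# Each individual now stores only what distinguishes it, plus a bookmark to DOG.
specific = {name: props - dog for name, props in observed.items()}

# Inheritance: an individual's full description is DOG plus its own extras.
fido_full = dog | specific["FIDO"]
```

          Nothing is lost in this tidy example, since `fido_full` equals the original description of FIDO. With real sensory data the shared set would rarely be so clean.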
          A word of caution however. It is easy to imagine a neat and tidy structure with all the properties being inherited downwards in an orderly fashion. Textbooks on object oriented programming are full of diagrams showing structures like that, and they can be, if they are deliberately built in that way. But in my system, the structure is built contingently on sensory experience. I don't think we can expect a tidy structure to arise from the haphazard experience of real life. The structure of the concept store of entities, in my system, is more likely to be very confused and untidy. If the hierarchical structure is likened to a house, then this will be a house with many mezzanine floors, and then even more mezzanines between the other mezzanines. It is also a structure which will be in a constant state of re-organisation. It is interesting to note, however, that the mechanism does appear to be capable of deriving a structure with some quite specific entities at the lowest levels, and some very generalised concepts at the highest level.

Two-State Chunks and Causal-Links
          The next type of concept I want to consider, is formed by the identification of two #states which occur repeatedly as a chronological pair. If the pair of #states is denoted (X,Y) with X being the first in time, and if it is also true that X is found not to occur without being followed, every time, by #state Y, then we would be able to use any occurrence of X as a reliable predictor of the future occurrence of Y. We could go further and say that "X causes Y to happen".
          For some that is a step too far. They will point out that "X always precedes Y", describes nothing more than a statistical correlation between X and Y. They will consider it invalid that we should jump from that observation to the conclusion that "X causes Y". However, those who are familiar with the philosophy of David Hume, will recognise that that is exactly what he proposed as the origin of our concept of causation.

"... experience only teaches us how one event constantly follows another, without instructing us in the secret connection which binds them together and renders them inseparable." [An Enquiry Concerning Human Understanding, David Hume, 1748]

          The only reservation I have about Hume's choice of example, is that it is almost too commonplace, so much so, that some people find it hard to realise that a causal connection has not actually been observed. So I offer an alternative example. There is a flash of lightning and the lights go out. Are these two events causally linked? Our willingness to suspect a causal connection depends upon the timing of the two events and the relative frequency of each. If we live in an area where the lights fail often at all times of day, whether there is a thunder storm or not, then we would not find it so easy to "see" a causal connection. But if both the lightning and the failure of the lights are relatively uncommon events, then the temptation to hypothesise a causal connection becomes very hard to resist, even although a causal connection has not really been observed.
          Although it is my intention to deal with philosophical issues at the end of this book, I think it is worth considering this point here, and at some length, because my entire thesis hinges upon it.

David Hume
          I find myself (embarrassingly) supporting two positions.

Position 1: As someone who has taught statistics to undergraduates, I have often found myself insisting that the assumption that correlation implies causation, is entirely invalid. There is a demonstrable co-relation between the quality of a child's handwriting and the size of that child's big toe. Some find that startling but the reason is prosaic. In establishing that finding we have overlooked the child's age. As a child grows to maturity, his or her handwriting can be expected to improve while at the same time the size of the child's big toe will increase. The two co-relates are simply the result of a common single cause.

Position 2: If Hume is correct, a causal connection between events is nothing more than an observed correlation. When we say "X causes Y" all we are actually saying is that there appears to be some link between the two events such that when we observe X we expect Y to follow and we do so with very great confidence.

          That final phrase "with very great confidence" is important. It is the only way I can reconcile these two positions. It means that a causal connection is DEFINED as being a co-relation which induces virtual certainty of prediction. Most co-relations do not do that. There is, for example, no certainty associated with the big toe/handwriting correlation because there is a very considerable degree of variation.
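          That definition can be turned into a simple test. The sketch below counts, over a set of stored traces, how often each #state X is immediately followed by each #state Y, and keeps only those pairs for which the proportion is close to certainty. The threshold values and the name `causal_links` are assumptions made for the illustration, and a #state at the very end of a trace is ignored because nothing is recorded as following it.

```python
from collections import Counter


def causal_links(traces, min_confidence=0.99, min_occurrences=2):
    """Return {(X, Y): confidence} for pairs where X almost always precedes Y."""
    has_successor = Counter()  # how often X appears with some state after it
    followed_by = Counter()    # how often X is immediately followed by Y
    for trace in traces:
        for x, y in zip(trace, trace[1:]):
            has_successor[x] += 1
            followed_by[(x, y)] += 1
    return {
        (x, y): n / has_successor[x]
        for (x, y), n in followed_by.items()
        if has_successor[x] >= min_occurrences
        and n / has_successor[x] >= min_confidence
    }


# Hypothetical traces, each a chronological list of state labels.
traces = [
    ["clap", "sound", "silence"],
    ["clap", "sound", "clap", "sound"],
    ["silence", "clap", "sound"],
]
links = causal_links(traces)
```

          Only the pair (clap, sound) survives. "sound" is followed by "silence" on only half of its occurrences, and "silence" precedes "clap" only once, which is too little evidence to induce any confidence at all.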
          The example which Hume used to illustrate his view of causation, was the impact of one billiard ball on another. He pointed out that no matter how closely we observe the impact, we cannot actually see why the second ball rolls away when the first one hits it. We all expect that to happen. Indeed, the observation is so commonplace, that many will be surprised by the suggestion that there is no observable causal connection. But note this, I could arrange for things to happen in some strange and unexpected ways. I could for example prepare a special billiard ball which contains a gyroscope inside it. When the other ball hits it, this special ball rolls off in an entirely new kind of way. It will spin and the spinning will probably make it swerve. I could also secure the second billiard ball to the table by means of a hidden bolt. Or I could put iron inside and magnetise the balls so that they stick together. But you will say "Yes, but the causal connection is still there. The first one is trying to make the second one move and it is only these special extra factors that prevent that from happening." Now we will fly off to the far end of the universe and try again. Are you quite sure that in this strange place, the laws of physics are the same as they are on Earth? If all the billiard balls in this place acted like the magnetised ones, would you not be prepared to change your expectations?
          The point is this - we do not actually SEE a causal connection. We assume that it is there, and we base that judgement on the observation that these events seem to be absolutely reliable and repeatable. An infant looks out on the world with naivety and sees how it behaves. It learns to expect the behaviour of the world to remain constant and predictable. It does not actually see causal connections. It forms the concept of causal connection and imposes that concept upon the world in order to render the world predictable.
          We could do the scientific thing. We could don a white coat and examine the impact of billiard balls in great detail. We could then re-express the observed events in terms of atomic forces in collision at the point of impact. Note, however, that the whole concept of a force, is just another example of the same concept. We observe the juxtaposition of events, and we then conceive of a thing we call "a force" to be an explanation of the causal connection. But we never actually see a force - like magnetism, gravity or whatever. All we actually see is the way (we suppose) it affects the behaviour of physical objects.
          Some have tried to refute Hume's insight by mis-interpreting his philosophy. They say that Hume suggested that, in nature, there is no constraining connection between events, and that we are just imagining that there are connections. That is not what Hume said. What he said was that even if there are connections between events in nature, we cannot observe them. We suppose that there is a form of connection, and the causal link is our way of expressing that supposition.
          According to Russell, Hume's position cannot be refuted. It is also clear that Russell wanted Hume to be refuted.

"He represents, in a certain sense, a dead end: in his direction, it is impossible to go further. To refute him has been, ever since he wrote, a favourite pastime among metaphysicians. For my part, I find none of their refutations convincing: nevertheless, I cannot but hope that something less sceptical than Hume's system may be discoverable." [History of Western Philosophy, Bertrand Russell, 1946]

          It is, I admit, with a certain amount of mischievous glee, that I now use Hume's "dead end" as a launching platform for my own thesis. I do not think he was a dead end. I think he was dead right.

Representing a Causal Link
          To represent a causal link, we need to give it an identity of its own, which is distinct from the identities of the two #states which participate in it. The diagram below illustrates.


FIG: The diagram shows a single structure (a causal link). This structure contains two others - the #states S1 and S2. It also has its own unique identifier (C1). It is this encapsulation of the two #states, which turns what was simply a two-state chunk of data being compressed, into a concept structure.

Generalising the Causal Connection
          By analysing the stored memory of past experience, my system is able to extract those repeated chunks - the two-state sequences, which seem to indicate a causal connection. In that way it would be able to develop the concept of one billiard ball causing the movement of another. It would in the same way develop a host of other causal links. The causal link between the clapping of hands and hearing a sound. The causal link between pricking the finger with a pin and the sensation of pain. And so on.
          If the compression algorithm is then re-applied to those concepts, what we will get is a generalisation of these individual causal links. All these billiard balls, and other objects banging into one another, will generalise to the concept of "impact". All these pin pricks, burnings, slaps and knocks, will generalise to the concept "inflict pain". And if the compression algorithm is again re-applied to those generalised concepts created by the first re-application, the ultimate generalisation will be the concept of "CAUSAL LINK". A CAUSAL LINK would be represented by a two-state structure, where neither #state, in the linked pair, has any detailed content at all which relates to a particular event. The only thing indicated by that generalised structure is that one #state is a reliable predictor of the second.

Progressive Compression
          The development of the compression algorithms capable of doing the task I am describing here, would not be an easy undertaking. A slow and gradual evolution is required. In nature it would take millions of years. Creating an artificial system to simulate that piecemeal development would require an extraordinary commitment of resources.
          The task of identifying single-state chunks of repeating material would be less onerous than the problem of identifying chunks with more than one #state. It is also the case that the difficulty of identifying multiple-state chunks would be reduced somewhat if those single-state chunks had already been identified and replaced by standardised bookmarks which were easy to match. So the development process has to be progressive. Single-state chunks first, then two-state chunks and finally, multi-state chunks.

Multi-state Chunks
          Once again we need to encapsulate the (multi-state) chunk of experience in a new structure which has its own identity. The diagram below illustrates.

FIG: The diagram shows a scenario structure. It encapsulates two entities and a sequence of #states, interlinked by causal links. The various #states will refer to the entity structures at the start. These entities participate throughout the scenario, and so each has a time-stamp which spans some or all of the scenario.

          A #scenario, as has been explained, begins its life as a chronological sequence of #states. That structure is then encapsulated into a new structure with its own unique identity. Once it is in that form we can call it a concept, and we can start to augment the components which are stored there. Within several of these #states, individual #entities will have been recognised and replaced by a "bookmark" representing the associated #entity concept. These need appear only once in the #scenario. They appear at the top of the #scenario structure, rather like the list of cast members in the script of a play. The causal links will also have been identified and these too will be included. The remainder is that chronological sequence of #states which can be regarded as being like the script of a small playlet. Repeated re-applications of the compression algorithm may then identify further causal-links, within which other causal-links can appear as components. Thus a causal-link can be the cause, or the effect, of another higher causal-link. This ability to link causal-links themselves within further causal-links, provides the representational technique with extraordinary power to handle very complicated causal relationships between the components of a #scenario.
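          The encapsulation described here might be represented along the following lines. This is a sketch only, with hypothetical class and field names. #States and #entity bookmarks are reduced to plain strings, and the one feature it is meant to show is that every structure carries its own identifier and that a causal-link may contain other causal-links.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Union

_serial = count(1)


def new_id(prefix):
    """Issue a unique identifier such as 'C1' or 'S4'."""
    return f"{prefix}{next(_serial)}"


@dataclass
class CausalLink:
    cause: Union[str, "CausalLink"]   # a state, or another causal-link
    effect: Union[str, "CausalLink"]
    ident: str = field(default_factory=lambda: new_id("C"))


@dataclass
class Scenario:
    cast: list    # entity bookmarks, listed once like the cast of a play
    links: list   # the causal-links identified within the script
    script: list  # the chronological sequence of states
    ident: str = field(default_factory=lambda: new_id("S"))


strike = CausalLink("cue strikes ball A", "ball A rolls")
impact = CausalLink("ball A hits ball B", "ball B rolls")
shot = CausalLink(strike, impact)  # a link whose components are links

game = Scenario(
    cast=["CUE", "BALL-A", "BALL-B"],
    links=[strike, impact, shot],
    script=["cue strikes ball A", "ball A rolls",
            "ball A hits ball B", "ball B rolls"],
)
```

          Because `shot` refers to `strike` and `impact` rather than copying them, arbitrarily deep chains of cause and effect can be built from the same two-slot structure.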

Generalising the #Scenarios
          Repeated application of the compression process to #entities, results in generalised #entities (or classes of #entities). Doing the same thing to #causal-links, results in the generalised concept of CAUSATION. It is pertinent to ask, what then is the result of applying the compression algorithm to #scenarios. My answer is - abstract concepts. If we apply the compression algorithm to a set of #scenarios such as - (1) purchasing groceries at a supermarket, (2) buying stocks and shares and (3) buying a house, then, by eliminating the bits which they do not have in common and keeping those which they do, we will produce a #scenario of trading (exchanging the ownership of items of value). When one of the items involved is money, the trader who parts with the money is said to be the buyer. Buried in there too, we can see the beginnings of the concept of ownership. These issues are discussed in the section on representation.

Representational Power
          Because the form of representation this makes possible, is so powerful it is also very complicated. In a previous text I tried to demonstrate this with many examples, but found that readers became confused. I have, therefore, in this text decided to transfer that analysis of representational form, to another section, which the reader can ignore at this point, and to which the reader can also return, at a later time.

Concept Recognition
          The development of concepts was made possible by compression, and the advantage which compression gifts to the system, is the ability to store and recall, larger amounts of previous experience, without exceeding the available capacity. Concepts, as I have argued, are a by-product of that increase in storage efficiency.
          As we well know, however, concepts are the means by which we humans, analyse our sensory experience and choose appropriate responses. The question which needs to be answered is this - Are these structures, which I am calling "concepts" and which are a component of my artificial system, the same as the concepts in our own brains, with which we are very familiar?
          My answer to that question, is that I would be very surprised if it transpired that there was any close similarity in physical structure. It may not be the case that they are formed in the same way. I do claim, however, that they play a similar role in the two brain mechanisms. I will now try to justify that claim.

Interpretation: Concepts at Work
          Imagine a scene which might confront our intelligent robot. He is standing at the roadside watching traffic passing to and fro. He observes a man trying to cross the street. To see what is going on inside the robot's brain mechanism, we intervene at a moment when the man is half way across the street and there is a car approaching.
          The robot has previously observed all of the entities involved in this tableau. In his concept store there are concept structures corresponding to the man, to the car and to the road. To understand the significance of what he is currently observing, however, he has to put those several concepts together into a single structure which I will call an "#interpretation".
          First we have the concept of a road. This concept has a physical location and it has dimensions. But the structure which is held in the robot's concept store, does not provide any specific information about the location or the dimensions - except that the object is flat, or nearly flat, and that the length is indeterminate and rather long. The location too, is unspecified, but as he observes the scene, the robot is now able to provide a location and dimensions which are appropriate to this particular experience. The location is directly in front of him. The width dimension is the standard width associated with most roads. Standard measures of that kind are specified for most objects, but not in numerical terms. There is a unique identifier for each standard measure, and particular examples are specified as being greater or smaller than the standard for that kind of object. In that way the robot can avoid thinking that a large mouse will be bigger than a small elephant. But it is not just locations and dimensions which are specified in relative terms. Everything is defined in that way. No absolute co-ordinates please. No GPS systems. This #interpretation, is the world according to the robot, as he sees it, without the aid of measuring instruments or arbitrary scales.
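          The mouse and elephant comparison can be sketched without any numerical sizes. In the illustration below each kind of object has an identifier for its standard measure, a particular object is described only as smaller than, typical of, or larger than that standard, and the ordering of the standards themselves is simply assumed to be known. All of the identifiers are hypothetical.

```python
# Standard measures, smallest first. Only their order is known, not their size.
STANDARDS = ["MOUSE-SIZE", "DOG-SIZE", "MAN-SIZE", "ELEPHANT-SIZE"]

# How a particular example relates to the standard for its kind.
MODIFIERS = ["smaller", "typical", "larger"]


def is_bigger(a, b):
    """Compare two objects, each given as (standard, modifier)."""
    def rank(obj):
        standard, modifier = obj
        # The kind of object dominates; the modifier only breaks ties.
        return (STANDARDS.index(standard), MODIFIERS.index(modifier))
    return rank(a) > rank(b)


large_mouse = ("MOUSE-SIZE", "larger")
small_elephant = ("ELEPHANT-SIZE", "smaller")
typical_mouse = ("MOUSE-SIZE", "typical")
```

          With this representation `is_bigger(large_mouse, small_elephant)` is false, while `is_bigger(large_mouse, typical_mouse)` is true.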
          So the road is placed in front of him, the width stretches away from him to the other side, and the length disappears left and right to unspecified locations further than he can see. Now the man who is trying to cross the road is placed in the middle of the road. The robot will have a concept structure representing a typical man. That must be retrieved and given a location which is halfway across the road. The orientation of the man, is towards the far side of the road. The intention of the man, to cross the road, can be represented by placing a similar representational structure inside the MIND of the man, and extending it to show the man at the other side of the road, and that condition being labelled NICE (in the man's judgement). Next we have the approaching car. That too must be placed in the road, and its direction and speed represented.
          All this is now part of the #interpretation. The robot can then operate upon that #interpretation to predict a hypothetical representation of the likely future. The car moves, the man moves, and their trajectories intersect. The car is heavier than the man and elementary physics suggests the man will be severely damaged. The prediction is that that would not be NICE for the man. Or, to a lesser extent, for the robot.
          By making these predictions based on his #interpretation, the robot can understand the significance of the scene he is observing. And when I use the words "understand" and "significance" I know exactly what they mean. They mean just what I have been describing. To "understand" is to be able to predict likely futures. To know the "significance" of events is to be able to associate these events with NICE or NASTY outcomes.
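          A toy version of that prediction is sketched below. It assumes that the #interpretation has been reduced to two relative quantities: how many moments the man still needs to reach the far kerb, and how many moments the car needs to reach the place where he is crossing. The function name and the step-counting scheme are inventions for the purpose of illustration.

```python
def predict(man_steps_to_kerb, car_steps_to_crossing):
    """Roll the interpretation forward, one moment at a time, and label the outcome."""
    horizon = max(man_steps_to_kerb, car_steps_to_crossing)
    for moment in range(1, horizon + 1):
        man_in_road = moment < man_steps_to_kerb         # not yet at the far side
        car_at_crossing = moment == car_steps_to_crossing
        if man_in_road and car_at_crossing:
            return "NASTY"   # the trajectories intersect
    return "NICE"            # the man reaches the kerb first


slow_walker = predict(man_steps_to_kerb=4, car_steps_to_crossing=2)
quick_walker = predict(man_steps_to_kerb=2, car_steps_to_crossing=4)
```

          In this sketch "understanding" the scene amounts to running `predict`, and its "significance" is the label which comes back: NASTY for the slow walker and NICE for the quick one.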
          Several things arise from this description of how the robot uses his store of concepts to construct an #interpretation based on his observations.
(1) The concepts in his memory do not contain specific information about locations or dimensions. They hold typical or standard values, in symbolic form, for some quantities. All actual measures are then provided in relative terms.
(2) When the concept is retrieved from memory, it is copied, so that the stored-version remains unaltered.
(3) After a concept has been copied, modification can be carried out to the copy-version to take account of the particular circumstances being represented. If the standard version of a dog has four legs, and the robot finds himself observing a dog with only three legs, the copy-version of the concept DOG is altered accordingly.
(4) Each stored-version of a concept which represents an animate object, will have a structure called a "MIND" associated with it. When the copy-version is formed, the robot can then place within that MIND a representation of the animate object's intentions. These intentions play a crucial role in predicting the likely outcome of a given scenario which involves animate objects.
(5) When the robot constructs an #interpretation of a scene, he is in fact building a working model of one small part of the world. And if that process is to be any assistance to him, in predicting likely outcomes, it has to be done very rapidly indeed.
(6) The requirement for speed, places certain restrictions on the interpretation process.
      (i) He cannot possibly do an interpretation for more than a very restricted part of the world around him at any given moment. That means that he has to confine the interpretation process to what we can now call his "focus of attention".
      (ii) The concepts in the concept-store must be indexed for very rapid retrieval. That means more off-line processing, more "down-time" during which the robot would be vulnerable. So this is a mechanism which is suitable only for a social being which has friends who can watch out for him while he sleeps and dreams.
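          Points (2), (3) and (4) above can be illustrated with a short sketch. It assumes, for the sake of the example, that a stored concept is a nested dictionary and that retrieval makes a deep copy, so that nothing done to the working copy can leak back into the store. The store contents and the name `retrieve` are hypothetical.

```python
import copy

# Hypothetical concept store holding the standard version of each concept.
CONCEPT_STORE = {
    "DOG": {"legs": 4, "animate": True, "MIND": {}},
}


def retrieve(name):
    """Return a working copy of a concept; the stored version stays unaltered."""
    return copy.deepcopy(CONCEPT_STORE[name])


# The robot observes a particular dog with only three legs.
observed_dog = retrieve("DOG")
observed_dog["legs"] = 3

# Because the concept is animate, its MIND can be given an intention.
observed_dog["MIND"]["intention"] = "fetch the ball"
```

          A deep copy is used rather than a shallow one because the MIND is itself a nested structure. With a shallow copy the intention written into the working copy would also appear in the stored version.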

And Finally ....
          That's all I want to say about concepts and interpretation at this point. The theme will be developed later and a lot of detail will be made available in the addendum. Now, I want to move on to describe the fifth and final layer of the system.





Copyright © Hugh Noble (Nov 2006)