Part 1 : TITLE PAGE | Preface | What is Consciousness? | Outline of the system Part 2 : Building bricks | Layer-1 | Layer-2 | Layer-3 | Layer-4 | Layer-5 Part 3 : Discussion | Arguments | Conclusions | Addenda Tartan Hen Publications : Home | more books | Contact : feedback@tartanhen.co.uk

SHAPE RECOGNITION

NOTE

What I am trying to do here is to de-mystify the mechanism of perception. I also want to alert the reader who is unfamiliar with these things to the technical problems involved. What I am NOT trying to do, however, is to provide commercially viable solutions.

Nowadays you can find on the Internet all kinds of clever software systems for the recognition of faces, fingerprints and the like. You can even download trial versions of the software to test it on your own computer. Some research laboratories are experimenting with very advanced parallel computing machinery - sometimes called "connection machines" - for the recognition of scenes and objects where the image quality is poor.

In general, the problems of perception have been solved for what we might call "well behaved" images. By that I mean images which do not have messy backgrounds, partial occlusions, or highlights and shadows. My understanding is that these additional problems are under active investigation and that progress is being made.

In years past, research workers were happy to publish their algorithms. Unfortunately, from my point of view, that is no longer the case. There is now a tendency to be very secretive about practical software solutions. Commercial confidentiality is the refrain. You can download some stuff in its executable form, but do not ask how it actually works.

Some techniques, however, are well established. For a long time people have used fast Fourier transforms as the basis of shape recognition (I'll have something to say about that later).
I have also read about Radial Basis Functions, which are apparently a very fast computational technique for matching shapes to standard shapes and patterns. The technique is based on matrix algebra and is similar in approach to fitting curves to specified formulae by the method called "least squares". Radial Basis Functions do a similar trick for two-dimensional shapes.

But as I said in the first sentence above, my aim is simply to de-mystify. I want to dispel any lingering tendency to think that the brain has magical resources at its disposal which lie beyond the capabilities of artificial inanimate hardware. So I am going to describe some techniques which, though they may not be practical in the commercial sense, are practical in the sense that they actually work. I know that because I have written computer programs which demonstrate it. Sometimes I used very small (postage stamp sized) images, and sometimes I had to sit and twiddle my thumbs for a long and weary time before I saw any results. But the techniques do work. So here goes -

An elementary technique for 2-dimensional shape recognition

This is a homespun technique. It is slow and limited, but it should be somewhat easier to follow than any of the advanced mathematical techniques (like Fourier transforms) which are often used in shape recognition. The mathematics involved in my system does not demand anything beyond a bit of school-level vector algebra.
The first step is to draw the line A-B (as shown above) followed by the lines A-C, B-D and A-E. Now calculate the length of the line C-D. You can do that by subtracting the value of C (its position along the base line) from the value of D. If A-B had been drawn in the opposite direction, with the arrowhead going the other way, you would have subtracted the value of D from the value of C. That would have given a negative result (and that’s important).

Now calculate (numerically) the area under the line A-B. Do that by calculating the area of the rectangle A-E-D-C and then the area of the triangle A-B-E. The area of a triangle is half the area of the rectangle with the same base and height. Add those two areas and you will have the area under the line A-B. If the line A-B slopes down to the right, then you will need to subtract the area of the triangle from the area of the rectangle instead.
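The arithmetic for a single line segment can be sketched in a few lines of Python. This is only an illustration, not the program mentioned in the text. It assumes that A and B are given as (x, y) pairs, that the base line C-D is the x-axis (y = 0), and that E lies directly below or above B at the height of A. The function name `area_under_segment` is invented for the example.

```python
def area_under_segment(a, b):
    """Signed area between the directed line A->B and the base line y = 0.

    Assumption: points are (x, y) tuples and C, D are the feet of A and B
    on the base line, so the length C-D is simply D minus C.
    """
    (ax, ay), (bx, by) = a, b
    width = bx - ax                      # negative if the arrow points left
    rectangle = width * ay               # rectangle A-E-D-C, height of A
    triangle = width * (by - ay) / 2.0   # triangle A-B-E, negative if A-B slopes down
    return rectangle + triangle


# A line rising from (1, 2) to (4, 4): rectangle 3 x 2 plus triangle of area 3.
print(area_under_segment((1, 2), (4, 4)))   # 9.0
# The same line drawn in the opposite direction gives the negative result.
print(area_under_segment((4, 4), (1, 2)))   # -9.0
```

Letting the width carry its own sign means the "subtract the triangle" case for a downward slope and the negative result for a reversed arrow both fall out of the same formula.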
Now draw a complete shape like this …
The calculation of area under the line can now be repeated for each line segment. As the boundary line turns round and begins to go in the opposite direction, the area underneath changes from positive to negative. If you add all of these areas together, the positive and negative areas will, to an extent, cancel one another out. What will be left is the area INSIDE the shape.

It will be either negative or positive depending upon the direction of the arrows. If the arrows go in a clockwise direction the area will be positive. But that does not matter. The direction of the arrows was arbitrary anyway. If the area is negative, just change it to be positive. It’s the size of that area which matters. Call it TOTAL-AREA.

The next bit of the trick is to draw two vertical lines on either side of the shape and a third which is half way between them, like this …
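Stepping back to the TOTAL-AREA calculation for a moment, here is a minimal Python sketch of it. It assumes the boundary is given as a list of (x, y) corner points in drawing order, with the last point joined back to the first, and that the base line is y = 0. The names `signed_area` and `total_area` are made up for this illustration.

```python
def signed_area(points):
    """Sum the signed areas under every directed edge of a closed boundary.

    Assumption: points is a list of (x, y) tuples in drawing order.
    Clockwise boundaries come out positive, anticlockwise ones negative.
    """
    total = 0.0
    # Pair each point with the next one, wrapping round to close the shape.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += (x2 - x1) * (y1 + y2) / 2.0   # area under one edge
    return total


def total_area(points):
    """TOTAL-AREA: the size of the enclosed area, whatever the arrow direction."""
    return abs(signed_area(points))


square_clockwise = [(0, 0), (0, 2), (2, 2), (2, 0)]
print(signed_area(square_clockwise))         # 4.0
print(signed_area(square_clockwise[::-1]))   # -4.0
print(total_area(square_clockwise[::-1]))    # 4.0
```

The areas under the top edges are counted positively and those under the bottom edges negatively, so only the area enclosed by the boundary survives the sum.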
The area of each part of the figure (on each side of that dividing line) can now be calculated. Call these AREA-1 and AREA-2. Calculate the ratios

AREA-1 / TOTAL-AREA
AREA-2 / TOTAL-AREA

Rotate the page through 45 degrees and through 90 degrees. Repeat the process at each angle to produce a series of ratios. This series is a kind of fingerprint. A regular shape like a circle or a square will produce the fingerprint ((0.5, 0.5), (0.5, 0.5), (0.5, 0.5)). Any departure will indicate some irregularity - i.e. the central line does not pass through the point of balance.

To discriminate more precisely, divide the shape into thirds instead of halves and repeat the process to create a more complex fingerprint. For an even greater degree of discrimination divide it into quarters, fifths, sixths and so on. We could also calculate the fingerprints at, say, 5 degree intervals of rotation. If we carry the procedure far enough we will have a series of these fingerprints which together identify virtually any shape unambiguously.

The technique has other interesting properties. Two shapes which are similar (but not identical) will produce similar fingerprints. We could take a set of standard shapes (square, rectangle, circle, lozenge, diamond, horseshoe, doughnut, pear-shape, dumbbell etc) and calculate the fingerprint of each. It would then be relatively easy to “place” any arbitrary shape by calculating its proximity to each of those standard shapes. The technique can also indicate the orientation of the shape recognised.

The method is not recommended for practical shape recognition systems, but to show that it does work I have written a computer program which operates that way. A sample of the shapes on which it was tested is illustrated below.
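Separately from those illustrations, the fingerprint calculation itself can be sketched in Python. This is not the program referred to above, only one possible reading of the description. It assumes the shape is a polygon given as a list of (x, y) corners, finds AREA-1 by cutting the polygon at the middle vertical line, and measures "proximity" as a plain Euclidean distance between fingerprints. All function names are hypothetical.

```python
import math


def polygon_area(points):
    """Signed area: sum of the areas under each directed edge."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]))


def clip_left(points, x_cut):
    """Keep only the part of the polygon lying to the left of x = x_cut."""
    out = []
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        in1, in2 = x1 <= x_cut, x2 <= x_cut
        if in1:
            out.append((x1, y1))
        if in1 != in2:  # the edge crosses the dividing line
            t = (x_cut - x1) / (x2 - x1)
            out.append((x_cut, y1 + t * (y2 - y1)))
    return out


def rotate(points, degrees):
    """Rotate every point about the origin (equivalent to rotating the page)."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def fingerprint(points, angles=(0, 45, 90)):
    """One (AREA-1 / TOTAL-AREA, AREA-2 / TOTAL-AREA) pair per rotation angle."""
    result = []
    for angle in angles:
        pts = rotate(points, angle)
        xs = [x for x, _ in pts]
        middle = (min(xs) + max(xs)) / 2.0   # half way between the outer lines
        total = abs(polygon_area(pts))
        area_1 = abs(polygon_area(clip_left(pts, middle)))
        result.append((area_1 / total, (total - area_1) / total))
    return result


def distance(fp_a, fp_b):
    """Assumed proximity measure: Euclidean distance between two fingerprints."""
    return math.sqrt(sum((a - b) ** 2
                         for pair_a, pair_b in zip(fp_a, fp_b)
                         for a, b in zip(pair_a, pair_b)))


square = [(0, 0), (0, 2), (2, 2), (2, 0)]
right_triangle = [(0, 0), (0, 2), (2, 0)]
print(fingerprint(square))
print(fingerprint(right_triangle))
print(distance(fingerprint(right_triangle), fingerprint(square)))
```

For the square every ratio comes out at 0.5 (to within rounding), while the right-angled triangle starts with the pair (0.75, 0.25), so its distance from the square is clearly greater than zero. Dividing into thirds or using finer rotation steps would mean adding more cut positions and more angles to the same loop.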
Which is okay as far as it goes, but it doesn't go very far. What happens, for example, if the system is trying to recognise a hawk, and the hawk happens to be flapping its wings, or is seen with its wings closed, or from a side-view? The simple match with standard shapes would not work unless the system had a set of standard hawk-shapes representing different views.

In defence of the approach I would say this, however. A small creature which was trying to escape the clutches of hawks, but which could recognise a hawk only if the hawk-shape was presented to it as a single kind of standard shape, would still survive more often than a creature which could not recognise a hawk in any way at all. Admittedly, it would get eaten (say) 90 percent of the time. But it would survive that extra 10 percent more often, and so it would survive to produce its progeny a little more often.

When we are trying to solve a problem, it is often best to creep up on it, nibbling bits off as we go along. Evolution works in a similar way. Each advantage gained needs only to be a marginal improvement to provide a creature with a survival 'edge' over its neighbours. And so the system will improve by acquiring several different kinds of hawk-shape. With each new acquisition, the creature would survive a little better than its neighbours. Thus would the brains of these creatures evolve in gradual steps.

The Fourier transform, as I mentioned earlier, is an established mathematical technique which has been widely used for processing images. It characterises a pattern in terms of 'frequencies'. In effect it calculates the contribution which a given frequency makes to a shape. The diagram below illustrates.
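Before turning to the diagram, a small numerical sketch may help to make "the contribution which a given frequency makes" concrete. It builds a pattern out of two sine waves and then measures the amplitude found at each frequency with a plain discrete Fourier transform, written out longhand. The frequencies (1 and 3 cycles), the amplitudes (1.0 and 0.5), the sample count and the function name `dft_amplitudes` are all assumptions chosen for the example.

```python
import cmath
import math


def dft_amplitudes(samples, max_freq):
    """Amplitude contributed by each whole-number frequency 0..max_freq.

    Assumption: max_freq is below half the number of samples, so each
    frequency above zero is counted twice in the transform (hence the 2.0).
    """
    n = len(samples)
    amplitudes = []
    for k in range(max_freq + 1):
        coefficient = sum(value * cmath.exp(-2j * math.pi * k * i / n)
                          for i, value in enumerate(samples))
        scale = 1.0 if k == 0 else 2.0
        amplitudes.append(scale * abs(coefficient) / n)
    return amplitudes


n = 64
wave_1 = [1.0 * math.sin(2 * math.pi * 1 * i / n) for i in range(n)]
wave_2 = [0.5 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
combined = [a + b for a, b in zip(wave_1, wave_2)]   # the pattern to analyse

amplitudes = dft_amplitudes(combined, 4)
for frequency, amplitude in enumerate(amplitudes):
    print(frequency, round(amplitude, 6))
```

The analysis recovers amplitudes close to 1.0 at frequency 1 and 0.5 at frequency 3, and close to zero elsewhere, which is exactly what went into the pattern. A real image-processing system would use a fast Fourier transform routine rather than this direct sum.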
In that diagram, (1) and (2) represent two wave-forms with different frequencies and amplitudes. If you add them together to make a single wave-form, you get something like the one shown as (3). The blue areas add together, and the red area is subtracted from the blue to form that valley in the middle. By introducing many more wave-forms with different frequencies and amplitudes you can produce a single blob with just about any shape required. Fourier analysis is just a mathematical technique which tells us what contribution (amplitude) each one of a whole series of wave-forms makes to a given shape. The more wave-forms included in the analysis, the more accurate the fit will be.

It is easily seen that the simple shape analysis I described earlier is broadly similar to a Fourier analysis. By dividing the area into finer and finer sub-divisions, one is, in effect, carrying the analysis to higher and higher frequencies.

The next significant step, however, is to acquire the ability to recognise edges. And that is a different story.

To continue, see PROCESSING EDGES

Copyright © Hugh Noble (Nov 2006)