Huang Dong’s Blog, email: huangdongxy@hotmail.com

February 25, 2009

Biologically motivated computationally intensive approaches to image pattern recognition

Author: Nikolay Petkov

Date: 1995

Abstract:

The approaches concerned are biologically motivated, in that we try to mimic and use mechanisms employed by natural vision systems, more specifically the visual system of primates. We compute visual information representations motivated by the function of the primary visual cortex, more specifically by the function of so-called simple cells.

Cortical filters and images

Computational models of visual neurons with linear spatial summation

The receptive field of a visual neuron is the part of the visual field within which a stimulus can influence the response of the concerned neuron.

The response r of a neuron to an input image s(x,y) can be modelled in two stages (see the sketch after this list):

  1. Linear spatial summation: the receptive field function acts like the impulse response of a linear system.
  2. Thresholding and non-linear local contrast normalization. (I do not fully understand this part yet!)
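
A minimal sketch of stage 1, assuming the standard linear-filter formulation (the symbol g for the receptive field function is my notation, not necessarily the paper's):

    r = ∬ s(x,y) g(x,y) dx dy

Stage 2 then applies the thresholding and normalization to this value. Evaluating the summation for receptive fields centred at every image position amounts to convolving the input image with the receptive field function; the filtered image obtained this way is what is below called a ‘cortical image’.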

Simple cells

In this study we are concerned with computer simulations of so-called simple cells in the primary visual cortex.

Neurophysiological research has shown that, over the population of all simple cells, the receptive field sizes vary considerably, with the diameters of the smallest and the largest receptive fields in a ratio of at least 1:30.

It has been found that the spatial aspect ratio γ varies in a very limited range, 0.23 < γ < 0.92. The value γ = 0.5 is used in our simulations.

The ratio σ/λ determines the number of parallel excitatory and inhibitory zones that can be observed in a receptive field. Neurophysiological research shows that the parameters λ and σ are closely correlated: over the set of all cells, the ratio σ/λ, which determines the spatial-frequency bandwidth of a cell, varies in a very limited range of 0.4-0.9, corresponding to two to five excitatory and inhibitory stripe zones in a receptive field. The value σ/λ = 0.5 is used in our simulations.

In our simulations, we use the following values for φ: 0 (symmetric receptive fields, to which we refer as ‘center-on’ in analogy with retinal ganglion cell receptive fields whose central areas are excitatory), π (symmetric receptive fields, to which we refer as ‘center-off’, since their central lobe is inhibitory), and -0.5π and 0.5π (antisymmetric receptive fields of opposite polarity).
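
As far as I can tell, the receptive field function behind these parameters is the two-dimensional Gabor function. Here is a minimal Python sketch under that assumption (the function name, grid size and default values are my own choices):

    import numpy as np

    def gabor_receptive_field(size, wavelength, theta, phi, gamma=0.5, sigma_per_lambda=0.5):
        # sigma is tied to the wavelength via the ratio sigma/lambda (0.5 in the simulations)
        sigma = sigma_per_lambda * wavelength
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        # rotate the coordinate system to the preferred orientation theta
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        # Gaussian envelope with aspect ratio gamma, cosine carrier with phase offset phi
        envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
        carrier = np.cos(2.0 * np.pi * xr / wavelength + phi)
        return envelope * carrier

    # phi = 0 gives a 'center-on' field, phi = pi a 'center-off' field,
    # phi = -0.5*pi and 0.5*pi the two antisymmetric fields of opposite polarity.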

For technical applications one might wish to have filters with a more distinct effect, in that such a filter should enhance edges or bars, but not both at the same time. This would simplify the interpretation of the resulting cortical images and the subsequent processing steps. We propose a mechanism for the elimination of so-called ‘shadow’ lines in cortical images obtained from filters with antisymmetric spatial summation functions. An edge which is strongly enhanced by a cortical filter with an antisymmetric receptive field function of orientation θ and phase φ will give rise to a pair of parallel lines in the cortical image produced by a cortical filter with an antisymmetric receptive field function of the same orientation θ and phase φ+π. We call these lines ‘shadow’ lines and propose to eliminate them by a mechanism we call lateral inhibition.

Roughly speaking, this mechanism acts as follows: the value of a pixel in a cortical image corresponding to orientation θ and phase φ is set to zero if a pixel of higher value is found in a certain vicinity of the concerned pixel in the cortical image corresponding to orientation θ and phase φ+π. Here, it is proposed to extend the lateral inhibition mechanism by applying it to all cortical images with the same set of receptive field parameter values except φ.
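
A hypothetical Python sketch of the inhibition step (the function name and the square neighbourhood are my assumptions; the text only says ‘a certain vicinity’):

    import numpy as np
    from scipy.ndimage import maximum_filter

    def suppress_shadow_lines(img_phi, img_phi_plus_pi, radius=2):
        # a pixel of the (theta, phi) cortical image is set to zero if a pixel
        # of higher value lies within the given radius in the (theta, phi + pi) image
        opposite_max = maximum_filter(img_phi_plus_pi, size=2 * radius + 1)
        return np.where(opposite_max > img_phi, 0.0, img_phi)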

Using cortical images

Extracting lower-dimensional representations

Although the sets of cortical images computed according to the above model deliver usefully structured information, they do not by themselves give an ultimate solution to the image pattern recognition problem.

All pixel values in a cortical image are summed to build a quantity which is (partially) characteristic of the input image. (A positive aspect of this scheme is that the result does not depend on the precise position of the object to be recognized.) Since a number of different cortical channels are used, each computing a different cortical image from the input image, a number of such quantities are computed, one per cortical channel. Together these quantities form a descriptor vector which is considered as a projection of the input image onto a point in a lower-dimensional space. This representation is then used to discriminate among different images.
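
A minimal Python sketch of this projection (the half-wave rectification standing in for the thresholding step is my assumption; the kernels could come from the gabor_receptive_field sketch above):

    import numpy as np
    from scipy.signal import fftconvolve

    def descriptor_vector(image, kernels):
        # one number per cortical channel: convolve, rectify, then sum all
        # pixel values so that the result is largely independent of where
        # the object lies in the image
        features = []
        for kernel in kernels:
            cortical = np.maximum(fftconvolve(image, kernel, mode='same'), 0.0)
            features.append(cortical.sum())
        return np.array(features)

Using several orientations θ, phases φ and receptive field sizes λ gives one descriptor component per channel; the resulting vector is what is used to discriminate among images.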

Computing optic flow
