Thursday, July 5, 2007

Architecture of Neural Networks

Feed-forward networks
Feed-forward ANNs (Figure 4.1) allow signals to travel one way only, from input to output. There is no feedback (loops); that is, the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition. This type of organisation is also referred to as bottom-up or top-down.
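As a rough sketch of this one-way flow (with made-up weights, inputs and threshold units, not taken from the figures), a feed-forward pass is just a single sweep of weighted sums from the input units to the output units:

    # A minimal sketch of one-way signal flow: the outputs are computed in a
    # single sweep from the inputs, and nothing is fed back to an earlier
    # layer.  Weights and inputs are arbitrary illustrative values.
    import numpy as np

    inputs = np.array([1.0, 0.0, 1.0])              # input units
    weights = np.array([[0.5, -0.6, 0.2],           # one row per output unit
                        [0.3,  0.8, -0.7]])
    outputs = (weights @ inputs > 0).astype(float)  # threshold units
    print(outputs)                                  # e.g. [1. 0.]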
Feedback networks
Feedback networks (Figure 4.2) can have signals travelling in both directions by introducing loops in the network. Feedback networks are very powerful and can get extremely complicated. They are dynamic; their 'state' changes continuously until they reach an equilibrium point, where they remain until the input changes and a new equilibrium needs to be found. Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organisations. A small sketch of this settling process is given after the figure captions below.
Figure 4.1 An example of a simple feedforward network
Figure 4.2 An example of a complicated network
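As a minimal sketch of the settling behaviour described above, the listing below uses a small Hopfield-style net with hand-picked symmetric weights (chosen purely for illustration, not taken from Figure 4.2); the units are updated one at a time until no unit changes, i.e. until an equilibrium is reached:

    # A small Hopfield-style feedback network (illustrative weights only):
    # each unit's new state depends on the states of the other units, and
    # the updates are repeated until the state stops changing.
    import numpy as np

    W = np.array([[ 0.0,  1.0, -1.0],   # symmetric weights, zero diagonal
                  [ 1.0,  0.0, -1.0],
                  [-1.0, -1.0,  0.0]])
    state = np.array([1.0, -1.0, 1.0])  # initial state of the three units

    changed = True
    while changed:                       # sweep until an equilibrium is reached
        changed = False
        for i in range(len(state)):      # update one unit at a time
            new_value = 1.0 if W[i] @ state >= 0 else -1.0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
    print(state)                         # a stable state: no unit wants to flip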
Network layers
The commonest type of artificial neural network consists of three groups, or layers, of units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to a layer of "output" units (see Figure 4.1).
The activity of the input units represents the raw information that is fed into the network.
The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.
The behaviour of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.
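A small sketch of this input-to-hidden-to-output computation, with made-up weights and sigmoid units (one common choice, not necessarily the one in Figure 4.1), might look as follows:

    # Sketch of the input -> hidden -> output computation described above.
    # The weights are made up; sigmoid units are used as a common choice.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    inputs = np.array([0.9, 0.1])                 # raw information fed in

    W_in_hidden = np.array([[ 0.4, -0.7],         # one row per hidden unit
                            [-0.2,  0.5],
                            [ 0.8,  0.1]])
    W_hidden_out = np.array([[0.3, -0.6, 0.9]])   # one row per output unit

    hidden = sigmoid(W_in_hidden @ inputs)        # hidden representation of the input
    outputs = sigmoid(W_hidden_out @ hidden)      # behaviour of the output units
    print(hidden, outputs)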
We also distinguish single-layer and multi-layer architectures. The single-layer organisation, in which all units are connected to one another, constitutes the most general case and has more potential computational power than hierarchically structured multi-layer organisations. In multi-layer networks, units are often numbered by layer instead of following a global numbering.
Perceptrons
The most influential work on neural nets in the 60's went under the heading of 'perceptrons', a term coined by Frank Rosenblatt. The perceptron (Figure 4.4) turns out to be an MCP model (a neuron with weighted inputs) with some additional, fixed, pre-processing. Units labelled A1, A2, Aj, Ap are called association units, and their task is to extract specific, localised features from the input images. Perceptrons mimic the basic idea behind the mammalian visual system. They were mainly used in pattern recognition, even though their capabilities extended much further.
Figure 4.4 The perceptron
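A rough sketch of this arrangement is given below; the association units are hand-chosen feature detectors (entirely illustrative, not Rosenblatt's originals), and the perceptron itself is just a weighted sum followed by a threshold:

    # Sketch of a perceptron as described above: fixed association units
    # (A1..A3, hand-chosen and not learned) extract simple localised
    # features, and an MCP-style unit thresholds their weighted sum.
    # Features, weights and threshold are illustrative choices.
    import numpy as np

    def association_units(image):
        return np.array([
            image[:, 0].sum(),     # A1: activity in the left column
            image[:, -1].sum(),    # A2: activity in the right column
            np.trace(image),       # A3: activity along the main diagonal
        ], dtype=float)

    weights = np.array([1.0, -1.0, 0.5])   # adjustable weights on A1..A3
    threshold = 0.5

    def perceptron(image):
        features = association_units(image)
        return 1 if weights @ features > threshold else 0

    image = np.array([[1, 0, 0],
                      [1, 0, 0],
                      [0, 0, 1]])
    print(perceptron(image))               # prints 1 for this input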
In 1969 Minsky and Papert wrote a book, Perceptrons, in which they described the limitations of single-layer perceptrons. The impact the book had was tremendous and caused many neural network researchers to lose interest. The book was very well written and showed mathematically that single-layer perceptrons could not perform some basic pattern recognition operations, such as determining the parity of a shape or determining whether a shape is connected. What researchers did not realise until the 80's is that, given the appropriate training, multi-level perceptrons can perform these operations.
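The two-input parity function (XOR) is the classic illustration: no single weighted-sum-and-threshold unit can compute it, but a two-layer network with hand-picked weights can. The weights below are one well-known solution, shown purely for illustration:

    # Two-input parity (XOR): no single threshold unit computes it, but a
    # small multi-layer network with hand-picked weights does.
    import numpy as np

    def step(x):
        return (x > 0).astype(float)

    def xor_net(x1, x2):
        inputs = np.array([x1, x2], dtype=float)
        # Hidden layer: one unit computes "x1 OR x2", the other "x1 AND x2".
        W_hidden = np.array([[1.0, 1.0],
                             [1.0, 1.0]])
        b_hidden = np.array([-0.5, -1.5])
        hidden = step(W_hidden @ inputs + b_hidden)
        # Output unit: OR minus AND gives the exclusive-or (odd parity of two bits).
        out = step(np.array([hidden[0] - hidden[1] - 0.5]))
        return int(out[0])

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))     # prints the XOR truth table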

1 comment:

Marvin Minsky said...

There is a popular rumor that the theorems in the book "Perceptrons" are outdated because some new learning schemes were developed. This is not so, because our theorems were about which patterns could be distinguished by those networks -- as functions of their sizes and numbers of connections. Evidently, those critics never understood the book: those theorems simply do not depend on how the coefficients are learned -- but only on whether such coefficients exist.

As for the extensions to networks with more layers, almost nothing is yet known about this. Here is a problem I'd like to see solved:

Suppose that each neuron can accept only 100 inputs. Also, suppose that the net is "one-way" -- that is, it can have any number of layers, but does not contain any loops.

Now suppose that the input is like a human retina that has a million binary inputs. Then how many neurons must be in such a network so that it can decide (without any errors) whether more than half of its inputs are "on"?

The question is not how a network could learn such a thing, but how many neurons this would take, and what must be the ratio of the smallest to largest connection-weights. (The latter will depend on the size of the net.)

Note that this is not a question about how to learn that discrimination, but whether it can be learned at all.