ARTIFICIAL NEURAL NETWORKS

Adapted From A Paper Written By Dr. Robert E. Uhrig

INTRODUCTION TO NEURAL NETWORKS

Neural networks have emerged during the past decade from an obscure field that had been discredited by perceived inadequacies into one of the fastest growing technologies in information processing. In the past five years, several books have been written that present the theory and general application of neural networks in a lucid manner: Wasserman, 1989; Caudill and Butler, 1989 and 1992; Hecht-Nielsen, 1990; Maren, Harston and Pap, 1990; Simpson, 1990; and Nelson and Illingsworth, 1990. These books, together with the extensive body of conference and periodical technical literature (Proceedings of the International Neural Network Society Meetings in 1987 through 1994; Journal of Neural Computing; Neural Networks; IEEE Transactions on Neural Networks; and Neural Computing), cover the fundamental concepts and technology of neural networks and their potential application to numerous fields.

A network of artificial neurons, usually called an artificial neural network (ANN), is a data processing system consisting of a large number of simple, highly interconnected processing elements in an architecture inspired by the structure of the cerebral cortex portion of the brain. Hence, neural networks are often capable of doing things which humans or animals do well but which conventional computers often do poorly. Neural networks exhibit characteristics and capabilities not provided by any other technology.

Neurons. The human brain is a complex computing system, capable of thinking, remembering, storing complex patterns, and solving problems. The brain contains approximately 100 billion neurons (fundamental cellular units of the brain's nervous system) that are densely interconnected with hundreds--perhaps thousands--of connections per neuron. A neuron is a simple processing unit, receiving and combining signals from other neurons through input paths called dendrites. If the combined signals from all the dendrites are strong enough, the neuron "fires," producing an output signal along a path called the axon. The axon splits, connecting to hundreds or thousands of dendrites (input paths) of other neurons through synapses (junctions containing a neurotransmitter fluid controlling the flow of signals) located in the dendrites. Transmission of the signals across the synapses is electro-chemical in nature, and the magnitudes of the signals depend upon the synaptic strengths of the synapses. The strength or conductance (the inverse of resistance) of a synaptic junction is modified as the brain "learns." The synapses are the basic "memory units" of the brain, and adjustment of these synapses constitutes learning.

Computer Simulation of Artificial Neurons. Computer simulation of brain functions usually takes the form of a network of artificial neurons, commonly called processing elements (PEs), but sometimes called nodes, neurons or neurodes. These PEs are analogous to the neuron in that they have many inputs (dendrites) and combine (sum up) the values of the inputs, adjusted by their weights (synaptic strengths). This sum is then subjected to a nonlinear filter, often called a transfer function, that controls the output in accordance with the prescribed nonlinear relationship. If the transfer function is a threshold function, output signals are generated only if the sum of the weighted inputs exceeds the threshold value. If the transfer function is a continuous nonlinear (or linear) relationship, the output is a continuous function of the combined input. The most commonly used transfer function is the sigmoid function, which changes smoothly from zero (for large negative values) to one (for large positive values) with a value of 1/2 for zero input. The output axon of a PE branches out and becomes the input to many other processing elements. These signals pass through connection weights (synaptic junctions) that correspond to the synaptic strength of the neural connections. The input signals to a processing element are modified by the connection weights prior to being summed by the processing element.
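
The behavior of a single processing element can be illustrated with a minimal Python sketch. The function names (sigmoid, threshold, pe_output) and the sample weights are hypothetical choices made for illustration; they do not come from the original paper.

```python
import math

def sigmoid(x):
    """Sigmoid transfer function: near 0 for large negative x,
    near 1 for large positive x, and exactly 0.5 at x = 0."""
    return 1.0 / (1.0 + math.exp(-x))

def threshold(x, theta=0.0):
    """Threshold transfer function: output 1.0 only if the
    weighted sum exceeds the threshold value theta."""
    return 1.0 if x > theta else 0.0

def pe_output(inputs, weights, transfer=sigmoid):
    """Modify each input by its connection weight, sum the results,
    then pass the sum through the transfer function."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return transfer(total)

# Illustrative values only: the weighted sum is 0.4 - 0.4 = 0.0.
print(pe_output([1.0, 0.5], [0.4, -0.8]))  # prints 0.5
```

Passing transfer=threshold to pe_output gives the all-or-nothing "firing" behavior described for threshold functions.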

Artificial Neural Networks (ANNs). An artificial neural network (ANN) can be defined as a computer processing system consisting of many processing elements joined together in a structure inspired by the cerebral cortex of the brain. These processing elements are usually organized in a sequence of layers, with full connections between layers. Typically, there are three (or more) layers: an input layer where data are presented to the network through an input buffer, an output layer with a buffer that holds the output response to a given input, and one or more intermediate or "hidden" layers. The operation of an artificial neural network involves two processes: learning and recall. Learning is the process of adapting the connection weights in response to external stimuli presented at the input buffer. The network "learns" in accordance with a learning rule governing the adjustment of connection weights in response to learning examples applied at the input and output buffers. Recall is the process of accepting an input and producing a response determined by the geometry and synaptic weights of the network.
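
The recall process in a layered, fully connected network can be sketched as follows. This is a minimal illustration that assumes sigmoid PEs and represents each layer as a list of weight rows, one row per PE; the layer sizes and weight values are arbitrary and hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_recall(inputs, weight_rows):
    """One fully connected layer: each row holds the connection
    weights of one PE, applied to every signal from the layer below."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

def recall(inputs, layers):
    """Pass the contents of the input buffer through each layer in turn
    and return the signal that reaches the output buffer."""
    signal = inputs
    for weight_rows in layers:
        signal = layer_recall(signal, weight_rows)
    return signal

# Hypothetical network: 2 inputs, 3 hidden PEs, 1 output PE.
hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
output = [[0.7, -0.5, 0.2]]
response = recall([1.0, 0.0], [hidden, output])
print(response)
```

The response depends only on the layer structure and the weight values, which is what the text means by the geometry and synaptic weights of the network.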

Supervised Learning. Several different kinds of learning commonly are used with ANNs. Perhaps the most common is the so-called supervised learning. Before the learning process begins, all the weights are set to small random values. Then a training input is applied to the input layer; it propagates through the network and produces an output. This output is compared with the desired output to produce an error signal, which, in turn, is the input to the weight adjusting process. Various learning algorithms (discussed below) are used to adjust the weights incrementally. When the input is applied again to the input buffer, it produces an incrementally different output, which again is compared with the desired output and produces a second error signal. This iterative process continues until the output of the artificial neural network is substantially equal to the desired output, and the error approaches zero or an irreducible minimum. At this point, the network is said to have been "trained." Through the various learning algorithms, the network gradually configures itself to achieve the desired input-output relationship or "mapping." Supervised learning is often implemented with a technique called backpropagation, the most common method of learning; however, unsupervised learning, as implemented in Kohonen networks, is also used in many situations.
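
The iterative cycle of supervised learning (apply an input, compare the output with the desired output, adjust the weights, repeat) can be sketched for the simplest possible case: a single sigmoid PE with a bias weight. This is not backpropagation through hidden layers; the learning rate, tolerance, epoch limit, random seed, and the logical-OR training set are all assumptions chosen for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def respond(weights, inputs):
    """Output of the PE; a constant 1.0 input feeds the bias weight."""
    x = list(inputs) + [1.0]
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train(examples, rate=0.5, tolerance=0.01, max_epochs=5000, seed=0):
    """Adjust the weights incrementally until the summed squared
    error falls below tolerance or max_epochs is reached."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    # All weights start as small random values (last one is the bias).
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n + 1)]
    total_error = 0.0
    for _ in range(max_epochs):
        total_error = 0.0
        for inputs, desired in examples:
            x = list(inputs) + [1.0]
            actual = respond(weights, inputs)
            error = desired - actual            # the error signal
            total_error += error ** 2
            # Small correction scaled by the slope of the sigmoid.
            step = rate * error * actual * (1.0 - actual)
            weights = [w + step * xi for w, xi in zip(weights, x)]
        if total_error < tolerance:
            break
    return weights, total_error

# Hypothetical training set: the logical OR of two inputs.
or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, final_error = train(or_examples)
for inputs, desired in or_examples:
    print(inputs, desired, round(respond(weights, inputs), 3))
```

Training stops either when the error is small enough or when the epoch limit is reached, mirroring the text's "zero or an irreducible minimum."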

LEARNING ALGORITHMS

The most common learning algorithms are:

Hebbian learning. Hebbian learning occurs when a connection weight on an input path to a PE is incremented if both the input is high (large) and the desired output is high. This is analogous to the biological process in which a neural pathway is strengthened each time it is used.
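
A minimal sketch of this rule follows, assuming the increment is proportional to the product of the input and the desired output. The function name hebbian_update and the learning rate are hypothetical.

```python
def hebbian_update(weights, inputs, desired, rate=0.1):
    """Increment each weight in proportion to (input * desired output),
    so a weight grows only when both values are high."""
    return [w + rate * x * desired for w, x in zip(weights, inputs)]

weights = [0.0, 0.0, 0.0]
# The first and third inputs are high while the desired output is high.
weights = hebbian_update(weights, [1.0, 0.0, 1.0], desired=1.0)
print(weights)  # prints [0.1, 0.0, 0.1]
```

Only the weights on the active input paths change; the weight on the inactive path is left as it was.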

Delta-rule learning. Delta-rule learning (sometimes called mean square error learning) occurs when the error signal (difference between the desired output and the actual output) is minimized using a least-squares process. Backpropagation is the most common implementation of delta-rule learning and probably is used in at least 75% of ANN applications.
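
For a single PE with a linear transfer function, the delta rule reduces to the update sketched below, in which each weight moves in proportion to the error signal times its input. The function name, learning rate, and sample pattern are assumptions for illustration; backpropagation extends the same idea to networks with hidden layers and is not shown here.

```python
def delta_rule_update(weights, inputs, desired, rate=0.1):
    """One least-mean-squares step for a linear PE.
    Returns the adjusted weights and the error before adjustment."""
    actual = sum(w * x for w, x in zip(weights, inputs))
    error = desired - actual
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error

# Hypothetical example: present the same pattern 20 times.
weights = [0.0, 0.0]
errors = []
for _ in range(20):
    weights, error = delta_rule_update(weights, [1.0, 2.0], desired=1.0)
    errors.append(abs(error))
print(errors[0], errors[-1])
```

With these particular values the error shrinks by a constant factor on every presentation, which shows the incremental nature of the adjustment.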

Competitive learning. Competitive learning occurs when the processing elements compete; only the processing element yielding the strongest response to a given input can modify itself, becoming more like the input.

In all cases, the final values of the weighting functions constitute the "memory" of the ANN.
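
Competitive learning, as described above, can be sketched as a winner-take-all update. This sketch assumes that "strongest response" means the largest weighted sum and that the winner moves a fixed fraction of the way toward the input; other formulations (for example, choosing the PE whose weights are nearest to the input) are also used. All names and values are hypothetical.

```python
def competitive_update(weight_rows, inputs, rate=0.5):
    """Winner-take-all step: only the PE with the strongest response
    adjusts its weights. Modifies weight_rows in place and returns
    the index of the winning PE."""
    responses = [sum(w * x for w, x in zip(row, inputs))
                 for row in weight_rows]
    winner = responses.index(max(responses))
    # Move the winner's weights part of the way toward the input.
    weight_rows[winner] = [w + rate * (x - w)
                           for w, x in zip(weight_rows[winner], inputs)]
    return winner

pes = [[1.0, 0.0], [0.0, 1.0]]
winner = competitive_update(pes, [0.8, 0.2])
print(winner, pes)
```

The losing PE keeps its weights unchanged, so over many inputs each PE tends to specialize on one cluster of similar inputs.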

CHARACTERISTICS OF ANNs

In the recall process, an ANN accepts a signal presented at the input buffer, then produces at the output buffer a response that has been determined by the "training" of the network. The simplest form of recall occurs when there are no feedback connections between layers or within a layer (i.e., the signals flow from the input buffer to the output buffer in a process called "feed forward" information flow). In this type of network the response is produced in one computer cycle. When ANNs do have feedback connections, the signal reverberates around the network, across or within layers, until some convergence criteria have been met and a steady-state signal can be presented to the output buffers.
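
Recall in a network with feedback connections can be illustrated with a small example in which every PE is repeatedly updated from the outputs of the others until the state stops changing. The sketch below uses a Hopfield-style network with two-valued (+1/-1) PEs, which is one possible feedback architecture and is an assumption here, not a design taken from the text.

```python
def settle(weights, state, max_cycles=50):
    """Update all PEs from the current state until a steady state
    is reached. Returns the final state and the cycles used."""
    for cycle in range(1, max_cycles + 1):
        new_state = [1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
                     for row in weights]
        if new_state == state:      # convergence criterion met
            return state, cycle
        state = new_state
    return state, max_cycles

# Hypothetical stored pattern; weights are its outer product
# with the self-connections (diagonal) set to zero.
pattern = [1, -1, 1, -1]
weights = [[0 if i == j else pattern[i] * pattern[j] for j in range(4)]
           for i in range(4)]

# Start from the pattern with its third element flipped.
final_state, cycles = settle(weights, [1, -1, -1, -1])
print(final_state, cycles)  # prints [1, -1, 1, -1] 2
```

A feed-forward network would produce its response in a single pass; here the first cycle corrects the flipped element and the second confirms that the signal has reached a steady state.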

The characteristics that make ANN systems different from traditional computing and artificial intelligence are: 1) learning by example, 2) distributed associative memory, 3) fault tolerance, and 4) pattern recognition.

Distributive Associative Memory. The memory of an ANN is both distributive and associative. "Distributive" means that the storage of a unit of knowledge is distributed across many memory units (connection weights) in the network and shares these memory units with all other items of knowledge stored in the network. "Associative" means that when the trained network is presented with a partial input, the network will choose the closest match, in a least-squares sense, to that input in its memory and will generate an output that corresponds to the full input.
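
The idea of choosing the closest match in a least-squares sense can be shown with a direct lookup. The sketch below is not a neural network and stores its patterns explicitly rather than in distributed weights; it only illustrates the matching criterion. The stored patterns and labels are hypothetical.

```python
def closest_match(memory, probe):
    """Return the (pattern, output) pair whose pattern has the
    smallest summed squared difference from the probe."""
    def squared_error(pattern):
        return sum((p - q) ** 2 for p, q in zip(pattern, probe))
    return min(memory, key=lambda item: squared_error(item[0]))

# Hypothetical memory of (input pattern, associated output) pairs.
memory = [
    ([1, 1, 0, 0], "A"),
    ([0, 0, 1, 1], "B"),
    ([1, 0, 1, 0], "C"),
]

# A noisy, degraded version of the first pattern.
pattern, output = closest_match(memory, [0.9, 0.8, 0.1, 0.3])
print(pattern, output)  # prints [1, 1, 0, 0] A
```

In an ANN the same behavior emerges from the connection weights rather than from an explicit search over stored patterns.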

Fault Tolerance. Traditional computer systems can be rendered useless by even minor memory damage; however, neural-computing systems are fault tolerant in that if some PEs are destroyed or disabled or have their connections altered incorrectly, the behavior of the network is changed only slightly. As more processing elements are destroyed, performance degrades gradually; i.e., the network performance suffers, but the system does not fail catastrophically. This behavior is possible because the information is not contained in any single memory unit, but is distributed among many connection weights of the network. Such arrangements are well-suited for systems where failure may introduce difficult problems or be unacceptable (e.g., in nuclear power plants, missile guidance, and space probes).

Pattern Recognition. Neural networks have the ability to match large amounts of input information simultaneously and generate a categorical or generalized output. This requires that the network provide a reasonable response to noisy or incomplete inputs. Experience shows that ANNs are very good pattern recognizers. They have the ability to learn and build unique structures for a particular problem.

Neural Computing and Applications. Neural-computing networks consist of interconnected units that act on data instantly in a massively parallel manner. This action provides an approach that is closer to human perception and recognition than conventional computer techniques and can produce reasonable results with noisy or incomplete inputs. Unfortunately, implementation of ANNs on a conventional digital computer requires that the computations take place serially, thereby slowing down the computation significantly. Recent developments make parallel operation of ANNs embedded in microchips a practical and readily implementable option.

Complex System Modeling. A system with multiple inputs and outputs can be modeled using an ANN by applying the system inputs to the network and using the system outputs as the desired network responses. After an appropriate number of iterative learning cycles to limit the overall error, the ANN then constitutes a non-structured, non-algorithmic model of the process involved. Such modeling can be used on physical systems, business and financial systems, or social systems. Current applications include the use of an ANN to determine whether loan applications should be approved, using a bank's previous five years' experience as the input training data.

Miscellaneous Applications of Neural Networks. Today, there are literally tens of thousands of applications of ANNs in hundreds of fields. Listed below are several early applications where ANNs have been used successfully.

Image (data) compression involves the transformation of image data to a different representation requiring less memory. The image must then be reconstructed from this new representation in such a way that the difference from the original is imperceptible. Compression ratios of ten to one are common, and ratios of several hundred to one have been achieved in special applications.

Character recognition, a special kind of pattern recognition, is the process of visually interpreting and classifying symbols. ANNs were the first systems to read Japanese Kanji characters efficiently, effectively breaking the input barrier for computers used in Japan.

Handwriting recognition involves a neural computing system which accepts handwriting on a digitized pad as a computer input and which is trained by interpreting a set of handwriting types. The system can then interpret a type of handwriting it has never seen before and can make a "best guess" when confronted by oddly formed letters. Accuracy improves when the training is on the type of writing being read (e.g., on one individual's handwriting).

Target classification. ANNs have been used to classify sonar targets by distinguishing between large metal cylinders and rocks of similar size. The ANN integrates 60 spectral energy values produced from 60 frequency bands. Its performance was comparable to the best-trained human operators using the same data, and significantly better than an ordinary operator or one using any other type of computer-based classifier.

Noise filtering. ANNs are able to filter noisy data, preserving a greater depth of structure and detail than any of the traditional filters, while still removing the noise. Applications include removal of background noise from voice communications (e.g., in small aircraft) and separation of the fetal heart beat from a mother's heart beat.

Servo-control systems. Complex mechanical servo-systems, such as those used in robots, must compensate for physical variations in the system introduced by misalignments in the axes, or deviation in members due to bending and stretching induced by loads. These quantities are extremely difficult to describe analytically, but ANNs can be trained to predict and respond to these errors in the final position of a robot member. This information is then combined with the desired position to provide an adaptive position correction and improve the accuracy of the member's position.

Text-to-speech conversion. In this application, the printed symbols or letters in a text were converted into spoken language using an ANN that taught itself to translate written text into speech in the same way a human child learns to read. The printed transcript is broken down into the fundamental components of speech called "phonemes," which become the desired output of the ANN when the input is the corresponding text. After training, the phonemes become the input to a voice synthesizer which provides the verbal output.