Introduction to Neural Networks

Neural Processing

How do you recognize a face in a crowd? How does an economist predict the direction of interest rates? Faced with problems like these, the human brain uses a web of interconnected processing elements called neurons to process information. Each neuron is autonomous and independent; it does its work asynchronously, that is, without any synchronization to other events taking place. The two problems posed, namely recognizing a face and forecasting interest rates, have two important characteristics that distinguish them from other problems: First, the problems are complex, that is, you can’t devise a simple step-by-step algorithm or precise formula to give you an answer; and second, the data provided to resolve the problems is equally complex and may be noisy or incomplete. You could have forgotten your glasses when you’re trying to recognize that face. The economist may have at his or her disposal thousands of pieces of data that may or may not be relevant to his or her forecast on the economy and on interest rates.

The vast processing power inherent in biological neural structures has inspired the study of the structure itself for hints on organizing human-made computing structures. The study of artificial neural networks, the subject of this book, covers the way to organize synthetic neurons to solve the same kind of difficult, complex problems in a manner similar to the way we think the human brain may. This chapter will give you a sampling of the terms and nomenclature used to talk about neural networks. These terms will be covered in more depth in the chapters to follow.

Neural Network

A neural network is a computational structure inspired by the study of biological neural processing. There are many different types of neural networks, from relatively simple to very complex, just as there are many theories on how biological neural processing works. We will begin with a discussion of a layered feed-forward type of neural network and branch out to other paradigms later in this chapter and in other chapters.

A layered feed-forward neural network has layers, or subgroups of processing elements. A layer of processing elements makes independent computations on data that it receives and passes the results to another layer. The next layer may in turn make its independent computations and pass on the results to yet another layer. Finally, a subgroup of one or more processing elements determines the output from the network. Each processing element makes its computation based upon a weighted sum of its inputs. The first layer is the input layer and the last the output layer. The layers that are placed between the first and the last layers are the hidden layers. The processing elements are seen as units that are similar to the neurons in a human brain, and hence, they are referred to as cells, neuromimes, or artificial neurons. A threshold function is sometimes used to qualify the output of a neuron in the output layer. Even though our subject matter deals with artificial neurons, we will simply refer to them as neurons. Synapses between neurons are referred to as connections, which are represented by edges of a directed graph in which the nodes are the artificial neurons.

Output of a Neuron

Basically, the internal activation or raw output of a neuron in a neural network is a weighted sum of its inputs, but a threshold function is also used to determine the final value, or the output. When the output is 1, the neuron is said to fire, and when it is 0, the neuron is considered not to have fired. When a threshold function is used, different results of activations, all in the same interval of values, can cause the same final output value. This situation helps in the sense that, if precise input causes an activation of 9 and noisy input causes an activation of 10, then the output works out the same as if noise is filtered out.
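As a minimal sketch of the idea above, the following Python computes a weighted sum and applies a threshold, so that activations of 9 and 10 produce the same output. The function names, the weights, the inputs, and the threshold of 8 are illustrative assumptions, not values from the text.

```python
def activation(inputs, weights):
    """Weighted sum of the inputs: the neuron's internal activation."""
    return sum(x * w for x, w in zip(inputs, weights))


def threshold_output(t, theta):
    """Return 1 (the neuron fires) if the activation reaches theta, else 0."""
    return 1 if t >= theta else 0


# Hypothetical weights and inputs, chosen so the activations are 9 and 10.
weights = [2, 1, 3]
clean_input = [1, 1, 2]   # 2*1 + 1*1 + 3*2 = 9
noisy_input = [1, 2, 2]   # 2*1 + 1*2 + 3*2 = 10

THETA = 8  # assumed threshold value
clean_out = threshold_output(activation(clean_input, weights), THETA)
noisy_out = threshold_output(activation(noisy_input, weights), THETA)
print(clean_out, noisy_out)  # both neurons fire
```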

To put the description of a neural network in a simple and familiar setting, let us describe an example about a daytime game show on television, The Price is Right.

Cash Register Game

A contestant in The Price is Right is sometimes asked to play the Cash Register Game. A few products are described, their prices are unknown to the contestant, and the contestant has to declare how many units of each item he or she would like to (pretend to) buy. If the total purchase does not exceed the amount specified, the contestant wins a special prize. After the contestant announces how many items of a particular product he or she wants, the price of that product is revealed, and it is rung up on the cash register. The contestant must be careful, in this case, that the total does not exceed some nominal value, to earn the associated prize. We can now cast the whole operation of this game, in terms of a neural network, called a Perceptron, as follows.

Consider each product on the shelf to be a neuron in the input layer, with its input being the unit price of that product. The cash register is the single neuron in the output layer. The only connections in the network are between each of the neurons (products displayed on the shelf) in the input layer and the output neuron (the cash register). In neural network terminology, a neuron that collects connections from several other neurons in this way, the cash register in this case, is usually referred to as an instar. The contestant actually determines these connections, because when the contestant says he or she wants, say, five of a specific product, the contestant is thereby assigning a weight of 5 to the connection between that product and the cash register. The total bill for the purchases by the contestant is nothing but the weighted sum of the unit prices of the different products offered. For those items the contestant does not choose to purchase, the implicit weight assigned is 0. The application of the dollar limit to the bill is just the application of a threshold, except that the threshold value should not be exceeded for the outcome from this network to favor the contestant, winning him or her a good prize. In a Perceptron, the way the threshold works is that an output neuron is supposed to fire if its activation value exceeds the threshold value.
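The game can be sketched in a few lines of Python. The prices, quantities, and dollar limit below are made-up values for illustration, and the function names are hypothetical. Note the inversion described above: the contestant wins when the Perceptron-style output neuron does not fire.

```python
def total_bill(unit_prices, quantities):
    """Weighted sum: unit prices are the inputs, quantities are the weights."""
    return sum(p * q for p, q in zip(unit_prices, quantities))


def register_fires(unit_prices, quantities, limit):
    """Perceptron-style output: fire (1) if the activation exceeds the limit."""
    return 1 if total_bill(unit_prices, quantities) > limit else 0


def contestant_wins(unit_prices, quantities, limit):
    """The contestant wins only if the output neuron does NOT fire."""
    return register_fires(unit_prices, quantities, limit) == 0


# Hypothetical game data (not from the text).
prices = [1.50, 4.00, 2.25]   # unit price of each product
quantities = [5, 0, 2]        # weight 0 means the item was not purchased
LIMIT = 15.00                 # assumed dollar limit

bill = total_bill(prices, quantities)
print(bill, contestant_wins(prices, quantities, LIMIT))
```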

Weights

The weights used on the connections between different layers have much significance in the working of the neural network and the characterization of a network. The following actions are possible in a neural network:

1.  Start with one set of weights and run the network. (NO TRAINING)

2.  Start with one set of weights, run the network, and modify some or all the weights, and run the network again with the new set of weights. Repeat this process until some predetermined goal is met. (TRAINING)

Training

Since the output(s) may not be what is expected, the weights may need to be altered. Some rule then needs to be used to determine how to alter the weights. There should also be a criterion to specify when the process of successive modification of weights ceases. This process of changing the weights, or rather, updating the weights, is called training. A network in which learning is employed is said to be subjected to training. Training is an external process or regimen. Learning is the desired process that takes place internal to the network.
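The text does not name a specific update rule at this point. As one hedged illustration of "a rule to alter the weights" together with "a criterion to stop," the sketch below assumes the classic perceptron learning rule, a step threshold at 0, an added bias term, and the logical AND function as training data. All of these choices are assumptions made for the example.

```python
def step(t):
    """Assumed threshold function: fire when the activation is at least 0."""
    return 1 if t >= 0 else 0


def predict(inputs, weights, bias):
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)


def train(samples, weights, bias, rate=1, max_epochs=100):
    """Repeat weight updates until an epoch produces no errors (the stopping
    criterion) or until max_epochs is reached."""
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)
            if error != 0:
                errors += 1
                # Perceptron rule: nudge each weight in the direction of the error.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if errors == 0:
            break
    return weights, bias


# Training data for logical AND (an illustrative, linearly separable task).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

trained_weights, trained_bias = train(AND_SAMPLES, [0, 0], 0)
print(trained_weights, trained_bias)
```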

Feedback

If you wish to train a network so it can recognize or identify some predetermined patterns, or evaluate some function values for given arguments, it would be important to have information fed back from the output neurons to neurons in some layer before that, to enable further processing and adjustment of weights on the connections. Such feedback can be to the input layer or a layer between the input layer and the output layer, sometimes labeled the hidden layer. What is fed back is usually the error in the output, modified appropriately according to some useful paradigm. The process of feedback continues through the subsequent cycles of operation of the neural network and ceases when the training is completed.

Supervised or Unsupervised Learning

A network can be subject to supervised or unsupervised learning. The learning would be supervised if external criteria are used and matched by the network output, and if not, the learning is unsupervised. This is one broad way to divide different neural network approaches. Unsupervised approaches are also termed self-organizing. There is more interaction between neurons, typically with feedback and intralayer connections between neurons promoting self-organization.

Supervised networks are a little more straightforward to conceptualize than unsupervised networks. You apply the inputs to the supervised network along with an expected response, much like the Pavlovian conditioned stimulus and response regimen. You mold the network with stimulus-response pairs. A stock market forecaster may present economic data (the stimulus) along with metrics of stock market performance (the response) to the neural network up to the present, and attempt to predict the future once training is complete.

You provide unsupervised networks with only stimulus. You may, for example, want an unsupervised network to correctly classify parts from a conveyor belt into part numbers, providing an image of each part to do the classification (the stimulus). The unsupervised network in this case would act like a look-up memory that is indexed by its contents, or a Content-Addressable-Memory (CAM).

Noise

Noise is perturbation, or a deviation from the actual. A data set used to train a neural network may have inherent noise in it, or an image may have random speckles in it, for example. The response of the neural network to noise is an important factor in determining its suitability to a given application. In the process of training, you may apply a metric to your neural network to see how well the network has learned your training data. In cases where the metric stabilizes to some meaningful value, whether the value is acceptable to you or not, you say that the network converges. You may wish to introduce noise intentionally in training to find out if the network can learn in the presence of noise, and if the network can converge on noisy data.

Memory

Once you train a network on a set of data, suppose you continue training the network with new data. Will the network forget the intended training on the original set or will it remember? This is another angle that is approached by some researchers who are interested in preserving a network’s long-term memory (LTM) as well as its short-term memory (STM). Long-term memory is memory associated with learning that persists for the long term. Short-term memory is memory associated with a neural network that decays in some time interval.

Capsule of History

You marvel at the capabilities of the human brain and find its ways of processing information unknown to a large extent. You find it awesome that very complex situations are discerned at a far greater speed than what a computer can do.

Warren McCulloch and Walter Pitts formulated in 1943 a model for a nerve cell, a neuron, during their attempt to build a theory of self-organizing systems. Later, Frank Rosenblatt constructed a Perceptron, an arrangement of processing elements representing the nerve cells into a network. His network could recognize simple shapes. It was the advent of different models for different applications.

Those working in the field of artificial intelligence (AI) tried to hypothesize that you can model thought processes using some symbols and some rules with which you can transform the symbols.

A limitation to the symbolic approach is related to how knowledge is representable. A piece of information is localized, that is, available at one location, perhaps. It is not distributed over many locations. You can easily see that distributed knowledge leads to a faster and greater inferential process. Information is less prone to be damaged or lost when it is distributed than when it is localized. Distributed information processing can be fault tolerant to some degree, because there are multiple sources of knowledge to apply to a given problem. Even if one source is cut off or destroyed, other sources may still permit solution to a problem. Further, with subsequent learning, a solution may be remapped into a new organization of distributed processing elements that exclude a faulty processing element.

In neural networks, information may impact the activity of more than one neuron. Knowledge is distributed and lends itself easily to parallel computation. Indeed there are many research activities in the field of hardware design of neural network processing engines that exploit the parallelism of the neural network paradigm. Carver Mead, a pioneer in the field, has suggested analog VLSI (very large scale integration) circuit implementations of neural networks.

Neural Network Construction

There are three aspects to the construction of a neural network:

1.  Structure—the architecture and topology of the neural network

2.  Encoding—the method of changing weights

3.  Recall—the method and capacity to retrieve information

Let’s cover the first one—structure. This relates to how many layers the network should contain, and what their functions are, such as for input, for output, or for feature extraction. Structure also encompasses how interconnections are made between neurons in the network, and what their functions are.

The second aspect is encoding. Encoding refers to the paradigm used for the determination of and changing of weights on the connections between neurons. In the case of the multilayer feed-forward neural network, you initially can define weights by randomization. Subsequently, in the process of training, you can use the backpropagation algorithm, which is a means of updating weights starting from the output backwards. When you have finished training the multilayer feed-forward neural network, you are finished with encoding since weights do not change after training is completed.

Finally, recall is also an important aspect of a neural network. Recall refers to getting an expected output for a given input. If the same input as before is presented to the network, the same corresponding output as before should result. The type of recall can characterize the network as being autoassociative or heteroassociative. Autoassociation is the phenomenon of associating an input vector with itself as the output, whereas heteroassociation is that of recalling a related vector given an input vector. You have a fuzzy remembrance of a phone number. Luckily, you stored it in an autoassociative neural network. When you apply the fuzzy remembrance, you retrieve the actual phone number. This is a use of autoassociation. Now if you want the individual’s name associated with a given phone number, that would require heteroassociation. Recall is closely related to the concepts of STM and LTM introduced earlier.

The three aspects to the construction of a neural network mentioned above essentially distinguish between different neural networks and are part of their design process.

Sample Applications

One application for a neural network is pattern classification, or pattern matching. The patterns can be represented by binary digits in the discrete cases, or real numbers representing analog signals in continuous cases. Pattern classification is a form of establishing an autoassociation or heteroassociation. Recall that associating different patterns is building the type of association called heteroassociation. If you input a corrupted or modified pattern A to the neural network, and receive the true pattern A, this is termed autoassociation. What use does this provide? Remember the example given at the beginning of this chapter. In the human brain example, say you want to recall a face in a crowd and you have a hazy remembrance (input). What you want is the actual image. Autoassociation, then, is useful in recognizing or retrieving patterns with possibly incomplete information as input. What about heteroassociation? Here you associate A with B. Given A, you get B and sometimes vice versa. You could store the face of a person and retrieve it with the person’s name, for example. It’s quite common in real circumstances to do the opposite, and sometimes not so well. You recall the face of a person, but can’t place the name.

Qualifying for a Mortgage

Another sample application, which is in fact in the works by a U.S. government agency, is to devise a neural network to produce a quick response credit rating of an individual trying to qualify for a mortgage. The problem to date with the application process for a mortgage has been the staggering amount of paperwork and filing details required for each application. Once information is gathered, the response time for knowing whether or not your mortgage is approved has typically taken several weeks. All of this will change. The proposed neural network system will allow the complete application and approval process to take three hours, with approval coming in five minutes of entering all of the information required. You enter in the applicant’s employment history, salary information, credit information, and other factors and apply these to a trained neural network. The neural network, based on prior training on thousands of case histories, looks for patterns in the applicant’s profile and then produces a yes or no rating of worthiness to carry a particular mortgage. Let’s now continue our discussion of factors that distinguish neural network models from each other.

Cooperation and Competition

We will now discuss cooperation and competition. Again we start with an example feed-forward neural network. If the network consists of a single input layer and an output layer consisting of a single neuron, then the set of weights for the connections between the input layer neurons and the output neuron is given in a weight vector. For three inputs and one output, this could be W = {w1, w2, w3}. When the output layer has more than one neuron, the output is not just one value but is also a vector. In such a situation each neuron in one layer is connected to each neuron in the next layer, with weights assigned to these interconnections. Then the weights can all be given together in a two-dimensional weight matrix, which is also sometimes called a correlation matrix. When there are in-between layers such as a hidden layer or a so-called Kohonen layer or a Grossberg layer, the interconnections are made between each neuron in one layer and every neuron in the next layer, and there will be a corresponding correlation matrix. Cooperation or competition or both can be imparted between network neurons in the same layer, through the choice of the right sign of weights for the connections. Cooperation occurs when one neuron aids the prospect of firing by another. Competition is the attempt between neurons to individually excel with higher output. Inhibition, a mechanism used in competition, occurs when one neuron decreases the prospect of another neuron’s firing. As already stated, the vehicle for these phenomena is the connection weight. For example, a positive weight is assigned for a connection between one node and a cooperating node in that layer, while a negative weight is assigned to inhibit a competitor.
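To make the weight-matrix idea concrete, here is a small sketch with three input neurons and two output neurons. The matrix entries and input values are hypothetical; the convention that entry [i][j] holds the weight from input neuron i to output neuron j is an assumption chosen for the example.

```python
def layer_activations(inputs, weight_matrix):
    """Return one activation per output neuron.

    weight_matrix[i][j] is the weight on the connection from input
    neuron i to output neuron j (an assumed indexing convention).
    """
    n_outputs = len(weight_matrix[0])
    return [
        sum(inputs[i] * weight_matrix[i][j] for i in range(len(inputs)))
        for j in range(n_outputs)
    ]


# Hypothetical 3x2 weight matrix: positive entries aid firing,
# negative entries inhibit it.
W = [
    [1, -1],
    [2, 0],
    [-1, 3],
]
x = [1, 2, 3]

acts = layer_activations(x, W)
print(acts)  # the output is a vector, one value per output neuron
```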

To take this idea to the connections between neurons in consecutive layers, we would assign a positive weight to the connection between one node in one layer and its nearest neighbor node in the next layer, whereas the connections with distant nodes in the other layer will get negative weights. The negative weights would indicate competition in some cases and inhibition in others. To make at least some of the discussion and the concepts a bit clearer, we preview two example neural networks (there will be more discussion of these networks in the chapters that follow): the feed-forward network and the Hopfield network.

Example—A Feed-Forward Network

A sample feed-forward network with topology 2-2-1, shown in the figure below, has five neurons arranged in three layers: two neurons (labeled x1 and x2) in layer 1, two neurons (labeled x3 and x4) in layer 2, and one neuron (labeled x5) in layer 3. There are arrows connecting the neurons together. These show the direction of information flow. A feed-forward network has information flowing forward only. Each arrow that connects neurons has a weight associated with it (w13, for example, on the connection from x1 to x3). You calculate the state, x, of each neuron by summing the weighted values that flow into a neuron. The state of the neuron is the output value of the neuron and remains the same until the neuron receives new information on its inputs.

A feed-forward neural network with topology 2-2-1.

For example, for x3 and x5:

x3 = w23 x2 + w13 x1
x5 = w35 x3 + w45 x4

We will formalize the equations in Chapter 7, which details one of the training algorithms for the feed-forward network called Backpropagation.

Note that you present information to this network at the leftmost nodes (layer 1) called the input layer. You can take information from any other layer in the network, but in most cases do so from the rightmost node(s), which make up the output layer. Weights are usually determined by a supervised training algorithm, where you present examples to the network and adjust weights appropriately to achieve a desired response. Once you have completed training, you can use the network without changing weights, and note the response for inputs that you apply. Note that a detail not yet shown is a nonlinear scaling function that limits the range of the weighted sum. This scaling function has the effect of clipping very large values in positive and negative directions for each neuron so that the cumulative summing that occurs across the network stays within reasonable bounds. Typical real number ranges for neuron inputs and outputs are –1 to +1 or 0 to +1. You will see more about this network and applications for it in Chapter 7. Now let us contrast this neural network with a completely different type of neural network, the Hopfield network, and present some simple applications for the Hopfield network.
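A rough sketch of the 2-2-1 forward computation follows. The weight values are invented for illustration, the expression for x4 is assumed by analogy with the one given for x3, and tanh stands in for the nonlinear scaling function only as an assumption, since the text has not yet said which function is used.

```python
import math


def forward(x1, x2, w, squash=None):
    """Compute the states of x3, x4, and x5 for a 2-2-1 network.

    w maps names such as "w13" (connection from x1 to x3) to weights.
    squash is an optional scaling function applied to each weighted sum.
    """
    f = squash if squash is not None else (lambda v: v)
    x3 = f(w["w13"] * x1 + w["w23"] * x2)
    x4 = f(w["w14"] * x1 + w["w24"] * x2)  # assumed by analogy with x3
    x5 = f(w["w35"] * x3 + w["w45"] * x4)
    return x3, x4, x5


# Hypothetical weights, not taken from the text.
weights = {"w13": 0.5, "w23": -1.0, "w14": 1.0, "w24": 0.5,
           "w35": 2.0, "w45": -1.0}

raw = forward(1.0, 2.0, weights)                   # plain weighted sums
bounded = forward(1.0, 2.0, weights, math.tanh)    # each state kept in (-1, +1)
print(raw)
print(bounded)
```

Without the scaling function the sums can grow layer by layer; with it, every state stays inside the –1 to +1 range mentioned above.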

Example—A Hopfield Network

The neural network we present is a Hopfield network, with a single layer. We place, in this layer, four neurons, each connected to the rest, as shown in the figure below. Some of the connections have a positive weight, and the rest have a negative weight. The network will be presented with two input patterns, one at a time, and it is supposed to recall them. The inputs would be binary patterns having in each component a 0 or 1. If two patterns of equal length are given and are treated as vectors, their dot product is obtained by first multiplying corresponding components together and then adding these products. Two vectors are said to be orthogonal if their dot product is 0. The mathematics involved in computations done for neural networks includes matrix multiplication, transpose of a matrix, and transpose of a vector. Also see Appendix B. The inputs (which are stable, stored patterns) to be given should be orthogonal to one another.

Layout of a Hopfield network.

The two patterns we want the network to recall are A = (1, 0, 1, 0) and B = (0, 1, 0, 1), which you can verify to be orthogonal. Recall that two vectors A and B are orthogonal if their dot product is equal to zero. This is true in this case since

     A1B1 + A2 B2 + A3B3 + A4B4 = (1x0 + 0x1 + 1x0 + 0x1) = 0
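The same check can be written as a short Python sketch; the helper name dot is an illustrative choice.

```python
def dot(u, v):
    """Multiply corresponding components, then add the products."""
    return sum(a * b for a, b in zip(u, v))


A = (1, 0, 1, 0)
B = (0, 1, 0, 1)

print(dot(A, B))  # 0 means A and B are orthogonal
```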

The following matrix W gives the weights on the connections in the network.

      0   -3    3   -3
     -3    0   -3    3
W =   3   -3    0   -3
     -3    3   -3    0

We need a threshold function also, and we define it as follows. The threshold value θ is 0.

            1  if t >= θ
f(t) = {
            0  if t < θ

We have four neurons in the only layer in this network. We need to compute the activation of each neuron as the weighted sum of its inputs. The activation at the first node is the dot product of the input vector and the first column of the weight matrix (0 -3 3 -3). We get the activation at the other nodes similarly. The output of a neuron is then calculated by evaluating the threshold function at the activation of the neuron. So if we present the input vector A, the dot product works out to 3 and f(3) = 1. Similarly, we get the dot products of the second, third, and fourth nodes to be –6, 3, and –6, respectively. The corresponding outputs therefore are 0, 1, and 0. This means that the output of the network is the vector (1, 0, 1, 0), same as the input pattern. The network has recalled the pattern as presented, or we can say that pattern A is stable, since the output is equal to the input. When B is presented, the dot product obtained at the first node is –6 and the output is 0. The outputs for the rest of the nodes taken together with the output of the first node gives (0, 1, 0, 1), which means that the network has stable recall for B also.
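The computation just described can be sketched as follows, using the weight matrix W and threshold function given above. The function names are illustrative; this is the "all nodes at once" shortcut used in this part of the chapter.

```python
W = [
    [0, -3, 3, -3],
    [-3, 0, -3, 3],
    [3, -3, 0, -3],
    [-3, 3, -3, 0],
]
A = [1, 0, 1, 0]
B = [0, 1, 0, 1]


def threshold(t, theta=0):
    """f(t) = 1 if t >= theta, else 0."""
    return 1 if t >= theta else 0


def activations(pattern, weights):
    """Activation at node j = dot product of the input with column j."""
    n = len(pattern)
    return [sum(pattern[i] * weights[i][j] for i in range(n)) for j in range(n)]


def recall(pattern, weights):
    """Apply the threshold function to every node's activation."""
    return [threshold(t) for t in activations(pattern, weights)]


for name, pattern in (("A", A), ("B", B)):
    print(name, activations(pattern, W), recall(pattern, W))
```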


NOTE:  In Chapter 4, a method of determining the weight matrix for the Hopfield network given a set of input vectors is presented.


So far we have presented easy cases to the network—vectors that the Hopfield network was specifically designed (through the choice of the weight matrix) to recall. What will the network give as output if we present a pattern different from both A and B? Let C = (0, 1, 0, 0) be presented to the network. The activations would be –3, 0, –3, 3, making the outputs 0, 1, 0, 1, which means that B achieves stable recall. This is quite interesting. Suppose we did intend to input B and we made a slight error and ended up presenting C, instead. The network did what we wanted and recalled B. But why not A? To answer this, let us ask is C closer to A or B? How do we compare? We use the distance formula for two four-dimensional points. If (a, b, c, d) and (e, f, g, h) are two four-dimensional points, the distance between them is:

√[(a – e)² + (b – f)² + (c – g)² + (d – h)²]

The distance between A and C is √3, whereas the distance between B and C is just 1. So since B is closer in this sense, B was recalled rather than A. You may verify that if we do the same exercise with D = (0, 0, 1, 0), we will see that the network recalls A, which is closer than B to D.
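A short sketch can confirm both the distances and the recalled patterns for C and D. The helper names are illustrative, and recall here is the same one-shot computation used so far in the chapter.

```python
import math

W = [
    [0, -3, 3, -3],
    [-3, 0, -3, 3],
    [3, -3, 0, -3],
    [-3, 3, -3, 0],
]
A, B = [1, 0, 1, 0], [0, 1, 0, 1]
C, D = [0, 1, 0, 0], [0, 0, 1, 0]


def euclidean(p, q):
    """Distance formula for two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def recall(pattern, weights, theta=0):
    """Threshold the dot product of the input with each column of weights."""
    n = len(pattern)
    return [
        1 if sum(pattern[i] * weights[i][j] for i in range(n)) >= theta else 0
        for j in range(n)
    ]


for name, probe in (("C", C), ("D", D)):
    print(name, euclidean(probe, A), euclidean(probe, B), recall(probe, W))
```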

Hamming Distance

When we talk about closeness of a bit pattern to another bit pattern, the Euclidean distance need not be considered. Instead, the Hamming distance can be used, which is much easier to determine, since it is the number of bit positions in which the two patterns being compared differ. Patterns being strings, the Hamming distance is more appropriate than the Euclidean distance.
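A minimal sketch of the Hamming distance, applied to the patterns from the example, is given below; the function name is an illustrative choice.

```python
def hamming(p, q):
    """Count the positions in which two equal-length bit patterns differ."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same length")
    return sum(1 for a, b in zip(p, q) if a != b)


A = (1, 0, 1, 0)
B = (0, 1, 0, 1)
C = (0, 1, 0, 0)

# For 0/1 patterns, the Hamming distance equals the squared Euclidean distance.
print(hamming(A, C), hamming(B, C))
```

C differs from A in three positions and from B in one, which agrees with the Euclidean comparison: B is the closer pattern.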


NOTE:  The weight matrix W we gave in this example is not the only weight matrix that would enable the network to recall the patterns A and B correctly. You can see that if we replace each of 3 and –3 in the matrix by say, 2 and –2, respectively, the resulting matrix would also facilitate the same performance from the network. For more details, consult Chapter 4.


Asynchronous Update

The Hopfield network is a recurrent network. This means that outputs from the network are fed back as inputs.

Feedback in the Hopfield network.

The Hopfield network always stabilizes to a fixed point. There is a very important detail regarding the Hopfield network to achieve this stability. In the examples thus far, we have not had a problem getting a stable output from the network, so we have not presented this detail of network operation. This detail is the need to update the network asynchronously. This means that changes do not occur simultaneously to outputs that are fed back as inputs, but rather occur for one vector component at a time. The true operation of the Hopfield network follows the procedure below for input vector Invec and output vector Outvec:

1.  Apply an input, Invec, to the network, and initialize Outvec = Invec

2.  Start with i = 1

3.  Calculate Value_i = DotProduct(Invec, Column_i of the weight matrix)

4.  Calculate Outvec_i = f(Value_i), where f is the threshold function discussed previously

5.  Update the input to the network with component Outvec_i

6.  Increment i, and repeat steps 3, 4, 5, and 6 until Invec = Outvec, that is, until a complete cycle through all components produces no change (note that when i reaches its maximum value, it is next reset to 1 for the cycle to continue)
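The procedure above can be sketched in Python as follows. The function name, the max_sweeps safety limit, and the choice to stop after one complete pass with no changes are assumptions made for this example; the weight matrix and threshold function are the ones from the earlier four-neuron network.

```python
W = [
    [0, -3, 3, -3],
    [-3, 0, -3, 3],
    [3, -3, 0, -3],
    [-3, 3, -3, 0],
]


def threshold(t, theta=0):
    return 1 if t >= theta else 0


def hopfield_async(invec, weights, max_sweeps=100):
    """Update one component at a time, feeding each new value back at once."""
    state = list(invec)          # step 1: Outvec = Invec
    n = len(state)
    for _ in range(max_sweeps):  # assumed safety limit on the number of cycles
        changed = False
        for i in range(n):       # steps 2 and 6: cycle through the components
            # Step 3: dot product of the current vector with column i.
            value = sum(state[k] * weights[k][i] for k in range(n))
            new_component = threshold(value)   # step 4
            if new_component != state[i]:
                state[i] = new_component       # step 5: feed back immediately
                changed = True
        if not changed:          # a full pass with no change: stable
            break
    return state


print(hopfield_async([1, 0, 1, 0], W))  # a stored pattern stays as it is
```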

Now let’s see how to apply this procedure. Building on the last example, we now input E = (1, 0, 0, 1), which is at an equal distance from A and B. Without applying the asynchronous procedure above, but instead using the shortcut procedure we’ve been using so far, you would get an output F = (0, 1, 1, 0). This vector, F, as subsequent input would result in E as the output. This is incorrect since the network oscillates between two states. We have updated the entire input vector synchronously.
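The oscillation under synchronous update can be reproduced with a short sketch. The function name sync_update is illustrative; it applies the threshold to all four nodes at once, which is the shortcut procedure described above.

```python
W = [
    [0, -3, 3, -3],
    [-3, 0, -3, 3],
    [3, -3, 0, -3],
    [-3, 3, -3, 0],
]


def sync_update(pattern, weights, theta=0):
    """Update every component simultaneously from the same input vector."""
    n = len(pattern)
    return [
        1 if sum(pattern[i] * weights[i][j] for i in range(n)) >= theta else 0
        for j in range(n)
    ]


E = [1, 0, 0, 1]
F = sync_update(E, W)
back_again = sync_update(F, W)
print(E, F, back_again)  # the state flips to F and then returns to E
```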

Now let’s apply asynchronous update. For input E = (1, 0, 0, 1), we arrive at the following results, detailed for each update step in the table below:

Example of Asynchronous Update for the Hopfield Network

Column of weight matrix   Value   Outvec   Notes

(none)                            1001     initialization: set Outvec = Invec = input pattern
 0 -3  3 -3                -3     0001     column 1 of Outvec changed to 0
-3  0 -3  3                 3     0101     column 2 of Outvec changed to 1
 3 -3  0 -3                -6     0101     column 3 of Outvec stays as 0
-3  3 -3  0                 3     0101     column 4 of Outvec stays as 1
 0 -3  3 -3                -6     0101     column 1 stable as 0
-3  0 -3  3                 3     0101     column 2 stable as 1
 3 -3  0 -3                -6     0101     column 3 stable as 0
-3  3 -3  0                 3     0101     column 4 stable as 1; stable recalled pattern = 0101

Binary and Bipolar Inputs

Two types of inputs that are used in neural networks are binary and bipolar inputs. We have already seen examples of binary input. Bipolar inputs have one of two values, 1 and –1. There is clearly a one-to-one mapping or correspondence between them, namely having -1 of bipolar correspond to a 0 of binary. In determining the weight matrix in some situations where binary strings are the inputs, this mapping is used, and when the output appears in bipolar values, the inverse transformation is applied to get the corresponding binary string. A simple example would be that the binary string 1 0 0 1 is mapped onto the bipolar string 1 –1 –1 1; while using the inverse transformation on the bipolar string –1 1 –1 –1, we get the binary string 0 1 0 0.
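The mapping and its inverse can be sketched as two small functions; the names to_bipolar and to_binary are illustrative.

```python
def to_bipolar(bits):
    """Map binary 0 -> -1 and 1 -> 1."""
    return [2 * b - 1 for b in bits]


def to_binary(values):
    """Inverse mapping: bipolar -1 -> 0 and 1 -> 1."""
    return [(v + 1) // 2 for v in values]


print(to_bipolar([1, 0, 0, 1]))
print(to_binary([-1, 1, -1, -1]))
```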

Bias

The use of a threshold value can take two forms. One we showed in the example. The activation is compared to the threshold value, and the neuron fires if the threshold value is attained or exceeded. The other way is to add a value to the activation itself, in which case it is called the bias, and then determine the output of the neuron. We will encounter bias and gain later.
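The two forms can be compared in a short sketch. It assumes that the bias is chosen as the negative of the threshold and that the biased activation is then compared with 0; under that assumption the two forms give identical outputs. The numeric values are arbitrary.

```python
def fires_with_threshold(activation, theta):
    """Form 1: compare the activation with the threshold value."""
    return 1 if activation >= theta else 0


def fires_with_bias(activation, bias):
    """Form 2: add the bias to the activation, then compare with 0."""
    return 1 if activation + bias >= 0 else 0


THETA = 2.5     # hypothetical threshold
BIAS = -THETA   # assumed relationship between the two forms

samples = [-1.0, 2.0, 2.5, 3.0, 10.0]
results = [
    (fires_with_threshold(t, THETA), fires_with_bias(t, BIAS)) for t in samples
]
print(results)  # each pair holds the same value twice
```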

Another Example for the Hopfield Network

You will see in Chapter 12 an application of Kohonen’s feature map for pattern recognition. Here we give an example of pattern association using a Hopfield network. The patterns are some characters. A pattern representing a character becomes an input to a Hopfield network through a bipolar vector. This bipolar vector is generated from the pixel (picture element) grid for the character, with an assignment of a 1 to a black pixel and a -1 to a pixel that is white. A grid size such as 5x7 or higher is usually employed in these approaches. The number of pixels involved will then be 35 or more, which determines the dimension of a bipolar vector for the character pattern.

We will use, for simplicity, a 3x3 grid for character patterns in our example. This means the Hopfield network has 9 neurons in the only layer in the network. Again for simplicity, we use two exemplar patterns, or reference patterns, which are given in the figure that follows. Consider the pattern on the left as a representation of the character “plus”, +, and the one on the right that of “minus”, - .


The “plus” pattern and “minus” pattern.

The bipolar vectors that represent the characters in the figure, reading the character pixel patterns row by row, left to right, and top to bottom, with a 1 for black and -1 for white pixels, are C+ = (-1, 1, -1, 1, 1, 1, -1, 1, -1), and C- = (-1, -1, -1, 1, 1, 1, -1, -1, -1). The weight matrix W is:

      0  0  2 -2 -2 -2  2  0  2
      0  0  0  0  0  0  0  2  0
      2  0  0 -2 -2 -2  2  0  2
     -2  0 -2  0  2  2 -2  0 -2
W=   -2  0 -2  2  0  2 -2  0 -2
     -2  0 -2  2  2  0 -2  0 -2
      2  0  2 -2 -2 -2  0  0  2
      0  2  0  0  0  0  0  0  0
      2  0  2 -2 -2 -2  2  0  0

The activations with input C+ are given by the vector (-12, 2, -12, 12, 12, 12, -12, 2, -12). With input C-, the activations vector is (-12, -2, -12, 12, 12, 12, -12, -2, -12).

When this Hopfield network uses the threshold function

             1  if x >= 0
f(x) = {
            -1  if x < 0

the corresponding outputs will be C+ and C-, respectively, showing the stable recall of the exemplar vectors, and establishing an autoassociation for them. When the output vectors are used to construct the corresponding characters, you get the original character patterns.
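The chapter defers its method for determining W to Chapter 4. The sketch below assumes one common construction, the sum of the outer products of the exemplar vectors with the diagonal set to zero, which yields a symmetric matrix and reproduces the activation vectors quoted above for C+ and C-. The function names are illustrative.

```python
C_PLUS = [-1, 1, -1, 1, 1, 1, -1, 1, -1]
C_MINUS = [-1, -1, -1, 1, 1, 1, -1, -1, -1]


def outer_product_weights(patterns):
    """Assumed construction: W[i][j] = sum over patterns of p[i] * p[j],
    with zeros on the diagonal (no neuron connects to itself)."""
    n = len(patterns[0])
    return [
        [0 if i == j else sum(p[i] * p[j] for p in patterns) for j in range(n)]
        for i in range(n)
    ]


def activations(pattern, weights):
    """Activation at node j = dot product of the input with column j."""
    n = len(pattern)
    return [sum(pattern[i] * weights[i][j] for i in range(n)) for j in range(n)]


def bipolar_threshold(x):
    """Output 1 if x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def recall(pattern, weights):
    return [bipolar_threshold(t) for t in activations(pattern, weights)]


W = outer_product_weights([C_PLUS, C_MINUS])
print(activations(C_PLUS, W), recall(C_PLUS, W) == C_PLUS)
print(activations(C_MINUS, W), recall(C_MINUS, W) == C_MINUS)
```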

Let us now input the character pattern shown in the following figure:


Corrupted “minus” pattern.

We will call the corresponding bipolar vector A = (1, -1, -1, 1, 1, 1, -1, -1, -1). You get the activation vector (-12, -2, -8, 8, 8, 8, -8, -2, -8) giving the output vector, C- = (-1, -1, -1, 1, 1, 1, -1, -1, -1). In other words, the character -, corrupted slightly, is recalled as the character - by the Hopfield network. The intended pattern is recognized.

We now input a bipolar vector that is different from the vectors corresponding to the exemplars, and see whether the network can store the corresponding pattern. The vector we choose is B = (1, -1, 1, -1, -1, -1, 1, -1, 1). The corresponding neuron activations are given by the vector (12, -2, 12, -12, -12, -12, 12, -2, 12), which causes the output to be the vector (1, -1, 1, -1, -1, -1, 1, -1, 1), same as B. This additional pattern, a 3x3 grid with only the corner pixels black, is shown in the figure:


Pattern result.

It is also recalled by this Hopfield network, since it is autoassociated. If we omit part of the pattern, leaving only the top corners black, we get the bipolar vector D = (1, -1, 1, -1, -1, -1, -1, -1, -1). The network activations turn out to be (4, -2, 4, -4, -4, -4, 8, -2, 8) and give the output (1, -1, 1, -1, -1, -1, 1, -1, 1), which is B.
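The recall of the corrupted “minus” pattern A, the corner pattern B, and the partly lost pattern D can be checked with a sketch like the one below. As an assumption, the weight matrix is rebuilt from the two exemplars as a sum of outer products with a zero diagonal; the function names are illustrative.

```python
C_PLUS = [-1, 1, -1, 1, 1, 1, -1, 1, -1]
C_MINUS = [-1, -1, -1, 1, 1, 1, -1, -1, -1]

A = [1, -1, -1, 1, 1, 1, -1, -1, -1]     # corrupted "minus"
B = [1, -1, 1, -1, -1, -1, 1, -1, 1]     # corner pixels black
D = [1, -1, 1, -1, -1, -1, -1, -1, -1]   # only the top corners black


def outer_product_weights(patterns):
    """Assumed construction of W: summed outer products, zero diagonal."""
    n = len(patterns[0])
    return [
        [0 if i == j else sum(p[i] * p[j] for p in patterns) for j in range(n)]
        for i in range(n)
    ]


def recall(pattern, weights):
    """Threshold each activation: 1 if it is >= 0, otherwise -1."""
    n = len(pattern)
    return [
        1 if sum(pattern[i] * weights[i][j] for i in range(n)) >= 0 else -1
        for j in range(n)
    ]


W = outer_product_weights([C_PLUS, C_MINUS])

print(recall(A, W) == C_MINUS)  # corrupted "minus" is restored
print(recall(B, W) == B)        # the corner pattern is stable
print(recall(D, W) == B)        # the partly lost pattern is completed to B
```

One observation that may help explain why B is stable even though it was never stored: B is the component-by-component negative of C+.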


A partly lost Pattern

Summary

In this chapter we introduced a neural network as a collection of processing elements distributed over a finite number of layers and interconnected with positive or negative weights, depending on whether cooperation or competition (or inhibition) is intended. The activation of a neuron is basically a weighted sum of its inputs. A threshold function determines the output of the network. There may be layers of neurons in between the input layer and the output layer, and some such middle layers are referred to as hidden layers, others by names such as Grossberg or Kohonen layers, named after the researchers Stephen Grossberg and Teuvo Kohonen, who proposed them and their function. Modification of the weights is the process of training the network, and a network subject to this process is said to be learning during that phase of the operation of the network. In some network operations, a feedback operation is used in which the current output is treated as modified input to the same network.

You have seen a couple of examples of a Hopfield network, one of them for pattern recognition.

Neural networks can be used for problems that can’t be solved with a known formula and for problems with incomplete or noisy data. Neural networks seem to have the capacity to recognize patterns in the data presented to them, and are thus useful in many types of pattern recognition problems.