Constructing a Neural Network

First Example for C++ Implementation

The neural network we presented in Chapter 1, Introduction to Neural Networks, is an example of a Hopfield network with a single layer. Now we present a C++ implementation of this network. Suppose we place four neurons, all connected to one another, on this layer, as shown in the figure below. Some of these connections have a positive weight and the rest have a negative weight. You may recall from the earlier presentation of this example that we used two input patterns to determine the weight matrix. The network recalls them when the inputs are presented to the network, one at a time. These inputs are binary and orthogonal, so their stable recall is assured. Each component of a binary input pattern is either a 0 or a 1. Two vectors are orthogonal when their dot product (the sum of the products of their corresponding components) is zero. An example of a binary input pattern is 1 0 1 0 0. An example of a pair of orthogonal vectors is (0, 1, 0, 0, 1) and (1, 0, 0, 1, 0). An example of a pair of vectors that are not orthogonal is (0, 1, 0, 0, 1) and (1, 1, 0, 1, 0); these last two vectors have a dot product of 1, not 0.
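
The dot product test for orthogonality is easy to express in code. Here is a minimal C++ sketch (the function name is ours, for illustration only):

#include <iostream>
using namespace std;

// Returns the dot product of two vectors of length n.
int dotproduct(const int *u, const int *v, int n) {
     int sum = 0;
     for (int i = 0; i < n; i++)
          sum += u[i] * v[i];
     return sum;
}

int main() {
     int u[5] = {0, 1, 0, 0, 1};
     int v[5] = {1, 0, 0, 1, 0};
     // The vectors are orthogonal exactly when the dot product is 0.
     cout << dotproduct(u, v, 5) << "\n";     // prints 0
     return 0;
}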

Figure: Layout of a Hopfield network (four fully interconnected neurons).

The two patterns we want the network to have stable recall for are A = (1, 0, 1, 0) and B = (0, 1, 0, 1). The weight matrix W is given as follows:

               0    -3     3    -3
     W    =   -3     0    -3     3
               3    -3     0    -3
              -3     3    -3     0

NOTE:  The positive links (values with positive signs) tend to encourage agreement in a stable configuration, whereas negative links (values with negative signs) tend to discourage agreement in a stable configuration.

We need a threshold function also, and we define it using a threshold value, θ, as follows:

                   1  if t >= θ
      f(t) =  {
                   0  if t <  θ

The threshold value θ is used as a cut-off value for the activation of a neuron to enable it to fire: the activation should equal or exceed the threshold value for the neuron to fire, meaning to have output 1. For our Hopfield network, θ is taken as 0. There are four neurons in the only layer in this network. Each node's output is the value of the threshold function applied to that node's activation, and the activation of a node is the dot product of the input vector and the corresponding column of the weight matrix. So if the input vector is A, the dot product at the first node is 3, and f(3) = 1. The dot products at the second, third, and fourth nodes are -6, 3, and -6, respectively, and the corresponding outputs are 0, 1, and 0. This means that the output of the network is the vector (1, 0, 1, 0), the same as the input pattern, so the network has recalled the pattern as presented. When B is presented, the dot product obtained at the first node is -6 and the output is 0. The activations of all four nodes together with the threshold function give (0, 1, 0, 1) as the output of the network, which means that the network recalled B as well. The weight matrix worked well with both input patterns, and we do not need to modify it.
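
For example, with input A = (1, 0, 1, 0), the activation of the first node is the dot product of A with the first column of W:

     (1)(0) + (0)(-3) + (1)(3) + (0)(-3) = 3,   and f(3) = 1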

Classes in C++ Implementation

In our C++ implementation of this network, there are two classes: a network class and a neuron class. We create the network with four neurons, and these four neurons are all connected to one another. A neuron is not connected to itself, though; that is, the directed graph representing the network has no edge from a node to itself. But for simplicity, we could pretend that such a connection exists carrying a weight of 0, so that the weight matrix has 0's in its principal diagonal.

The functions that determine the neuron activations and the network output are declared public, so they are visible and accessible without restriction. The activations of the neurons are calculated with functions defined in the neuron class. When there is more than one layer in a neural network, the outputs of neurons in one layer become the inputs for neurons in the next layer. In order to facilitate passing the outputs from one layer as inputs to another layer, our C++ implementations compute the neuron outputs in the network class; for this reason the threshold function is made a member of the network class. We do this for the Hopfield network as well. To see if the network has achieved correct recall, you compare the presented pattern and the network output, component by component.

C++ Program for a Hopfield Network

For convenience, a C++ program is usually split into two components: one is the header file, with all of the class declarations and the list of included library files; the other is the source file, which includes the header file and contains the definitions of the member functions of the classes declared in the header file. You also put the function main in the source file. Most of the computations are done by class member functions: class objects are created in the function main, and calls are made to the appropriate member functions. The header file has an .h (or .hpp) extension, and the source file has a .cpp extension, to indicate that it is a C++ code file. It is possible to have the contents of the header file written at the beginning of the .cpp file and work with one file only, but separating the declarations and implementations into two files allows you to change the implementation of a class (.cpp) without changing the interface to the class (.h).

Header File for C++ Program for Hopfield Network

Listing 4.1 contains Hop.h, the header file for the C++ program for the Hopfield network. The headers included in it are stdio.h, iostream, and math.h. The iostream header contains the declarations and details of the C++ streams for input and output. A network class and a neuron class are declared in Hop.h. The data members and member functions are declared within each class, and their accessibility is specified by the keywords protected or public.

Listing 4.1 Header file for C++ program for Hopfield network.

//Hop.h      V. Rao, H. Rao
//Single layer Hopfield Network with 4 neurons
 
#include <stdio.h>
#include <iostream>
#include <math.h>

using namespace std;
 
class neuron {
protected:
     int activation;         // dot product of the input vector and the weight vector
     friend class network;
public:
     int weightv[4];         // weights on the connections to the other neurons
     neuron() {};
     neuron(int *j);
     int act(int, int*);     // compute the activation
};
 
class network {
public:
     neuron   nrn[4];             // the four fully interconnected neurons
     int output[4];               // the network's output vector
     int threshld(int);           // threshold function
     void activation(int j[4]);   // compute activations and outputs for an input
     network(int*,int*,int*,int*);
};

Notes on the Header File Hop.h

Notice that the data item activation in the neuron class is declared as protected. In order to make the member activation of the neuron class accessible to the network class, the network is declared a friend class in the class neuron. Also, there are two constructors for the class neuron. One of them creates the object neuron without initializing any data members. The other creates the object neuron and initializes the connection weights.
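
For example, the two constructors might be used as follows (an illustrative fragment, not part of the listings):

neuron n1;                       // data members left uninitialized
int wt[4] = {0, -3, 3, -3};
neuron n2(wt);                   // connection weights initialized from wt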

Source Code for the Hopfield Network

Listing 4.2 contains the source code for the C++ program for a Hopfield network in the file Hop.cpp. The member functions of the classes declared in Hop.h are implemented here. The function main contains the input patterns, values to initialize the weight matrix, and calls to the constructor of network class and other member functions of the network class.

Listing 4.2 Source code for C++ program for Hopfield network.

//Hop.cpp   V. Rao, H. Rao
//Single layer Hopfield Network with 4 neurons
 
#include "hop.h"
 
neuron::neuron(int *j) {
int i;
for(i=0;i<4;i++) {
     weightv[i]= *(j+i);
     }
}
 
int neuron::act(int m, int *x) {
int i;
int a=0;
 
for(i=0;i<m;i++) {
     a += x[i]*weightv[i];
     }
return a;
}
 
int network::threshld(int k) {
if(k>=0)
     return (1);
else
     return (0);
}
 
network::network(int a[4],int b[4],int c[4],int d[4]) {
nrn[0] = neuron(a) ;
nrn[1] = neuron(b) ;
nrn[2] = neuron(c) ;
nrn[3] = neuron(d) ;
}
 
void network::activation(int *patrn){
int i,j;
for(i=0;i<4;i++)
     {
     for(j=0;j<4;j++)
          {
          cout<<"\n nrn["<<i<<"].weightv["<<j<<"] is " <<nrn[i].weightv[j];
          }
     nrn[i].activation = nrn[i].act(4,patrn);
     cout<<"\nactivation is "<<nrn[i].activation;
     output[i]=threshld(nrn[i].activation);
     cout<<"\noutput value is  "<<output[i]<<"\n";
     }
}
 
int main() {
int patrn1[]= {1,0,1,0},i;
int wt1[]= {0,-3,3,-3};
int wt2[]= {-3,0,-3,3};
int wt3[]= {3,-3,0,-3};
int wt4[]= {-3,3,-3,0};
cout<<"\nTHIS PROGRAM IS FOR A HOPFIELD NETWORK WITH A SINGLE LAYER OF";
cout<<"\n4 FULLY INTERCONNECTED NEURONS. THE NETWORK SHOULD RECALL THE";
cout<<"\nPATTERNS 1010 AND 0101 CORRECTLY.\n";
 
//create the network by calling its constructor.
// the constructor calls neuron constructor as many times as the number of
// neurons in the network.
network h1(wt1,wt2,wt3,wt4);
//present a pattern to the network and get the activations of the neurons
h1.activation(patrn1);
//check if the pattern given is correctly recalled and give message
for(i=0;i<4;i++)
     {
     if (h1.output[i] == patrn1[i])
          cout<<"\n pattern= "<<patrn1[i]<< "  output = "<<h1.output[i]<<"  component matches";
     else
          cout<<"\n pattern= "<<patrn1[i]<< "  output = "<<h1.output[i]<<"  discrepancy occurred";
     }
cout<<"\n\n";
int patrn2[]= {0,1,0,1};
h1.activation(patrn2);
for(i=0;i<4;i++) {
     if (h1.output[i] == patrn2[i])
          cout<<"\n pattern= "<<patrn2[i]<<"  output = "<<h1.output[i]<<"  component matches";
     else
          cout<<"\n pattern= "<<patrn2[i]<<"  output = "<<h1.output[i]<<"  discrepancy occurred";
       }
return 0;
}

Comments on the C++ Program for Hopfield Network

Note the use of the output stream operator cout<< to output text strings or numerical values. C++ has istream and ostream classes, from which the iostream class is derived. The standard input and output streams are cin and cout, respectively, used, correspondingly, with the operators >> and <<. Use of cout for the output stream is much simpler than the use of the C function printf. As you can see, no explicit formatting is specified for the output. However, C++ provides stream manipulators that let you format the output while using cout.
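
For example, the iomanip header provides manipulators such as setw and setprecision (a small illustration; our listings do not use them):

#include <iostream>
#include <iomanip>
using namespace std;

int main() {
     double a = 9.28;
     // print a in a field 10 characters wide, with 2 digits after the decimal point
     cout << fixed << setprecision(2) << setw(10) << a << "\n";
     return 0;
}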

Also note the way comments are introduced in the program. A comment line starts with a double slash //; unlike a C-style comment, it needs no closing delimiter, since it extends to the end of the line. If the comments extend to subsequent lines, each such line should start with a double slash. You can still use the pair /* at the beginning and */ at the end of a block of comments, as you do in C. If the comment continues through many lines, the C facility is handier for delimiting the comments.

The neurons in the network are members of the network class and are identified by the abbreviation nrn. The two patterns, 1010 and 0101, are presented to the network one at a time in the program.

Output from the C++ Program for Hopfield Network

The output from this program is as follows and is self-explanatory. When you run this program, you’re likely to see a lot of output whiz by, so in order to leisurely look at the output, use redirection. Type Hop > filename, and your output will be stored in a file, which you can edit with any text editor or list by using the type filename | more command.

THIS PROGRAM IS FOR A HOPFIELD NETWORK WITH A SINGLE LAYER OF 4 FULLY
INTERCONNECTED NEURONS. THE NETWORK SHOULD RECALL THE PATTERNS 1010 AND
0101 CORRECTLY.
 
 nrn[0].weightv[0] is  0
 nrn[0].weightv[1] is  -3
 nrn[0].weightv[2] is  3
 nrn[0].weightv[3] is  -3
activation is 3
output value is  1
 
 nrn[1].weightv[0] is  -3
 nrn[1].weightv[1] is  0
 nrn[1].weightv[2] is  -3
 nrn[1].weightv[3] is  3
activation is -6
output value is  0
 
 nrn[2].weightv[0] is  3
 nrn[2].weightv[1] is  -3
 nrn[2].weightv[2] is  0
 nrn[2].weightv[3] is  -3
activation is 3
output value is  1
 
 nrn[3].weightv[0] is  -3
 nrn[3].weightv[1] is  3
 nrn[3].weightv[2] is  -3
 nrn[3].weightv[3] is  0
activation is -6
output value is  0
 
 pattern= 1  output = 1  component matches
 pattern= 0  output = 0  component matches
 pattern= 1  output = 1  component matches
 pattern= 0  output = 0  component matches
 
 nrn[0].weightv[0] is  0
 nrn[0].weightv[1] is  -3
 nrn[0].weightv[2] is  3
 nrn[0].weightv[3] is  -3
activation is -6
output value is  0
 
 nrn[1].weightv[0] is  -3
 nrn[1].weightv[1] is  0
 nrn[1].weightv[2] is  -3
 nrn[1].weightv[3] is  3
activation is 3
output value is  1
 
 nrn[2].weightv[0] is  3
 nrn[2].weightv[1] is  -3
 nrn[2].weightv[2] is  0
 nrn[2].weightv[3] is  -3
activation is -6
output value is  0
 
 nrn[3].weightv[0] is  -3
 nrn[3].weightv[1] is  3
 nrn[3].weightv[2] is  -3
 nrn[3].weightv[3] is  0
activation is 3
output value is  1
 
 pattern= 0  output = 0  component matches
 pattern= 1  output = 1  component matches
 pattern= 0  output = 0  component matches
 pattern= 1  output = 1  component matches

Further Comments on the Program and Its Output

Let us recall our previous discussion of this example in Chapter 1. What does the network give as output if we present a pattern different from both A and B? If C = (0, 1, 0, 0) is the input pattern, the activations (dot products) would be -3, 0, -3, 3, making the outputs (next state) of the neurons 0, 1, 0, 1, so that B would be recalled. This is quite interesting: if we intended to input B but made a slight error and ended up presenting C instead, the network would recall B. You can run the program with the pattern changed to 0, 1, 0, 0 and compile again, to see that the B pattern is recalled, or simply present the extra pattern as sketched below.
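
Instead of editing patrn1, you could append a third pattern at the end of the function main (a sketch, using the same class interface):

int patrn3[]= {0,1,0,0};         // the pattern C, one bit different from B
h1.activation(patrn3);
// h1.output[] should now hold 0, 1, 0, 1, that is, the pattern B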

Another point about the example in Chapter 1 is that the weight matrix W is not the only weight matrix that would enable the network to recall the patterns A and B correctly. If we replace the 3 and -3 in the matrix with 2 and -2, respectively, the resulting matrix gives the same performance from the network. One way for you to check this is to change wt1, wt2, wt3, and wt4 in the program accordingly, and compile and run the program again. The reason why both weight matrices work is that they are closely related: one is a scalar (constant) multiple of the other. If you multiply each element of W by 2/3, you get the corresponding matrix in which 3 and -3 are replaced with 2 and -2, respectively.

A New Weight Matrix to Recall More Patterns

Let’s continue to discuss this example. Suppose we are interested in having the patterns E = (1, 0, 0, 1) and F = (0, 1, 1, 0) also recalled correctly, in addition to the patterns A and B. In this case we would need to train the network and come up with a learning algorithm, which we will discuss in more detail later in the book. We come up with the matrix W1, which follows.

              0    -5     4     4
     W1  =   -5     0     4     4
              4     4     0    -5
              4     4    -5     0

Try to use this modification of the weight matrix in the source program, and then compile and run the program to see that the network successfully recalls all four patterns A, B, E, and F.
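
One way to make this modification is to replace the weight vectors in the function main with the rows of W1, leaving the rest of the program unchanged (a sketch):

int wt1[]= {0,-5,4,4};
int wt2[]= {-5,0,4,4};
int wt3[]= {4,4,0,-5};
int wt4[]= {4,4,-5,0};

You would then present the two additional patterns, for example as int patrn3[]= {1,0,0,1}; and int patrn4[]= {0,1,1,0};, in the same manner as patrn1 and patrn2.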


NOTE:  The C++ implementation shown does not include the asynchronous update feature mentioned in Chapter 1, which is not necessary for the patterns presented. The coding of this feature is left as an exercise for the reader.


Weight Determination

You may be wondering about how these weight matrices were developed in the previous example, since so far we’ve only discussed how the network does its job, and how to implement the model. You have learned that the choice of weight matrix is not necessarily unique. But you want to be assured that there is some established way besides trial and error, in which to construct a weight matrix. You can go about this in the following way.

Binary to Bipolar Mapping

Let’s look at the previous example. You have seen that by replacing each 0 in a binary string with a –1, you get the corresponding bipolar string. If you keep all 1’s the same and replace each 0 with a –1, you will have a formula for the above option. You can apply the following function to each bit in the string:

     f(x) = 2x – 1

NOTE:  When you give the binary bit x, you get the corresponding bipolar character f(x)


For inverse mapping, which turns a bipolar string into a binary string, you use the following function:

     f(x) =  (x + 1) / 2

NOTE:  When you give the bipolar character x, you get the corresponding binary bit f(x)
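
Both mappings are one-liners in C++. A minimal sketch (the function names are ours, for illustration):

// binary (0 or 1) to bipolar (-1 or 1):  f(x) = 2x - 1
int to_bipolar(int x) { return 2 * x - 1; }

// bipolar (-1 or 1) to binary (0 or 1):  f(x) = (x + 1) / 2
int to_binary(int x) { return (x + 1) / 2; }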


Pattern’s Contribution to Weight

Next, we work with the bipolar versions of the input patterns. You take each pattern to be recalled, one at a time, and determine its contribution to the weight matrix of the network. The contribution of each pattern is itself a matrix, of the same size as the weight matrix of the network. You then add these contributions, in the way matrices are added, and you end up with the weight matrix for the network, which is also referred to as the correlation matrix. Let us find the contribution of the pattern A = (1, 0, 1, 0):

First, we notice that the binary to bipolar mapping of A = (1, 0, 1, 0) gives the vector (1, –1, 1, –1).

Then we take the transpose of this vector and multiply the column by the row, the way matrices are multiplied, and we see the following:

      1                           1   -1    1   -1
     -1   [1   -1   1   -1]  =   -1    1   -1    1
      1                           1   -1    1   -1
     -1                          -1    1   -1    1

Now subtract 1 from each element in the main diagonal (that runs from top left to bottom right). This operation gives the same result as subtracting the identity matrix from the given matrix, obtaining 0’s in the main diagonal. The resulting matrix, which is given next, is the contribution of the pattern (1, 0, 1, 0) to the weight matrix.

      0      -1      1     -1
     -1       0     -1      1
      1      -1      0     -1
     -1       1     -1      0

Similarly, we can calculate the contribution from the pattern B = (0, 1, 0, 1); you can verify that it is the same matrix as pattern A's contribution, since the bipolar version of B is the negative of the bipolar version of A. Adding the two contributions gives the matrix of weights for this exercise, the matrix W shown here.

           0     -2      2      -2
  W  =    -2      0     -2       2
           2     -2      0      -2
          -2      2     -2       0

You can now optionally apply an arbitrary scalar multiplier to all the entries of the matrix if you wish. This is how we had previously obtained the +/-3 values (a multiplier of 3/2) instead of the +/-2 values shown above.
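
The whole procedure (map each exemplar to bipolar form, add up the outer-product contributions, and zero the main diagonal) can be sketched in a few lines of C++. This is a standalone illustration with names of our own choosing, not part of Hop.cpp:

#include <iostream>
using namespace std;

const int N = 4;

// Build the correlation (weight) matrix from npat binary exemplars.
void correlation(int pat[][N], int npat, int w[][N]) {
     for (int i = 0; i < N; i++)
          for (int j = 0; j < N; j++) {
               w[i][j] = 0;
               if (i == j) continue;       // keep 0's on the main diagonal
               for (int p = 0; p < npat; p++)
                    // binary to bipolar, then outer-product contribution
                    w[i][j] += (2*pat[p][i] - 1) * (2*pat[p][j] - 1);
          }
}

int main() {
     int pat[2][N] = { {1, 0, 1, 0},       // pattern A
                       {0, 1, 0, 1} };     // pattern B
     int w[N][N];
     correlation(pat, 2, w);
     for (int i = 0; i < N; i++) {         // print the resulting matrix
          for (int j = 0; j < N; j++)
               cout << w[i][j] << "\t";
          cout << "\n";
     }
     return 0;
}

Run on the exemplars A and B, this prints the matrix W with the +/-2 entries shown above.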

Autoassociative Network

The Hopfield network just shown has the feature that the network associates an input pattern with itself in recall. This makes the network an autoassociative network. The patterns used for determining the proper weight matrix are also the ones that are autoassociatively recalled. These patterns are called the exemplars. A pattern other than an exemplar may or may not be recalled by the network. Of course, when you present the pattern 0 0 0 0, it is stable, even though it is not an exemplar pattern.

Orthogonal Bit Patterns

You may be wondering how many patterns the network with four nodes is able to recall. Let us first consider how many different bit patterns are orthogonal to a given bit pattern. This question really refers to bit patterns in which at least one bit is equal to 1. A little reflection tells us that if two bit patterns are to be orthogonal, they cannot both have 1's in the same position, since then the dot product could not be 0. In other words, a bitwise logical AND operation of the two bit patterns has to result in 0. This suggests the following. If a pattern P has k (k < 4) bit positions with 0 (and so 4-k bit positions with 1), and if pattern Q is to be orthogonal to P, then Q can have 0 or 1 in each of those k positions, but it must have only 0 in the remaining 4-k positions. Since there are two choices for each of the k positions, there are 2^k possible patterns orthogonal to P. This count of 2^k patterns includes the pattern of all zeroes, so there really are 2^k - 1 nonzero patterns orthogonal to P. Some of these 2^k - 1 patterns are not orthogonal to each other. As an example, P can be the pattern 0 1 0 0, which has k = 3 positions with 0. There are 2^3 - 1 = 7 nonzero patterns orthogonal to 0 1 0 0. Among these are the patterns 1 0 1 0 and 1 0 0 1, which are not orthogonal to each other, since their dot product is 1, not 0.
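
The counting argument is easy to check by brute force. Treating each 4-bit pattern as an integer, two patterns are orthogonal exactly when their bitwise AND is 0 (a small sketch, with names of our choosing):

#include <iostream>
using namespace std;

int main() {
     int p = 4;                     // the pattern 0 1 0 0, read as binary 0100
     int count = 0;
     for (int q = 1; q < 16; q++)   // all nonzero 4-bit patterns
          if ((p & q) == 0)         // no 1's in common, so the dot product is 0
               count++;
     cout << count << "\n";         // prints 7, that is, 2^3 - 1
     return 0;
}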

Network Nodes and Input Patterns

Since our network has four neurons in it, it also has four nodes in the directed graph that represents the network. These are laterally connected, because connections are established from node to node; they are lateral because the nodes are all in the same layer. We started with the patterns A = (1, 0, 1, 0) and B = (0, 1, 0, 1) as the exemplars. If we take any other nonzero pattern that is orthogonal to A, it will have a 1 in a position where B also has a 1, so the new pattern will not be orthogonal to B. Therefore, the orthogonal set of patterns that contains A and B can have only those two as its elements. If you remove B from the set, you can get (at most) two other patterns to join A to form an orthogonal set: the patterns (0, 1, 0, 0) and (0, 0, 0, 1).

If you follow the procedure described earlier to get the correlation matrix, you will get the following weight matrix:

             0     -1      3      -1
     W  =   -1      0     -1      -1
             3     -1      0      -1
            -1     -1     -1       0

With this matrix, pattern A is recalled. When (0, 1, 0, 0) or (0, 0, 0, 1) is presented, the activation of the node that should fire is exactly 0, so the result depends on how the threshold function treats a 0 activation: with f(0) = 1, as in our program, these two patterns are recalled as well, whereas if f(0) is taken to be 0, the zero pattern (0, 0, 0, 0) is obtained. Once the zero pattern is obtained, its own recall is stable.

Second Example for C++ Implementation

Recall the cash register game from the show The Price is Right, used as one of the examples in Chapter 1. This example led to the description of the Perceptron neural network. We will now resume our discussion of the Perceptron model and follow up with its C++ implementation. Keep the cash register game example in mind as you read the following C++ implementation of the Perceptron model. Also note that the input signals in this example are not necessarily binary; they may be real numbers, because the prices of the items the contestant has to choose from are real numbers (dollars and cents). A Perceptron has one layer of input neurons and one layer of output neurons. Each input layer neuron is connected to each neuron in the output layer.

C++ Implementation of Perceptron Network

In our C++ implementation of this network, we have separate classes for input neurons and output neurons. The ineuron class is for the input neurons; this class has weight and activation as data members. The oneuron class is similar and is for the output neuron; it is declared a friend class in the ineuron class. The output neuron class also has a data member called output. There is a network class, which is a friend class in the oneuron class. An instance of the network class is created with four input neurons. These four neurons are all connected to one output neuron.

The member functions of the ineuron class are: (1) a default constructor, (2) a second constructor that takes a real number as an argument, and (3) a function that calculates the output of the input neuron. The constructor taking one argument uses that argument to set the value of the weight on the connection between the input neuron and the output neuron. The functions that determine the neuron activations and the network output are declared public. The activations of the neurons are calculated with functions defined in the neuron classes. A threshold value is used by a member function of the output neuron to determine if the neuron’s activation is large enough for it to fire, giving an output of 1.

Header File

Listed below is percept.h, the header file for the C++ program for the Perceptron network.

//percept.h        V. Rao, H. Rao
// Perceptron model
 
#include <stdio.h>
#include <iostream>
#include <math.h>

using namespace std;
 
class ineuron {
protected:
     float weight;           // weight on the connection to the output neuron
     float activation;       // weighted input
     friend class oneuron;
public:
     ineuron() {};
     ineuron(float j);
     float act(float x);     // compute this neuron's output
};
 
class oneuron {
protected:
     int output;                  // 1 if the neuron fires, 0 otherwise
     float activation;            // weighted sum of the inputs
     friend class network;
public:
     oneuron() { };
     void actvtion(float x[4], ineuron *nrn);
     int outvalue(float j);       // compare the activation with the threshold j
};
 
class network {
public:
     ineuron   nrn[4];
     oneuron   onrn;
     network(float,float,float,float);
 
};

Implementation of Functions

The network is designed to have four neurons in the input layer. Each of them is an object of class ineuron, and these are data members of the class network. There is one explicitly defined output neuron, of the class oneuron. The network constructor invokes the ineuron constructor for each input layer neuron, providing it with the initial weight for its connection to the output neuron. The constructor for the output neuron is also invoked by the network constructor, which at the same time initializes the output and activation data members of the output neuron to zero. To make sure there is access to needed information and functions, the oneuron class is declared a friend class in the class ineuron, and the network class is declared a friend class in the class oneuron.

Source Code for Perceptron Network

The listing contains the source code in percept.cpp for the C++ implementation of the Perceptron model previously discussed.

Source code for Perceptron model.

//percept.cpp   V. Rao, H. Rao
//Perceptron model
 
#include "percept.h"
#include "stdio.h"
#include "stdlib.h"
 
ineuron::ineuron(float j)
{
weight= j;
}
 
float ineuron::act(float x)
{
float a;
 
a = x*weight;
 
return a;
}
 
void oneuron::actvtion(float *inputv, ineuron *nrn)
{
int i;
activation = 0;
 
for(i=0;i<4;i++)
     {
     cout<<"\nweight for neuron "<<i+1<<" is       "<<nrn[i].weight;
     nrn[i].activation = nrn[i].act(inputv[i]);
     cout<<"           activation is      "<<nrn[i].activation;
     activation += nrn[i].activation;
     }
cout<<"\n\nactivation is  "<<activation<<"\n";
}
 
int oneuron::outvalue(float j)
{
if(activation>=j)
     {
     cout<<"\nthe output neuron activation \
exceeds the threshold value of "<<j<<"\n";
     output = 1;
     }
else
     {
     cout<<"\nthe output neuron activation \
is smaller than the threshold value of "<<j<<"\n";
     output = 0;
     }
 
cout<<" output value is "<< output;
return (output);
}
 
network::network(float a,float b,float c,float d)
{
nrn[0] = ineuron(a) ;
nrn[1] = ineuron(b) ;
nrn[2] = ineuron(c) ;
nrn[3] = ineuron(d) ;
onrn = oneuron();
onrn.activation = 0;
onrn.output = 0;
}
int main(int argc, char *argv[])
{
 
float inputv1[]= {1.95,0.27,0.69,1.25};
float wtv1[]= {2,3,3,2}, wtv2[]= {3,0,6,2};
FILE * wfile, * infile;
int num=0, vecnum=0, i;
float threshold = 7.0;
 
if (argc < 3)
     {
     cerr << "Usage: percept Weightfile Inputfile";
     exit(1);
     }
// open  files
 
wfile= fopen(argv[1], "r");
infile= fopen(argv[2], "r");
 
if ((wfile == NULL) || (infile == NULL))
     {
     cout << " Can't open a file\n";
     exit(1);
     }
 
cout<<"\nTHIS PROGRAM IS FOR A PERCEPTRON NETWORK WITH AN INPUT LAYER OF";
cout<<"\n4 NEURONS, EACH CONNECTED TO THE OUTPUT NEURON.\n";
cout<<"\nTHIS EXAMPLE TAKES REAL NUMBERS AS INPUT SIGNALS\n";
 
//create the network by calling its constructor.
//the constructor calls neuron constructor as many times as the number of
//neurons in input layer of the network.
 
cout<<"please enter the number of weights/vectors \n";
cin >> vecnum;
 
for (i=1;i<=vecnum;i++)
     {
     fscanf(wfile,"%f %f %f %f\n", &wtv1[0],&wtv1[1],&wtv1[2],&wtv1[3]);
     network h1(wtv1[0],wtv1[1],wtv1[2],wtv1[3]);
     fscanf(infile,"%f %f %f %f \n",
     &inputv1[0],&inputv1[1],&inputv1[2],&inputv1[3]);
     cout<<"this is vector # " << i << "\n";
     cout << "please enter a threshold value, eg 7.0\n";
     cin >> threshold;
     h1.onrn.actvtion(inputv1, h1.nrn);
     h1.onrn.outvalue(threshold);
     cout<<"\n\n";
     }
 
fclose(wfile);
fclose(infile);
return 0;
}

Comments on Your C++ Program

Notice the use of input stream operator cin>> in the C++ program, instead of the C function scanf in several places. The iostream class in C++ was discussed earlier in this chapter. The program works like this:

First, the network input neurons are given their connection weights, and then an input vector is presented to the input layer. A threshold value is specified, and the output neuron computes the weighted sum of its inputs, which are the outputs of the input layer neurons. This weighted sum is the activation of the output neuron, and it is compared with the threshold value: the output neuron fires (output 1) if its activation is at least the threshold value, and does not fire (output 0) if its activation is smaller. In this implementation, neither supervised nor unsupervised training is incorporated.
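
For example, with the input vector (1.95, 0.27, 0.69, 1.25) and the weights (2, 3, 3, 2), which we use below, the weighted sum works out to

     (1.95)(2) + (0.27)(3) + (0.69)(3) + (1.25)(2) = 3.9 + 0.81 + 2.07 + 2.5 = 9.28

which exceeds the threshold of 7.0, so the output neuron fires.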

Input/Output for percept.cpp

There are two data files used in this program: one for setting up the weights, and the other for setting up the input vectors. On the command line, you enter the program name followed by the weight file name and the input file name. For this discussion (the files are also on the accompanying disk for this book), create a file called weight.dat, which contains the following data:

  2.0 3.0 3.0 2.0
  3.0 0.0 6.0 2.0

These are two weight vectors. Create also an input file called input.dat with the two data vectors below:

  1.95 0.27 0.69 1.25
  0.30 1.05 0.75 0.19

During the execution of the program, you are first prompted for the number of vectors that are used (in this case, 2), then for a threshold value for each input/weight vector pair (use 7.0 in both cases). You will then see the following output. Note that the responses typed by the user are the lines 2 and 7.0.

  percept weight.dat input.dat
 
THIS PROGRAM IS FOR A PERCEPTRON NETWORK WITH AN INPUT LAYER OF 4
NEURONS, EACH CONNECTED TO THE OUTPUT NEURON.
 
THIS EXAMPLE TAKES REAL NUMBERS AS INPUT SIGNALS
please enter the number of weights/vectors
 
2
this is vector # 1
please enter a threshold value, eg 7.0
7.0
 
weight for neuron 1 is  2           activation is 3.9
weight for neuron 2 is  3           activation is 0.81
weight for neuron 3 is  3           activation is 2.07
weight for neuron 4 is  2           activation is 2.5
 
activation is  9.28
 
the output neuron activation exceeds the threshold value of 7
 output value is 1
 
this is vector # 2
please enter a threshold value, eg 7.0
7.0
 
weight for neuron 1 is  3           activation is 0.9
weight for neuron 2 is  0           activation is 0
weight for neuron 3 is  6           activation is 4.5
weight for neuron 4 is  2           activation is 0.38
 
activation is  5.78
the output neuron activation is smaller than the threshold value of 7
output value is 0

Finally, try adding a data vector of (1.4, 0.6, 0.35, 0.99) to the data file. Add a weight vector of (2, 6, 8, 3) to the weight file and use a threshold value of 8.25 to see the result. You can also experiment with other values.

Network Modeling

So far, we have considered the construction of two networks, the Hopfield memory and the Perceptron. What other considerations (discussed in more depth in the chapters to follow) should you keep in mind?

Some of the considerations that go into the modeling of a neural network for an application are:

nature of inputs
 
       fuzzy
 
                 binary
                 analog
 
       crisp
 
                 binary
                 analog
 
number of inputs
 
nature of outputs
 
       fuzzy
 
                 binary
                 analog
 
       crisp
 
                 binary
                 analog
 
 
number of outputs
 
 
nature of the application
       to complete patterns (recognize corrupted patterns)
       to classify patterns
       to do an optimization
       to do approximation
       to perform data clustering
       to compute functions
 
 
dynamics
 
    adaptive
 
              learning
 
                           training
 
                                           with exemplars
                                           without exemplars
 
              self-organizing
 
    nonadaptive
 
              learning
 
                           training
 
                                           with exemplars
                                           without exemplars
 
              self-organizing
 
 
hidden layers
 
 
    number
 
              fixed
              variable
 
    sizes
 
              fixed
              variable
 
processing
 
       additive
       multiplicative
       hybrid
 
                 additive and multiplicative
                 combining other approaches
                              expert systems
                              genetic algorithms

Hybrid models, as indicated above, could combine a neural network approach with expert system methods, or combine additive and multiplicative processing paradigms.

Decision support systems are amenable to approaches that combine neural networks with expert systems. An example of a hybrid model that combines different modes of processing by neurons is the Sigma Pi neural network, wherein one layer of neurons uses summation in aggregation and the next layer of neurons uses multiplicative processing.

A hidden layer in a neural network is a layer of neurons that operates between the input layer and the output layer of the network. Neurons in this layer receive inputs from those in the input layer and supply their outputs as the inputs to the neurons in the output layer. When a hidden layer comes in between other hidden layers, it receives its input from the preceding hidden layer and supplies its output as input to the next hidden layer.

In modeling a network, it is often not easy to determine how many hidden layers, if any, and of what sizes, are needed in the model. Some approaches, like genetic algorithms (paradigms competing with neural network approaches in many situations, but which can nevertheless cooperate with them, as here), are at times used to determine the needed or optimal number of hidden layers and/or the number of neurons in those hidden layers. In what follows, we outline one such application.

Tic-Tac-Toe Anyone?

David Fogel describes evolutionary general problem solving and uses the familiar game of Tic-Tac-Toe as an example. The idea is to come up with optimal strategies in playing this game. The first player’s marker is an X, and the second player’s marker is an O. Whoever gets three of his or her markers in a row or a column or a diagonal before the other player does, wins. Shrewd players manage a draw position, if their equally shrewd opponent thwarts their attempts to win. A draw position is one where neither player has three of his or her markers in a row, or a column, or a diagonal.

The board can be described by a vector of nine components, each of which is a three-valued number. Imagine the squares of the board for the game as taken in sequence row by row from top to bottom. Allow a 1 to show the presence of an X in that square, a 0 to indicate a blank there, and a -1 to correspond to an O. This is an example of a coding for the status of the board. For example, (-1, 0, 1, 0, -1, 0, 1, 1, -1) is a winning position for the second player, because it corresponds to the board looking as below.

 O       X
     O
 X   X   O

A neural network for this problem will have an input layer with nine neurons, since each input pattern has nine components. There can be one or more hidden layers; this example uses one. The output layer also contains nine neurons, so that one cycle of operation of the network shows what the best next configuration of the board is, given a particular input. Of course, during this cycle of operation, all that needs to be determined is which blank space, indicated by a 0 in the input, should be changed to 1, if the strategy is being worked out for player 1. None of the 1's and -1's is to be changed.

In this particular example, the neural network architecture itself is dynamic. The network expands or contracts according to some rules, which are described next.

Fogel describes the network as an evolving network in the sense that the number of neurons in the hidden layer changed with a probability of 0.5. A node was equally likely to be added or deleted. Since the number of unmarked squares dwindles after each play, this kind of approach with varying numbers of neurons in the network seems to be reasonable, and interesting.

The initial set of weights consists of random values between -0.5 and 0.5, inclusive, chosen according to a uniform distribution. Bias and threshold values also come from this distribution. The sigmoid function

     f(x) = 1 / (1 + e^(-x))

is used to determine the outputs.
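
In C++, this output function could be written as follows (a sketch, not taken from Fogel's code):

#include <math.h>

// The sigmoid squashing function: maps any real x into the interval (0, 1).
double sigmoid(double x) {
     return 1.0 / (1.0 + exp(-x));
}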

Weights and biases were changed during the training cycles; thus, the network had a learning phase. (You will read more on learning in Chapter 6.) This network is adaptive, since it changes its architecture. Other forms of adaptation in neural networks involve changing parameter values for a fixed architecture. (See Chapter 6.) The results of the experiment Fogel describes show that nine neurons in the hidden layer gave the best network for this problem. The procedure also purged any strategy that was likely to lose.

Fogel’s emphasis is on the evolutionary aspect of an adaptive process or experiment. Our interest in this example is primarily due to the fact that an adaptive neural network is used.

Tic-Tac-Toe, while a simple and all too familiar game, belongs to a genre of much more complicated games. These games ask a player to place a marker in some position in a given array, and as players take turns doing so, some criterion determines if the game is a draw, or who won. Unlike in Tic-Tac-Toe, the criterion by which one wins may not be known to the players.

Stability and Plasticity

We discuss now a few other considerations in neural network modeling by introducing short-term memory and long-term memory concepts. Neural network training is usually done in an iterative way, meaning that the procedure is repeated a certain number of times. These iterations are referred to as cycles. After each cycle, the input used may remain the same or change, and the weights may remain the same or change; such change is based on the output of a completed cycle. If the number of cycles is not preset, and the network is allowed to go through cycles until some other criterion is met, the question naturally arises of whether the iterative process eventually terminates.

Stability for a Neural Network

Stability refers to convergence that brings the iterative process to an end. For example, if any two consecutive cycles result in the same output for the network, then there may be no need to do more iterations. In this case, convergence has occurred, and the network has stabilized in its operation. If weights are being modified after each cycle, then convergence of weights would constitute stability for the network.

In some situations, it takes many more iterations than you desire for the outputs of two consecutive cycles to be the same. In that case, a tolerance level on the convergence criterion can be used. With a tolerance level, you accomplish early, but still satisfactory, termination of the operation of the network.

Plasticity for a Neural Network

Suppose a network is trained to learn some patterns, and in this process the weights are adjusted according to an algorithm. After learning these patterns and encountering a new pattern, the network may modify the weights in order to learn the new pattern. But what if the new weight structure is not responsive to the new pattern? Then the network does not possess plasticity—the ability to deal satisfactorily with new short-term memory (STM) while retaining long-term memory (LTM). Attempts to endow a network with plasticity may have some adverse effects on the stability of your network.

Short-Term Memory and Long-Term Memory

We alluded to short-term memory (STM) and long-term memory (LTM) in the previous paragraph. STM is basically the information that is currently and perhaps temporarily being processed. It is manifested in the patterns that the network encounters. LTM, on the other hand, is information that is already stored and is not being currently processed. In a neural network, STM is usually characterized by patterns and LTM is characterized by the connections’ weights. The weights determine how an input is processed in the network to yield output. During the cycles of operation of a network, the weights may change. After convergence, they represent LTM, as the weight levels achieved are stable.

Summary

You saw in this chapter the C++ implementations of a simple Hopfield network and of a simple Perceptron network. What has not been included in them is automatic iteration and a learning algorithm. These were not necessary for the examples used in this chapter to show C++ implementation; the emphasis was on the method of implementation. In a later chapter, you will read about learning algorithms and examples of how to implement some of them.

Considerations in modeling a neural network are presented in this chapter along with an outline of how Tic-Tac-Toe is used as an example of an adaptive neural network model.

You also were introduced to the following concepts: stability, plasticity, short-term memory, and long-term memory (discussed further in later chapters). Much more can be said about them, in terms of the so-called noise-saturation dilemma, or stability–plasticity dilemma and what research has developed to address them (for further reading, see References).