Backpropagation

Feedforward Backpropagation Network

The feedforward backpropagation network is a very popular model in neural networks. It does not have feedback connections, but errors are backpropagated during training. The least mean squared error criterion is used. Many applications can be formulated for a feedforward backpropagation network, and the methodology has been a model for most multilayer neural networks. Errors in the output determine measures of hidden layer output errors, which are used as a basis for adjustment of connection weights between the input and hidden layers. Adjusting the two sets of weights between the pairs of layers and recalculating the outputs is an iterative process that is carried on until the errors fall below a tolerance level. Learning rate parameters scale the adjustments to weights. A momentum parameter can also be used to scale the adjustments from a previous iteration before adding them to the adjustments in the current iteration.

Mapping

The feedforward backpropagation network maps input vectors to output vectors. Pairs of input and output vectors are chosen to train the network first. Once training is completed, the weights are set and the network can be used to find outputs for new inputs. The dimension of the input vector determines the number of neurons in the input layer, and the number of neurons in the output layer is determined by the dimension of the outputs. If there are k neurons in the input layer and m neurons in the output layer, then this network can make a mapping from k-dimensional space to m-dimensional space. Of course, what that mapping is depends on which pairs of patterns or vectors are used as exemplars to train the network, since these determine the network weights. Once trained, the network gives you the image of a new input vector under this mapping. Knowing what mapping you want the feedforward backpropagation network to be trained for implies the dimensions of the input space and the output space, so that you can determine the numbers of neurons to have in the input and output layers.

Layout

Layout of a feedforward backpropagation network.

The network has three fields of neurons: one for input neurons, one for hidden processing elements, and one for the output neurons. As already stated, the connections carry only feedforward activity. There are connections from every neuron in field A (the input layer) to every neuron in field B (the hidden layer), and, in turn, from every neuron in field B to every neuron in field C (the output layer). Thus, there are two sets of weights: those figuring in the activations of hidden layer neurons, and those that help determine the output neuron activations. In training, all of these weights are adjusted by considering what can be called a cost function in terms of the error between the computed output pattern and the desired output pattern.

Training

The feedforward backpropagation network undergoes supervised training, with a finite number of pattern pairs consisting of an input pattern and a desired or target output pattern. An input pattern is presented at the input layer. The neurons here pass the pattern activations to the next layer of neurons, which are in a hidden layer. The outputs of the hidden layer neurons are obtained by applying a threshold function, and perhaps a bias, to activations determined by the weights and the inputs. These hidden layer outputs become inputs to the output neurons, which process the inputs using an optional bias and a threshold function. The final output of the network is determined by the activations from the output layer.

The computed output pattern is compared with the target output pattern, a function of this error for each component of the pattern is determined, and an adjustment to the weights of connections between the hidden layer and the output layer is computed. A similar computation, still based on the error in the output, is made for the connection weights between the input and hidden layers. The procedure is repeated with each pattern pair assigned for training the network. Each pass through all the training patterns is called a cycle or an epoch. The process is then repeated for as many cycles as needed until the error is within a prescribed tolerance.


There can be more than one learning rate parameter used in training in a feedforward backpropagation network. You can use one with each set of weights between consecutive layers.


Illustration: Adjustment of Weights of Connections from a Neuron in the Hidden Layer

We will be as specific as is needed to make the computations clear. First recall that the activation of a neuron in a layer other than the input layer is the sum of products of its inputs and the weights corresponding to the connections that bring in those inputs. Let us discuss the jth neuron in the hidden layer. Let us be specific and say j = 2. Suppose that the input pattern is (1.1, 2.4, 3.2, 5.1, 3.9) and the target output pattern is (0.52, 0.25, 0.75, 0.97). Let the weights be given for the second hidden layer neuron by the vector (–0.33, 0.07, –0.45, 0.13, 0.37). The activation will be the quantity:

     (-0.33 * 1.1) + (0.07 * 2.4) + (-0.45 * 3.2) + (0.13 * 5.1)
     + (0.37 * 3.9) = 0.471

Now add to this an optional bias of, say, 0.679, to give 1.15. If we use the sigmoid function given by:

     1 / ( 1+ exp(-x) ),

with x = 1.15, we get the output of this hidden layer neuron as 0.7595.


We are taking values to a few decimal places only for illustration, unlike the precision that can be obtained on a computer.
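For readers who want to check this arithmetic on a computer, here is a small stand-alone C++ fragment (not part of the simulator itself) that reproduces the activation and sigmoid output of the second hidden layer neuron:

#include <cmath>
#include <iostream>

int main()
{
    // input pattern and the weight vector of the second hidden layer neuron
    float inputs[5]  = {1.1f, 2.4f, 3.2f, 5.1f, 3.9f};
    float weights[5] = {-0.33f, 0.07f, -0.45f, 0.13f, 0.37f};
    float bias = 0.679f;

    float activation = 0.0f;
    for (int i = 0; i < 5; i++)
        activation += weights[i] * inputs[i];          // 0.471

    float x = activation + bias;                       // 1.15
    float output = 1.0f / (1.0f + std::exp(-x));       // sigmoid, about 0.7595
    std::cout << "activation = " << activation
              << ", output = " << output << "\n";
    return 0;
}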


We need the computed output pattern also. Let us say it turns out to be actual = (0.61, 0.41, 0.57, 0.53), while the desired pattern is desired = (0.52, 0.25, 0.75, 0.97). Obviously, there is a discrepancy between what is desired and what is computed. The component-wise differences are given in the vector desired - actual = (-0.09, -0.16, 0.18, 0.44). We use these to form another vector where each component is a product of the error component, the corresponding computed pattern component, and the complement of the latter with respect to 1. For example, for the first component, the error is –0.09, the computed pattern component is 0.61, and its complement is 0.39. Multiplying these together (0.61 * 0.39 * -0.09), we get -0.02. Calculating the other components similarly, we get the vector (–0.02, –0.04, 0.04, 0.11). In other words, each raw difference in the desired - actual vector is scaled by the corresponding output value and by its complement (1 - output); this product of the output and its complement is the first derivative of the sigmoid output function, and the scaled result is the error attributed to each output neuron. You will see the formulas for this process later in this chapter.
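The same scaling can be checked with a few lines of stand-alone code; this is only a verification of the error vector just derived, not the simulator's own routine:

#include <iostream>

int main()
{
    float desired[4] = {0.52f, 0.25f, 0.75f, 0.97f};
    float actual[4]  = {0.61f, 0.41f, 0.57f, 0.53f};

    for (int j = 0; j < 4; j++)
    {
        float diff  = desired[j] - actual[j];                 // e.g. -0.09
        float error = diff * actual[j] * (1.0f - actual[j]);  // scale by output and its complement
        std::cout << error << " ";                            // about -0.02 -0.04 0.04 0.11
    }
    std::cout << "\n";
    return 0;
}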

The backpropagation of errors needs to be carried further. We need now the weights on the connections between the second neuron in the hidden layer that we are concentrating on, and the different output neurons. Let us say these weights are given by the vector (0.85, 0.62, –0.10, 0.21). The error of the second neuron in the hidden layer is now calculated as below, using its output.

     error = 0.7595 * (1 - 0.7595) * ( (0.85 * -0.02) + (0.62 * -0.04)
     + ( -0.10 * 0.04) + (0.21 * 0.11)) = -0.0041.

Here we take the errors computed at the output layer (for example, –0.02), weight each one by the connection from this hidden neuron to the corresponding output neuron, and multiply the sum by the hidden neuron's output value (0.7595) and its complement (1 - 0.7595). In this way, we use the weights on the connections between neurons to work backwards through the network.

Next, we need the learning rate parameter for this layer; let us set it as 0.2. We multiply this by the output of the second neuron in the hidden layer, to get 0.1519. Each of the components of the vector (–0.02, –0.04, 0.04, 0.11) is multiplied now by 0.1519, which our latest computation gave. The result is a vector that gives the adjustments to the weights on the connections that go from the second neuron in the hidden layer to the output neurons. These values are given in the vector (–0.003, –0.006, 0.006, 0.017). After these adjustments are added, the weights to be used in the next cycle on the connections between the second neuron in the hidden layer and the output neurons become those in the vector (0.847, 0.614, –0.094, 0.227).
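The hidden neuron's error and the resulting weight adjustments can likewise be verified with a short stand-alone fragment (again, only an illustration of the arithmetic above):

#include <iostream>

int main()
{
    float hidden_output = 0.7595f;                 // output of the second hidden layer neuron
    float w[4] = {0.85f, 0.62f, -0.10f, 0.21f};    // weights to the four output neurons
    float e[4] = {-0.02f, -0.04f, 0.04f, 0.11f};   // scaled errors at the output neurons
    float beta = 0.2f;                             // learning rate for this layer

    float sum = 0.0f;                              // weighted sum of the output errors
    for (int j = 0; j < 4; j++)
        sum += w[j] * e[j];
    float hidden_error = hidden_output * (1.0f - hidden_output) * sum;
    std::cout << "hidden neuron error = " << hidden_error << "\n";   // about -0.0041

    for (int j = 0; j < 4; j++)
    {
        float delta = beta * hidden_output * e[j]; // about -0.003 -0.006 0.006 0.017
        std::cout << "adjustment = " << delta
                  << ", new weight = " << (w[j] + delta) << "\n";
    }
    return 0;
}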

Illustration: Adjustment of Weights of Connections from a Neuron in the Input Layer

Let us look at how adjustments are calculated for the weights on connections going from the ith neuron in the input layer to neurons in the hidden layer. Let us take specifically i = 3, for illustration.

Much of the information we need is already obtained in the previous discussion for the second hidden layer neuron. We have the errors in the computed output as the vector (–0.09, –0.16, 0.18, 0.44), and we obtained the error for the second neuron in the hidden layer as –0.0041, which was not used above. Just as the error in the output is propagated back to assign errors for the neurons in the hidden layer, those errors can be propagated to the input layer neurons.

To determine the adjustments for the weights on connections between the input and hidden layers, we need the errors determined for the outputs of hidden layer neurons, a learning rate parameter, and the activations of the input neurons, which are just the input values for the input layer. Let us take the learning rate parameter to be 0.15. Then the weight adjustments for the connections from the third input neuron to the hidden layer neurons are obtained by multiplying the particular hidden layer neuron’s output error by the learning rate parameter and by the input component from the input neuron. The adjustment for the weight on the connection from the third input neuron to the second hidden layer neuron is 0.15 * 3.2 * –0.0041, which works out to –0.002.

If the weight on this connection is, say, –0.45, then adding the adjustment of -0.002, we get the modified weight of –0.452, to be used in the next iteration of the network operation. Similar calculations are made to modify all other weights as well.

Adjustments to Threshold Values or Biases

The bias or the threshold value we added to the activation, before applying the threshold function to get the output of a neuron, will also be adjusted based on the error being propagated back. The needed values for this are in the previous discussion.

The adjustment for the threshold value of a neuron in the output layer is obtained by multiplying the calculated error (not just the difference) in the output at the output neuron and the learning rate parameter used in the adjustment calculation for weights at this layer. In our previous example, we have the learning rate parameter as 0.2, and the error vector as (–0.02, –0.04, 0.04, 0.11), so the adjustments to the threshold values of the four output neurons are given by the vector (–0.004, –0.008, 0.008, 0.022). These adjustments are added to the current levels of threshold values at the output neurons.

The adjustment to the threshold value of a neuron in the hidden layer is obtained similarly by multiplying the learning rate with the computed error in the output of the hidden layer neuron. Therefore, for the second neuron in the hidden layer, the adjustment to its threshold value is calculated as 0.15 * –0.0041, which is –0.0006. Add this to the current threshold value of 0.679 to get 0.6784, which is to be used for this neuron in the next training pattern for the neural network.
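As a final stand-alone check for this illustration, the input-side weight adjustment and the two kinds of threshold adjustments described above work out as follows:

#include <iostream>

int main()
{
    float beta_hidden = 0.15f, beta_output = 0.2f;
    float hidden_error = -0.0041f;                  // error of the second hidden layer neuron
    float input3 = 3.2f;                            // output of the third input neuron
    float output_errors[4] = {-0.02f, -0.04f, 0.04f, 0.11f};

    // adjustment for the weight from input neuron 3 to hidden neuron 2 (about -0.002)
    std::cout << "weight adjustment = " << beta_hidden * input3 * hidden_error << "\n";

    // threshold adjustments for the four output neurons (about -0.004 -0.008 0.008 0.022)
    for (int j = 0; j < 4; j++)
        std::cout << beta_output * output_errors[j] << " ";
    std::cout << "\n";

    // threshold adjustment for the second hidden layer neuron, added to 0.679
    float new_threshold = 0.679f + beta_hidden * hidden_error;
    std::cout << "new hidden threshold = " << new_threshold << "\n";  // about 0.6784
    return 0;
}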

Another Example of Backpropagation Calculations

You have seen, in the preceding sections, the details of calculations for one particular neuron in the hidden layer of a feedforward backpropagation network with five input neurons, two neurons in the hidden layer, and four neurons in the output layer.

You are going to see all the calculations in the C++ implementation later in this chapter. Right now, though, we present another example and give the complete picture of the calculations done in one completed iteration or cycle of backpropagation.

Consider a feedforward backpropagation network with three input neurons, two neurons in the hidden layer, and three output neurons. The weights on connections from the input neurons to the neurons in the hidden layer are given in Matrix M-1, and those from the neurons in the hidden layer to output neurons are given in Matrix M-2.

We calculate the output of each neuron in the hidden and output layers as follows. We add a bias or threshold value to the activation of a neuron (call this result x) and use the sigmoid function below to get the output.

     f(x) = 1 / (1 + exp(-x))

Learning parameters used are 0.2 for the connections between the hidden layer neurons and output neurons and 0.15 for the connections between the input neurons and the neurons in the hidden layer. These values as you recall are the same as in the previous illustration, to make it easy for you to follow the calculations by comparing them with similar calculations in the preceding sections.

The input pattern is ( 0.52, 0.75, 0.97 ), and the desired output pattern is ( 0.24, 0.17, 0.65). The initial weight matrices are as follows:

M-1 Matrix of weights from input layer to hidden layer

       0.6     - 0.4
       0.2       0.8
     - 0.5       0.3

M-2 Matrix of weights from hidden layer to output layer

     -0.90       0.43       0.25
      0.11     - 0.67     - 0.75

The threshold values (or bias) for neurons in the hidden layer are 0.2 and 0.3, while those for the output neurons are 0.15, 0.25, and 0.05, respectively. The table presents all the results of calculations done in the first iteration. You will see modified or new weight matrices and threshold values. You will use these and the original input vector and the desired output vector to carry out the next iteration.

Backpropagation Calculations

Item                        I-1      I-2      I-3      H-1      H-2      O-1      O-2      O-3
Input                       0.52     0.75     0.97
Desired Output                                                           0.24     0.17     0.65
M-1 Row 1                                             0.6      -0.4
M-1 Row 2                                             0.2       0.8
M-1 Row 3                                            -0.5       0.3
M-2 Row 1                                                               -0.90     0.43     0.25
M-2 Row 2                                                                0.11    -0.67    -0.75
Threshold                                             0.2       0.3      0.15     0.25     0.05
Activation -H                                        -0.023     0.683
Activation + Threshold -H                             0.177     0.983
Output -H                                             0.544     0.728
Complement                                            0.456     0.272
Activation -O                                                           -0.410   -0.254   -0.410
Activation + Threshold -O                                               -0.260   -0.004   -0.360
Output -O                                                                0.435    0.499    0.411
Complement                                                               0.565    0.501    0.589
Diff. from Target                                                       -0.195   -0.329    0.239
Computed Error -O                                                       -0.048   -0.082    0.058
Computed Error -H                                     0.0056    0.0012
Adjustment to Threshold                               0.0008    0.0002  -0.0096  -0.0164   0.0116
Adjustment to M-2 Column 1                           -0.0005   -0.0070
Adjustment to M-2 Column 2                            0.0007    0.0008
Adjustment to M-2 Column 3                            0.0008    0.0011
New Matrix M-2 Row 1                                                    -0.91     0.412    0.262
New Matrix M-2 Row 2                                                     0.096   -0.694   -0.734
New Threshold Values -O                                                  0.1404   0.2336   0.0616
Adjustment to M-1 Row 1                               0.0004   -0.0001
Adjustment to M-1 Row 2                               0.0006    0.0001
Adjustment to M-1 Row 3                               0.0008    0.0002
New Matrix M-1 Row 1                                  0.6004   -0.4
New Matrix M-1 Row 2                                  0.2006    0.8001
New Matrix M-1 Row 3                                 -0.4992    0.3002
New Threshold Values -H                               0.2008    0.3002

The top row in the table gives headings for the columns. They are Item; I-1, I-2, I-3 (I-k being input layer neuron k); H-1, H-2 (for the hidden layer neurons); and O-1, O-2, O-3 (for the output layer neurons).

In the first column of the table, M-1 and M-2 refer to weight matrices as above. Where an entry is appended with -H, like in Output -H, the information refers to the hidden layer. Similarly, -O refers to the output layer, as in Activation + threshold -O.

New M-1 matrix of weights from input layer to hidden layer:

             0.6004   - 0.4
             0.2006     0.8001
           - 0.4992     0.3002

New M-2 matrix of weights from hidden layer to output layer:

           -0.910       0.412      0.262
            0.096      -0.694     -0.734

The new threshold values (or biases) for neurons in the hidden layer are 0.2008 and 0.3002, while those for the output neurons are 0.1404, 0.2336, and 0.0616, respectively.
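If you want to reproduce the forward-pass rows of the table, the following stand-alone fragment computes the hidden and output layer values for this 3-2-3 example; the weight and threshold updates then follow the same arithmetic as in the earlier illustration. It is only a check of the numbers, not part of the simulator.

#include <cmath>
#include <iostream>

float squash(float x) { return 1.0f / (1.0f + std::exp(-x)); }

int main()
{
    float in[3]      = {0.52f, 0.75f, 0.97f};
    float m1[3][2]   = {{0.6f, -0.4f}, {0.2f, 0.8f}, {-0.5f, 0.3f}};
    float m2[2][3]   = {{-0.90f, 0.43f, 0.25f}, {0.11f, -0.67f, -0.75f}};
    float theta_h[2] = {0.2f, 0.3f};
    float theta_o[3] = {0.15f, 0.25f, 0.05f};

    float h[2], o[3];
    for (int j = 0; j < 2; j++)              // hidden layer outputs: about 0.544 and 0.728
    {
        float act = 0.0f;
        for (int i = 0; i < 3; i++) act += in[i] * m1[i][j];
        h[j] = squash(act + theta_h[j]);
    }
    for (int j = 0; j < 3; j++)              // output layer outputs: about 0.435, 0.499, 0.411
    {
        float act = 0.0f;
        for (int i = 0; i < 2; i++) act += h[i] * m2[i][j];
        o[j] = squash(act + theta_o[j]);
    }
    std::cout << "hidden outputs: " << h[0] << " " << h[1] << "\n";
    std::cout << "final outputs:  " << o[0] << " " << o[1] << " " << o[2] << "\n";
    return 0;
}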

You can keep the learning parameters as 0.15 for connections between input and hidden layer neurons, and 0.2 for connections between the hidden layer neurons and output neurons, or you can slightly modify them. Whether or not to change these two parameters is a decision that can be made perhaps at a later iteration, having obtained a sense of how the process is converging.

If you are satisfied with the rate at which the computed output pattern is getting close to the target output pattern, you would not change these learning rates. If you feel the convergence is much slower than you would like, then the learning rate parameters can be adjusted slightly upwards. It is a subjective decision both in terms of when (if at all) and to what new levels these parameters need to be revised.

Notation and Equations

You have just seen an example of the process of training in the feedforward backpropagation network, described in relation to one hidden layer neuron and one input neuron. There were a few vectors that were shown and used, but perhaps not made easily identifiable. We therefore introduce some notation and describe the equations that were implicitly used in the example.

Notation

Let us talk about two matrices whose elements are the weights on connections. One matrix refers to the interface between the input and hidden layers, and the second refers to that between the hidden layer and the output layer. Since connections exist from each neuron in one layer to every neuron in the next layer, there is a vector of weights on the connections going out from any one neuron. Putting this vector into a row of the matrix, we get as many rows as there are neurons from which connections are established.

Let M1 and M2 be these matrices of weights. Then what does M1[i][j] represent? It is the weight on the connection from the ith input neuron to the jth neuron in the hidden layer. Similarly, M2[i][j] denotes the weight on the connection from the ith neuron in the hidden layer to the jth output neuron.

Next, we will use x, y, z for the outputs of neurons in the input layer, hidden layer, and output layer, respectively, with a subscript attached to denote which neuron in a given layer we are referring to. Let P denote the desired output pattern, with P_i as its components. Let m be the number of input neurons, so that according to our notation, (x_1, x_2, ..., x_m) will denote the input pattern. If P has, say, r components, the output layer needs r neurons. Let the number of hidden layer neurons be n. Let β_h be the learning rate parameter for the hidden layer, and β_o that for the output layer. Let θ with the appropriate subscript represent the threshold value or bias for a hidden layer neuron, and τ with an appropriate subscript refer to the threshold value of an output neuron.

Let the errors in output at the output layer be denoted by e_j and those at the hidden layer by t_i. If we use a Δ prefix on any parameter, then we are looking at the change in, or adjustment to, that parameter. Also, the thresholding function we use is the sigmoid function, f(x) = 1 / (1 + exp(-x)).

Equations

Output of the jth hidden layer neuron:  y_j = f( (Σ_i x_i M1[i][j]) + θ_j )                     (7.1)

Output of the jth output layer neuron:  z_j = f( (Σ_i y_i M2[i][j]) + τ_j )                     (7.2)

ith component of the vector of output differences:  desired value - computed value = P_i - z_i

ith component of the output error at the output layer:  e_i = (P_i - z_i) z_i (1 - z_i)         (7.3)

ith component of the output error at the hidden layer:  t_i = y_i (1 - y_i) (Σ_j M2[i][j] e_j)  (7.4)

Adjustment for the weight between the ith neuron in the hidden layer and the jth output neuron:  ΔM2[i][j] = β_o y_i e_j     (7.5)

Adjustment for the weight between the ith input neuron and the jth neuron in the hidden layer:  ΔM1[i][j] = β_h x_i t_j      (7.6)

Adjustment to the threshold value or bias for the jth output neuron:  Δτ_j = β_o e_j

Adjustment to the threshold value or bias for the jth hidden layer neuron:  Δθ_j = β_h t_j

When a momentum parameter α is used (more on this parameter in Chapter 13), equations 7.5 and 7.6 are replaced by:

ΔM2[i][j](t) = β_o y_i e_j + α ΔM2[i][j](t - 1)                                                 (7.7)

ΔM1[i][j](t) = β_h x_i t_j + α ΔM1[i][j](t - 1)                                                 (7.8)
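To tie the notation together, here is a compact sketch of one training step for a single pattern, following Equations 7.1 through 7.6 plus the two bias adjustments (momentum left out). It is written with std::vector for brevity and is meant only as an illustration of the equations, not as the simulator's classes, which follow in the next section.

#include <cmath>
#include <cstddef>
#include <vector>

// One backpropagation step for a single pattern, following Eqs. 7.1 - 7.6.
// x has m components, P has r components, M1 is m x n, M2 is n x r.
void backprop_step(const std::vector<float>& x,          // input pattern
                   const std::vector<float>& P,          // desired output pattern
                   std::vector< std::vector<float> >& M1,
                   std::vector< std::vector<float> >& M2,
                   std::vector<float>& theta,            // hidden layer biases
                   std::vector<float>& tau,              // output layer biases
                   float beta_h, float beta_o)
{
    std::size_t m = x.size(), n = theta.size(), r = P.size();
    std::vector<float> y(n), z(r), e(r), t(n);

    for (std::size_t j = 0; j < n; j++)                  // Eq. 7.1
    {
        float a = theta[j];
        for (std::size_t i = 0; i < m; i++) a += x[i] * M1[i][j];
        y[j] = 1.0f / (1.0f + std::exp(-a));
    }
    for (std::size_t j = 0; j < r; j++)                  // Eq. 7.2
    {
        float a = tau[j];
        for (std::size_t i = 0; i < n; i++) a += y[i] * M2[i][j];
        z[j] = 1.0f / (1.0f + std::exp(-a));
    }
    for (std::size_t j = 0; j < r; j++)                  // Eq. 7.3
        e[j] = (P[j] - z[j]) * z[j] * (1.0f - z[j]);
    for (std::size_t i = 0; i < n; i++)                  // Eq. 7.4
    {
        float s = 0.0f;
        for (std::size_t j = 0; j < r; j++) s += M2[i][j] * e[j];
        t[i] = y[i] * (1.0f - y[i]) * s;
    }
    for (std::size_t i = 0; i < n; i++)                  // Eq. 7.5
        for (std::size_t j = 0; j < r; j++) M2[i][j] += beta_o * y[i] * e[j];
    for (std::size_t i = 0; i < m; i++)                  // Eq. 7.6
        for (std::size_t j = 0; j < n; j++) M1[i][j] += beta_h * x[i] * t[j];
    for (std::size_t j = 0; j < r; j++) tau[j]   += beta_o * e[j];   // output bias adjustment
    for (std::size_t j = 0; j < n; j++) theta[j] += beta_h * t[j];   // hidden bias adjustment
}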

C++ Implementation of a Backpropagation Simulator

The backpropagation simulator of this chapter has the following design objectives:

1.  Allow the user to specify the number and size of all layers.

2.  Allow the use of one or more hidden layers.

3.  Be able to save and restore the state of the network.

4.  Run from an arbitrarily large training data set or test data set.

5.  Query the user for key network and simulation parameters.

6.  Display key information at the end of the simulation.

7.  Demonstrate the use of some C++ features.

A Brief Tour of How to Use the Simulator

In order to understand the C++ code, let us have an overview of the functioning of the program.

There are two modes of operation in the simulator. The user is queried first for which mode of operation is desired. The modes are Training mode and Nontraining mode (Test mode).

Training Mode

Here, the user provides a training file in the current directory called training.dat. This file contains exemplar pairs, or patterns. Each pattern has a set of inputs followed by a set of outputs. Each value is separated by one or more spaces. As a convention, you can use a few extra spaces to separate the inputs from the outputs. Here is an example of a training.dat file that contains two patterns:

     0.4 0.5 0.89           -0.4 -0.8
     0.23 0.8 -0.3          0.6 0.34

In this example, the first pattern has inputs 0.4, 0.5, and 0.89, with expected outputs of –0.4 and –0.8. The second pattern has inputs of 0.23, 0.8, and –0.3 and outputs of 0.6 and 0.34. Since there are three inputs and two outputs, the input layer size for the network must be three neurons and the output layer size must be two neurons. Another file that is used in training is the weights file. Once the simulator reaches the error tolerance that was specified by the user, or the maximum number of iterations, it saves the state of the network by writing all of its weights to a file called weights.dat. This file can then be used subsequently in another run of the simulator in Nontraining mode. To provide some idea of how the network has done, information about the total and average error is presented at the end of the simulation. In addition, the output generated by the network for the last pattern vector is provided in an output file called output.dat.
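As a simple stand-alone illustration of this file format (the simulator's own reader, fill_IObuffer, appears later in the chapter), the following fragment reads pattern pairs of the sizes used in the example above from training.dat:

#include <stdio.h>

int main()
{
    const int INS = 3, OUTS = 2;          // sizes for the example training.dat above
    float in[INS], out[OUTS];

    FILE *fp = fopen("training.dat", "r");
    if (fp == NULL)
    {
        printf("problem opening training file\n");
        return 1;
    }

    // any amount of whitespace separates the values, so the extra spaces
    // between the inputs and the outputs are simply skipped
    while (fscanf(fp, "%f", &in[0]) == 1)
    {
        for (int i = 1; i < INS; i++)  fscanf(fp, "%f", &in[i]);
        for (int i = 0; i < OUTS; i++) fscanf(fp, "%f", &out[i]);
        printf("inputs: %f %f %f   outputs: %f %f\n",
               in[0], in[1], in[2], out[0], out[1]);
    }
    fclose(fp);
    return 0;
}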

Nontraining Mode (Test Mode)

In this mode, the user provides test data to the simulator in a file called test.dat. This file contains only input patterns. When this file is applied to an already trained network, an output.dat file is generated, which contains the outputs from the network for all of the input patterns. The network goes through one cycle of operation in this mode, covering all the patterns in the test data file. To start up the network, the weights file, weights.dat is read to initialize the state of the network. The user must provide the same network size parameters used to train the network.

Operation

The first thing to do with your simulator is to train a network with an architecture you choose. You can select the number of layers and the number of hidden layers for your network. Keep in mind that the input and output layer sizes are dictated by the input patterns you are presenting to the network and the outputs you seek from the network. Once you decide on an architecture, perhaps a simple three-layer network with one hidden layer, you prepare training data for it and save the data in the training.dat file. After this you are ready to train. You provide the simulator with the following information:

  The mode (select 1 for training)

  The values for the error tolerance and the learning rate parameter, lambda or beta

  The maximum number of cycles, or passes through the training data you’d like to try

  The number of layers (between three and five, three implies one hidden layer, while five implies three hidden layers)

  The size for each layer, from the input to the output

The simulator then begins training and reports the current cycle number and the average error for each cycle. You should watch the error to see that it is on the whole decreasing with time. If it is not, you should restart the simulation, because this will start with a brand new set of random weights and give you another, possibly better, solution. Note that there will be legitimate periods where the error may increase for some time. Once the simulation is done you will see information about the number of cycles and patterns used, and the total and average error that resulted. The weights are saved in the weights.dat file. You can rename this file to use this particular state of the network later. You can infer the size and number of layers from the information in this file, as will be shown in the next section for the weights.dat file format. You can have a peek at the output.dat file to see the kind of training result you have achieved. To get a full-blown accounting of each pattern and the match to that pattern, copy the training file to the test file and delete the output information from it. You can then run Test mode to get a full list of all the input stimuli and responses in the output.dat file.

Summary of Files Used in the Backpropagation Simulator

Here is a list of the files for your reference, as well as what they are used for.

  weights.dat You can look at this file to see the weights for the network. It shows the layer number followed by the weights that feed into the layer. The first layer, or input layer, layer zero, does not have any weights associated with it. An example of the weights.dat file is shown as follows for a network with three layers of sizes 3, 5, and 2. Note that the row width for layer n matches the column length for layer n + 1:

1 -0.199660 -0.859660 -0.339660 -0.25966 0.520340
1  0.292860 -0.487140 0.212860 -0.967140 -0.427140
1  0.542106 -0.177894 0.322106 -0.977894 0.562106
2 -0.175350 -0.835350
2 -0.330167 -0.250167
2  0.503317 0.283317
2 -0.477158 0.222842
2 -0.928322 -0.388322


In this weights file the row width for layer 1 is 5, corresponding to the output of that (middle) layer. The input for the layer is the column length, which is 3, just as specified. For layer 2, the output size is the row width, which is 2, and the input size is the column length, 5, which is the same as the output for the middle layer. You can read the weights file to find out how things look.

  training.dat This file contains the input patterns for training. You can have as large a file as you’d like without degrading the performance of the simulator. The simulator caches data in memory for processing. This is to improve the speed of the simulation since disk accesses are expensive in time. A data buffer, which has a maximum size specified in a #define statement in the program, is filled with data from the training.dat file whenever data is needed. The format for the training.dat file has been shown in the Training mode section.

  test.dat The test.dat file is just like the training.dat file but without expected outputs. You use this file with a trained neural network in Test mode to see what responses you get for untrained data.

  output.dat The output.dat file contains the results of the simulation. In Test mode, the input and output vectors are shown for all pattern vectors. In Training mode, the expected output is also shown, but only the last vector in the training set is presented, since the training set is usually quite large.
Shown here is an example of an output file in Training mode:

for input vector:
0.400000  -0.400000
output vector is:
0.880095
expected output vector is:
0.900000

C++ Classes and Class Hierarchy

So far, you have learned how we address most of the objectives outlined for this program. The only objective left involves the demonstration of some C++ features. In this program we use a class hierarchy with the inheritance feature. Also, we use polymorphism with dynamic binding and function overloading with static binding. First let us look at the class hierarchy used for this program. An abstract class is a class that is never meant to be instantiated as an object, but serves as a base class from which others can inherit functionality and interface definitions. The layer class is such a class. You will see shortly that one of its functions is declared as a pure virtual function (it is set equal to zero in the declaration), which makes this class an abstract base class. Two branches descend from the layer class: one is the input_layer class, and the other is the output_layer class. The middle layer class is very much like the output layer in function and so inherits from the output_layer class.

Class hierarchy used in the backpropagation simulator.

Function overloading can be seen in the definition of the calc_error() function. It is used in the middle_layer class with no parameters, while it is used in the output_layer class (from which the middle_layer inherits) with one parameter. Using the same function name is not a problem, and this is referred to as overloading. Besides function overloading, you may also have operator overloading, which is using an operator that performs some familiar function, like + for addition, for another function, say, vector addition.

When you have overloading with the same parameters and the keyword virtual, then you have the potential for dynamic binding, which means that you determine which overloaded function to execute at run time and not at compile time. Compile time binding is referred to as static binding. If you put a bunch of C++ objects in an array of pointers to the base class, and then go through a loop that indexes each pointer and executes an overloaded virtual function that pointer is pointing to, then you will be using dynamic binding. This is exactly the case in the function calc_out(), which is declared with the virtual keyword in the layer base class. Each descendant of layer can provide a version of calc_out(), which differs in functionality from the base class, and the correct function will be selected at run time based on the object’s identity. In this case calc_out(), which is a function to calculate the outputs for each layer, is different for the input layer than for the other two types of layers.
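The following small, self-contained example (with hypothetical class names, separate from the simulator's classes) shows the mechanism: the call through the base class pointer is bound at run time to the function of the object actually pointed to, just as calc_out() is for the layer classes.

#include <iostream>

class demo_layer                          // stands in for the abstract layer class
{
public:
    virtual void calc_out() = 0;          // pure virtual, as in the layer class
    virtual ~demo_layer() {}
};

class demo_input_layer : public demo_layer
{
public:
    void calc_out() { std::cout << "input layer version of calc_out()\n"; }
};

class demo_output_layer : public demo_layer
{
public:
    void calc_out() { std::cout << "output layer version of calc_out()\n"; }
};

int main()
{
    demo_layer *layers[2];
    layers[0] = new demo_input_layer;
    layers[1] = new demo_output_layer;

    for (int i = 0; i < 2; i++)
        layers[i]->calc_out();            // dynamic binding selects the right version

    for (int i = 0; i < 2; i++)
        delete layers[i];
    return 0;
}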

Let’s look at some details in the header file in listing:

Header file for the backpropagation simulator

// layer.h            V.Rao, H. Rao
// header file for the layer class hierarchy and
// the network class
 
#define MAX_LAYERS    5
#define MAX_VECTORS   100
 
class network;
class layer {
protected:
       int num_inputs;
       int num_outputs;
       float *outputs;// pointer to array of outputs
       float *inputs; // pointer to array of inputs, which are outputs of some other layer
       friend network;
public:
       virtual void calc_out()=0;
 
};
 
class input_layer: public layer{
private:
public:
       input_layer(int, int);
       ~input_layer();
       virtual void calc_out();
 
};
 
class middle_layer;
 
class output_layer:   public layer
{
protected:
 
       float * weights;
       float * output_errors;    // array of errors at output
       float * back_errors;      // array of errors back-propagated
       float * expected_values;  // to inputs
   friend network;
 
public:
 
       output_layer(int, int);
       ~output_layer();
       virtual void calc_out();
       void calc_error(float &);
       void randomize_weights();
       void update_weights(const float);
       void list_weights();
       void write_weights(int, FILE *);
       void read_weights(int, FILE *);
       void list_errors();
       void list_outputs();
};
 
class middle_layer:   public output_layer
{
 
private:
 
public:
    middle_layer(int, int);
    ~middle_layer();
       void calc_error();
};
 
class network
 
{
 
private:
 
    layer *layer_ptr[MAX_LAYERS];
    int number_of_layers;
    int layer_size[MAX_LAYERS];
    float *buffer;
    fpos_t position;
    unsigned training;
 
public:
    network();
    ~network();
       void set_training(const unsigned &);
       unsigned get_training_value();
       void get_layer_info();
       void set_up_network();
       void randomize_weights();
       void update_weights(const float);
       void write_weights(FILE *);
       void read_weights(FILE *);
       void list_weights();
       void write_outputs(FILE *);
       void list_outputs();
       void list_errors();
       void forward_prop();
       void backward_prop(float &);
       int fill_IObuffer(FILE *);
       void set_up_pattern(int);
};

Details of the Backpropagation Header File

At the top of the file, there are two #define statements, which are used to set the maximum number of layers that can be used, currently five, and the maximum number of training or test vectors that can be read into an I/O buffer. This is currently 100. You can increase the size of the buffer for better speed at the cost of increased memory usage.

The following are definitions in the layer base class. Note that the number of inputs and outputs are protected data members, which means that they can be accessed freely by descendants of the class.

int num_inputs;
int num_outputs;
float *outputs;      // pointer to array of outputs
float *inputs;       // pointer to array of inputs, which
                     // are outputs of some other layer
friend network;

There are also two pointers to arrays of floats in this class. They are the pointers to the outputs in a given layer and the inputs to a given layer. To get a better idea of what a layer encompasses, Figure 7.3 shows a small feedforward backpropagation network, with a dotted line that marks off the three layers of that network. A layer contains neurons and weights. The layer is responsible for calculating its output (calc_out()), stored in the float * outputs array, and errors (calc_error()) for each of its respective neurons. The errors are stored in another array called float * output_errors defined in the output_layer class. Note that the input_layer class does not have any weights associated with it and therefore is a special case. It does not need to provide any data members or function members related to errors or backpropagation. The only purpose of the input layer is to store data to be forward propagated to the next layer.

Organization of layers for backpropagation program.

With the output layer, there are a few more arrays present. First, for storing backpropagated errors, there is an array called float * back_errors. There is a weights array called float * weights, and finally, for storing the expected values that initiate the error calculation process, there is an array called float * expected_values. Note that the middle layer needs almost all of these arrays and inherits them by being a derived class of the output_layer class.

There is one other class besides the layer class and its descendants defined in this header file, and that is the network class, which is used to set up communication channels between layers and to feed and remove data from the network. The network class performs the interconnection of layers by setting the pointer of an input array of a given layer to the output array of a previous layer.


This is a fairly extensible scheme that can be used to create variations on the feedforward backpropagation network with feedback connections, for instance.


Another connection that the network class is responsible for is setting the pointer of an output_error array to the back_error array of the next layer (remember, errors flow in reverse, and the back_error array is the output error of the layer reflected at its inputs).
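A tiny stand-alone sketch (hypothetical variable names, not the network class itself) shows what this pointer wiring amounts to: the downstream layer's inputs pointer simply aliases the upstream layer's outputs array, so no data is copied between layers.

#include <iostream>

int main()
{
    // outputs array owned by the "previous" layer
    float prev_outputs[3] = {0.1f, 0.2f, 0.3f};

    // the "next" layer keeps no copy; its inputs pointer is set
    // to the previous layer's outputs array by the network object
    float *next_inputs = prev_outputs;

    prev_outputs[1] = 0.9f;                       // previous layer recomputes an output
    std::cout << next_inputs[1] << "\n";          // the next layer sees 0.9 at once
    return 0;
}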

The network class stores an array of pointers to layers and an array of layer sizes for all the layers defined. These layer objects and arrays are dynamically allocated on the heap with the new and delete operators in C++. There is some minimal error checking for file I/O and memory allocation, which can be enhanced, if desired.

As you can see, the feedforward backpropagation network can quickly become a memory and CPU hog, with large networks and large training sets. The size and topology of the network, or architecture, will largely dictate both these characteristics.

Details of the Backpropagation Implementation File

The implementation of the classes and methods is the next topic. Let’s look at the layer.cpp file in listing:

layer.cpp implementation file for the backpropagation simulator

// layer.cpp           V.Rao, H.Rao
// compile for floating point hardware if available
#include <stdio.h>
#include <iostream.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include "layer.h"
 
inline float squash(float input)
// squashing function
// use sigmoid -- can customize to something
// else if desired; can add a bias term too
//
{
if (input < -50)
       return 0.0;
else   if (input > 50)
              return 1.0;
       else return (float)(1/(1+exp(-(double)input)));
}
 
inline float randomweight(unsigned init)
{
int num;
// random number generator
// will return a floating point
// value between -1 and 1
 
if (init==1)   // seed the generator
       srand ((unsigned)time(NULL));
 
num=rand() % 100;
 
return 2*(float(num/100.00))-1;
}
 
// the next function is needed for Turbo C++
// and Borland C++ to link in the appropriate
// functions for fscanf floating point formats:
static void force_fpf()
{
       float x, *y;
       y=&x;
       x=*y;
}
 
// --------------------
//                            input layer
//---------------------
input_layer::input_layer(int i, int o)
{
 
num_inputs=i;
num_outputs=o;
 
outputs = new float[num_outputs];
if (outputs==0)
       {
       cout << "not enough memory\n";
       cout << "choose a smaller architecture\n";
       exit(1);
       }
}
 
input_layer::~input_layer()
{
delete [num_outputs] outputs;
}
 
void input_layer::calc_out()
{
//nothing to do, yet
}
 
// --------------------
//                            output layer
//---------------------
 
output_layer::output_layer(int i, int o)
{
 
num_inputs         =i;
num_outputs        =o;
weights            = new float[num_inputs*num_outputs];
output_errors      = new float[num_outputs];
back_errors        = new float[num_inputs];
outputs            = new float[num_outputs];
expected_values    = new float[num_outputs];
if ((weights==0)||(output_errors==0)||(back_errors==0)
       ||(outputs==0)||(expected_values==0))
       {
       cout << "not enough memory\n";
       cout << "choose a smaller architecture\n";
       exit(1);
       }
}
 
output_layer::~output_layer()
{
// some compilers may require the array
// size in the delete statement; those
// conforming to Ansi C++ will not
delete [num_outputs*num_inputs] weights;
delete [num_outputs] output_errors;
delete [num_inputs] back_errors;
delete [num_outputs] outputs;
 
}
 
void output_layer::calc_out()
{
 
int i,j,k;
float accumulator=0.0;
 
for (j=0; j<num_outputs; j++)
       {
 
       for (i=0; i<num_inputs; i++)
 
              {
              k=i*num_outputs;
              if (weights[k+j]*weights[k+j] > 1000000.0)
                     {
                     cout << "weights are blowing up\n";
                     cout << "try a smaller learning constant\n";
                     cout << "e.g. beta=0.02    aborting...\n";
                     exit(1);
                     }
              outputs[j]=weights[k+j]*(*(inputs+i));
              accumulator+=outputs[j];
              }
 
       // use the sigmoid squash function
       outputs[j]=squash(accumulator);
       accumulator=0;
       }
}
 
void output_layer::calc_error(float & error)
{
int i, j, k;
float accumulator=0;
float total_error=0;
 
for (j=0; j<num_outputs; j++)
    {
       output_errors[j] = expected_values[j]-outputs[j];
       total_error+=output_errors[j];
       }
 
error=total_error;
 
for (i=0; i<num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j<num_outputs; j++)
              {
              back_errors[i]=
                     weights[k+j]*output_errors[j];
              accumulator+=back_errors[i];
              }
       back_errors[i]=accumulator;
       accumulator=0;
       // now multiply by derivative of
       // sigmoid squashing function, which is
       // just the input*(1-input)
       back_errors[i]*=(*(inputs+i))*(1-(*(inputs+i)));
       }
 
}
 
void output_layer::randomize_weights()
{
int i, j, k;
const unsigned first_time=1;
 
const unsigned not_first_time=0;
float discard;
 
discard=randomweight(first_time);
 
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              weights[k+j]=randomweight(not_first_time);
       }
}
 
void output_layer::update_weights(const float beta)
{
int i, j, k;
 
// learning law: weight_change =
//            beta*output_error*input
 
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              weights[k+j] +=
                      beta*output_errors[j]*(*(inputs+i));
    }
 
}
 
void output_layer::list_weights()
{
int i, j, k;
 
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              cout << "weight["<<i<<","<<
                     j<<"] is: "<<weights[k+j];
       }
}
 
void output_layer::list_errors()
{
int i, j;
 
for (i=0; i< num_inputs; i++)
       cout << "backerror["<<i<<
              "] is : "<<back_errors[i]<<"\n";
 
for (j=0; j< num_outputs; j++)
       cout << "outputerrors["<<j<<
                     "] is: "<<output_errors[j]<<"\n";
}
 
void output_layer::write_weights(int layer_no,
              FILE * weights_file_ptr)
{
int i, j, k;
 
// assume file is already open and ready for
// writing
 
// prepend the layer_no to all lines of data
// format:
//            layer_no   weight[0,0] weight[0,1] ...
//            layer_no   weight[1,0] weight[1,1] ...
//            ...
 
for (i=0; i< num_inputs; i++)
       {
       fprintf(weights_file_ptr,"%i ",layer_no);
       k=i*num_outputs;
    for (j=0; j< num_outputs; j++)
       {
       fprintf(weights_file_ptr,"%f",
                     weights[k+j]);
       }
    fprintf(weights_file_ptr,"\n");
    }
}
 
void output_layer::read_weights(int layer_no,
              FILE * weights_file_ptr)
{
int i, j, k;
 
// assume file is already open and ready for
// reading
 
// look for the prepended layer_no
// format:
//            layer_no       weight[0,0] weight[0,1] ...
//            layer_no       weight[1,0] weight[1,1] ...
//            ...
while (1)
 
       {
 
       fscanf(weights_file_ptr,"%i",&j);
       if ((j==layer_no)|| (feof(weights_file_ptr)))
              break;
       else
              {
              while (fgetc(weights_file_ptr) != '\n')
                     {;}// get rest of line
              }
       }
 
if (!(feof(weights_file_ptr)))
       {
       // continue getting first line
       i=0;
       for (j=0; j< num_outputs; j++)
                         {
 
                         fscanf(weights_file_ptr,"%f",
                              &weights[j]); // i*num_outputs = 0
                         }
       fscanf(weights_file_ptr,"\n");
 
       // now get the other lines
       for (i=1; i< num_inputs; i++)
              {
              fscanf(weights_file_ptr,"%i",&layer_no);
              k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              {
              fscanf(weights_file_ptr,"%f",
                     &weights[k+j]);
              }
 
              }
    fscanf(weights_file_ptr,"\n");
    }
 
else cout << "end of file reached\n";
 
}
void output_layer::list_outputs()
{
int j;
 
for (j=0; j< num_outputs; j++)
       {
       cout << "outputs["<<j
              <<"] is: "<<outputs[j]<<"\n";
       }
 
}
 
// ---------------------
//                            middle layer
//----------------------
middle_layer::middle_layer(int i, int o):
       output_layer(i,o)
{
 
}
 
middle_layer::~middle_layer()
{
delete [num_outputs*num_inputs] weights;
delete [num_outputs] output_errors;
delete [num_inputs] back_errors;
delete [num_outputs] outputs;
}
 
void middle_layer::calc_error()
{
int i, j, k;
float accumulator=0;
 
for (i=0; i<num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j<num_outputs; j++)
              {
              back_errors[i]=
                     weights[k+j]*(*(output_errors+j));
 
              accumulator+=back_errors[i];
              }
       back_errors[i]=accumulator;
       accumulator=0;
       // now multiply by derivative of
       // sigmoid squashing function, which is
       // just the input*(1-input)
       back_errors[i]*=(*(inputs+i))*(1-(*(inputs+i)));
       }
}
 
network::network()
{
position=0L;
}
 
network::~network()
{
int i,j,k;
i=layer_ptr[0]->num_outputs;// inputs
j=layer_ptr[number_of_layers-1]->num_outputs; //outputs
k=MAX_VECTORS;
 
delete [(i+j)*k]buffer;
}
 
void network::set_training(const unsigned & value)
{
training=value;
}
 
unsigned network::get_training_value()
{
return training;
}
 
void network::get_layer_info()
{
int i;
 
//---------------------
//
//     Get layer sizes for the network
//
// --------------------
 
cout << " Please enter in the number of layers for your network.\n";
cout << " You can have a minimum of 3 to a maximum of 5. \n";
cout << " 3 implies 1 hidden layer; 5 implies 3 hidden layers : \n\n";
 
cin >> number_of_layers;
 
cout << " Enter in the layer sizes separated by spaces.\n";
cout << " For a network with 3 neurons in the input layer,\n";
cout << " 2 neurons in a hidden layer, and 4 neurons in the\n";
cout << " output layer, you would enter: 3 2 4 .\n";
cout << " You can have up to 3 hidden layers,for five maximum entries
:\n\n";
 
for (i=0; i<number_of_layers; i++)
       {
       cin >> layer_size[i];
       }
 
// --------------------------
// size of layers:
//            input_layer       layer_size[0]
//            output_layer      layer_size[number_of_layers-1]
//            middle_layers     layer_size[1]
//                              optional: layer_size[number_of_layers-3]
//                              optional: layer_size[number_of_layers-2]
//---------------------------
 
}
void network::set_up_network()
{
int i,j,k;
//---------------------------
// Construct the layers
//
//---------------------------
 
layer_ptr[0] = new input_layer(0,layer_size[0]);
 
for (i=0;i<(number_of_layers-1);i++)
       {
       layer_ptr[i+1] =
       new middle_layer(layer_size[i],layer_size[i+1]);
       }
 
layer_ptr[number_of_layers-1] = new
       output_layer(layer_size[number_of_layers-2],
                    layer_size[number_of_layers-1]);
 
for (i=0;i<(number_of_layers-1);i++)
       {
       if (layer_ptr[i] == 0)
              {
              cout << "insufficient memory\n";
              cout << "use a smaller architecture\n";
              exit(1);
              }
       }
 
//--------------------------
// Connect the layers
//
//--------------------------
// set inputs to previous layer outputs for all layers,
//     except the input layer
 
for (i=1; i< number_of_layers; i++)
        layer_ptr[i]->inputs = layer_ptr[i-1y]->outputs;
 
// for back_propagation, set output_errors to next layer
//            back_errors for all layers except the output
//            layer and input layer
 
for (i=1; i< number_of_layers -1; i++)
       ((output_layer *)layer_ptr[i])->output_errors =
              ((output_layer *)layer_ptr[i+1])->back_errors;
 
// define the IObuffer that caches data from
// the datafile
i=layer_ptr[0]->num_outputs;// inputs
j=layer_ptr[number_of_layers-1]->num_outputs; //outputs
k=MAX_VECTORS;
 
buffer=new
       float[(i+j)*k];
if (buffer==0)
       cout << "insufficient memory for buffer\n";
}
 
void network::randomize_weights()
{
int i;
 
for (i=1; i<number_of_layers; i++)
       ((output_layer *)layer_ptr[i])
              ->randomize_weights();
}
 
void network::update_weights(const float beta)
{
int i;
 
for (i=1; i<number_of_layers; i++)
       ((output_layer *)layer_ptr[i])
              ->update_weights(beta);
}
 
void network::write_weights(FILE * weights_file_ptr)
{
int i;
 
for (i=1; i<number_of_layers; i++)
       ((output_layer *)layer_ptr[i])
              ->write_weights(i,weights_file_ptr);
}
 
void network::read_weights(FILE * weights_file_ptr)
{
int i;
 
for (i=1; i<number_of_layers; i++)
       ((output_layer *)layer_ptr[i])
              ->read_weights(i,weights_file_ptr);
}
 
void network::list_weights()
{
int i;
 
for (i=1; i<number_of_layers; i++)
       {
       cout << "layer number : " <<i<< "\n";
       ((output_layer *)layer_ptr[i])
              ->list_weights();
       }
}
 
void network::list_outputs()
{
int i;
 
for (i=1; i<number_of_layers; i++)
       {
       cout << "layer number : " <<i<< "\n";
       ((output_layer *)layer_ptr[i])
              ->list_outputs();
       }
}
 
void network::write_outputs(FILE *outfile)
{
int i, ins, outs;
ins=layer_ptr[0]->num_outputs;
outs=layer_ptr[number_of_layers-1]->num_outputs;
float temp;
 
fprintf(outfile,"for input vector:\n");
 
for (i=0; i<ins; i++)
       {
       temp=layer_ptr[0]->outputs[i];
       fprintf(outfile,"%f  ",temp);
       }
 
fprintf(outfile,"\noutput vector is:\n");
 
for (i=0; i<outs; i++)
       {
       temp=layer_ptr[number_of_layers-1]->
       outputs[i];
       fprintf(outfile,"%f  ",temp);
 
       }
 
if (training==1)
{
fprintf(outfile,"\nexpected output vector is:\n");
 
for (i=0; i<outs; i++)
       {
       temp=((output_layer *)(layer_ptr[number_of_layers-1]))->
       expected_values[i];
       fprintf(outfile,"%f  ",temp);
 
       }
}
 
fprintf(outfile,"\n----------\n");
 
}
 
void network::list_errors()
{
int i;
 
for (i=1; i<number_of_layers; i++)
       {
       cout << "layer number : " <<i<< "\n";
       ((output_layer *)layer_ptr[i])
              ->list_errors();
       }
}
 
int network::fill_IObuffer(FILE * inputfile)
{
// this routine fills memory with
// an array of input, output vectors
// up to a maximum capacity of
// MAX_VECTORS (defined in layer.h)
// the return value is the number of read
// vectors
 
int i, k, count, veclength;
 
int ins, outs;
 
ins=layer_ptr[0]->num_outputs;
 
outs=layer_ptr[number_of_layers-1]->num_outputs;
 
if (training==1)
 
       veclength=ins+outs;
else
       veclength=ins;
 
count=0;
while  ((count<MAX_VECTORS)&&
              (!feof(inputfile)))
       {
       k=count*(veclength);
       for (i=0; i<veclength; i++)
              {
              fscanf(inputfile,"%f",&buffer[k+i]);
              }
       fscanf(inputfile,"\n");
       count++;
       }
 
if (!(ferror(inputfile)))
       return count;
else return -1; // error condition
 
}
 
void network::set_up_pattern(int buffer_index)
{
// read one vector into the network
int i, k;
int ins, outs;
 
ins=layer_ptr[0]->num_outputs;
outs=layer_ptr[number_of_layers-1]->num_outputs;
if (training==1)
       k=buffer_index*(ins+outs);
else
       k=buffer_index*ins;
 
for (i=0; i<ins; i++)
       layer_ptr[0]->outputs[i]=buffer[k+i];
 
if (training==1)
{
       for (i=0; i<outs; i++)
 
              ((output_layer *)layer_ptr[number_of_layers-1])->
                     expected_values[i]=buffer[k+i+ins];
}
 
}
 
void network::forward_prop()
{
int i;
for (i=0; i<number_of_layers; i++)
       {
       layer_ptr[i]->calc_out(); //polymorphic
                                 // function
       }
}
 
void network::backward_prop(float & toterror)
{
int i;
 
// error for the output layer
((output_layer*)layer_ptr[number_of_layers-1])->
                      calc_error(toterror);
 
// error for the middle layer(s)
for (i=number_of_layers-2; i>0; i--)
       {
       ((middle_layer*)layer_ptr[i])->
                     calc_error();
       }
 
}

A Look at the Functions in the layer.cpp File

The following is a listing of the functions in the layer.cpp file along with a brief statement of each one's purpose.

  void set_training(const unsigned &) Sets the value of the private data member, training; use 1 for training mode, and 0 for test mode.

  unsigned get_training_value() Gets the value of the training constant that gives the mode in use.

  void get_layer_info() Gets information about the number of layers and layer sizes from the user.

  void set_up_network() This routine sets up the connections between layers by assigning pointers appropriately.

  void randomize_weights() At the beginning of the training process, this routine is used to randomize all of the weights in the network.

  void update_weights(const float) As part of training, weights are updated according to the learning law used in backpropagation.

  void write_weights(FILE *) This routine is used to write weights to a file.

  void read_weights(FILE *) This routine is used to read weights into the network from a file.

  void list_weights() This routine can be used to list weights while a simulation is in progress.

  void write_outputs(FILE *) This routine writes the outputs of the network to a file.

  void list_outputs() This routine can be used to list the outputs of the network while a simulation is in progress.

  void list_errors() Lists errors for all layers while a simulation is in progress.

  void forward_prop() Performs the forward propagation.

  void backward_prop(float &) Performs the backward error propagation.

  int fill_IObuffer(FILE *) This routine fills the internal IO buffer with data from the training or test data sets.

  void set_up_pattern(int) This routine is used to set up one pattern from the IO buffer for training.

  inline float squash(float input) This function performs the sigmoid function.

  inline float randomweight(unsigned init) This routine returns a random weight between –1 and 1; use 1 to initialize the generator, and 0 for all subsequent calls.


Note that the functions squash(float) and randomweight(unsigned) are declared inline. This means that the function's source code is inserted wherever a call to it appears. This increases code size, but also increases speed, because a function call, which is expensive, is avoided.


The final file to look at is the backprop.cpp file presented in listing:

The backprop.cpp file for the backpropagation simulator

// backprop.cpp         V. Rao, H. Rao
#include "layer.cpp"
 
#define TRAINING_FILE   "training.dat"
#define WEIGHTS_FILE    "weights.dat"
#define OUTPUT_FILE     "output.dat"
#define TEST_FILE       "test.dat"
 
void main()
{
 
float error_tolerance        =0.1;
float total_error            =0.0;
float avg_error_per_cycle    =0.0;
float error_last_cycle       =0.0;
float avgerr_per_pattern     =0.0; // for the latest cycle
float error_last_pattern     =0.0;
float learning_parameter     =0.02;
unsigned temp, startup;
long int vectors_in_buffer;
long int max_cycles;
long int patterns_per_cycle  =0;
 
long int total_cycles, total_patterns;
int i;
 
// create a network object
network backp;
 
FILE * training_file_ptr, * weights_file_ptr, * output_file_ptr;
FILE * test_file_ptr, * data_file_ptr;
 
// open output file for writing
if ((output_file_ptr=fopen(OUTPUT_FILE,"w"))==NULL)
               {
               cout << "problem opening output file\n";
               exit(1);
               }
 
// enter the training mode : 1=training on     0=training off
cout << "------------------------\n";
cout << " C++ Neural Networks and Fuzzy Logic \n";
cout << "      Backpropagation simulator \n";
cout << "             version 1 \n";
cout << "------------------------\n";
cout << "Please enter 1 for TRAINING on, or 0 for off: \n\n";
cout << "Use training to change weights according to your\n";
cout << "expected outputs. Your training.dat file should contain\n";
cout << "a set of inputs and expected outputs. The number of\n";
cout << "inputs determines the size of the first (input) layer\n";
cout << "while the number of outputs determines the size of the\n";
       cout << "last (output) layer :\n\n";
 
cin >> temp;
backp.set_training(temp);
 
if (backp.get_training_value() == 1)
       {
       cout << "--> Training mode is *ON*. weights will be saved\n";
       cout << "in the file weights.dat at the end of the\n";
       cout << "current set of input (training) data\n";
       }
else
       {
       cout << "--> Training mode is *OFF*. weights will be loaded\n";
       cout << "from the file weights.dat and the current\n";
       cout << "(test) data set will be used. For the test\n";
       cout << "data set, the test.dat file should contain\n";
       cout << "only inputs, and no expected outputs.\n";
}
 
if (backp.get_training_value()==1)
       {
       // --------------------
       //     Read in values for the error_tolerance,
       //     and the learning_parameter
       // --------------------
       cout << " Please enter in the error_tolerance\n";
       cout << " --- between 0.001 to 100.0, try 0.1 to start \n";
       cout << "\n";
       cout << "and the learning_parameter, beta\n";
       cout << " --- between 0.01 to 1.0, try 0.5 to start -- \n\n";
       cout << " separate entries by a space\n";
       cout << " example: 0.1 0.5 sets defaults mentioned :\n\n";
 
       cin >> error_tolerance >> learning_parameter;
       //---------------------
       // open training file for reading
       //--------------------
       if ((training_file_ptr=fopen(TRAINING_FILE,"r"))==NULL)
              {
              cout << "problem opening training file\n";
              exit(1);
              }
       data_file_ptr=training_file_ptr; // training on
 
       // Read in the maximum number of cycles
       // each pass through the input data file is a cycle
       cout << "Please enter the maximum cycles for the simula-\
       tion\n";
       cout << "A cycle is one pass through the data set.\n";
       cout << "Try a value of 10 to start with\n";
       cin >> max_cycles;
 
       }
else
       {
       if ((test_file_ptr=fopen(TEST_FILE,"r"))==NULL)
              {
              cout << "problem opening test file\n";
              exit(1);
              }
       data_file_ptr=test_file_ptr; // training off
       }
//
// training: continue looping until the total error is less than
//            the tolerance specified, or the maximum number of
//            cycles is exceeded; use both the forward signal propagation
//            and the backward error propagation phases. If the error
//            tolerance criteria is satisfied, save the weights in a file.
// no training: just proceed through the input data set once in the
//            forward signal propagation phase only. Read the starting
//            weights from a file.
// in both cases report the outputs on the screen
// initialize counters
total_cycles=0; // a cycle is once through all the input data
total_patterns=0; // a pattern is one entry in the input data
 
 
// get layer information
backp.get_layer_info();
// set up the network connections
backp.set_up_network();
// initialize the weights
if (backp.get_training_value()==1)
       {
       // randomize weights for all layers; there is no
       // weight matrix associated with the input layer
       // weight file will be written after processing
       // so open for writing
       if ((weights_file_ptr=fopen(WEIGHTS_FILE,"w"))==NULL)
              {
              cout << "problem opening weights file\n";
              exit(1);
              }
       backp.randomize_weights();
       }
else
       {
       // read in the weight matrix defined by a
       // prior run of the backpropagation simulator
       // with training on
       if ((weights_file_ptr=fopen(WEIGHTS_FILE,"r"))==NULL)
              {
              cout << "problem opening weights file\n";
              exit(1);
              }
       backp.read_weights(weights_file_ptr);
       }
 
 
// main loop
// if training is on, keep going through the input data
//     until the error is acceptable or the maximum
//     number of cycles is exceeded.
// if training is off, go through the input data once;
//     report outputs with inputs to file output.dat
 
startup=1;
vectors_in_buffer = MAX_VECTORS; // startup condition
total_error = 0;
 
while (    ((backp.get_training_value()==1)
               && (avgerr_per_pattern > error_tolerance)
               && (total_cycles < max_cycles)
               && (vectors_in_buffer != 0))
        || ((backp.get_training_value()==0)
               && (total_cycles < 1))
        || ((backp.get_training_value()==1)
               && (startup==1))
      )
{
startup=0;
error_last_cycle=0; // reset for each cycle
patterns_per_cycle=0;
// process all the vectors in the datafile
// going through one buffer at a time
// pattern by pattern
 
 
while ((vectors_in_buffer==MAX_VECTORS))
       {
 
       vectors_in_buffer=
              backp.fill_IObuffer(data_file_ptr); // fill buffer
              if (vectors_in_buffer < 0)
                     {
                     cout << "error in reading in vectors, aborting\n";
                     cout << "check that there are no extra linefeeds\n";
                     cout << "in your data file, and that the number\n";
                     cout << "of layers and size of layers match the\n";
                     cout << "the parameters provided.\n";
                     exit(1);
                     }
              // process vectors
              for (i=0; i<vectors_in_buffer; i++)
                     {
                     // get next pattern
                     backp.set_up_pattern(i);
 
                     total_patterns++;
                     patterns_per_cycle++;
                     // forward propagate
 
                     backp.forward_prop();
 
                     if (backp.get_training_value()==0)
                             backp.write_outputs(output_file_ptr);
 
                     // back_propagate, if appropriate
                     if (backp.get_training_value()==1)
                             {
 
                             backp.backward_prop(error_last_pattern);
                              error_last_cycle += error_last_pattern
                                   *error_last_pattern;
                             backp.update_weights(learning_parameter);
                             // backp.list_weights();
                             // can
                             // see change in weights by
                             // using list_weights before and
                             // after back_propagation
                             }
 
                     }
       error_last_pattern = 0;
       }
 
avgerr_per_pattern = (float)sqrt((double)error_last_cycle/patterns_per_cycle);
total_error += error_last_cycle;
total_cycles++;
// most character displays are 25 lines
// user will see a corner display of the cycle count as it changes
 
cout << "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
cout << total_cycles << "\t" << avgerr_per_pattern << "\n";
 
fseek(data_file_ptr, 0L, SEEK_SET); // reset the file pointer
                              // to the beginning of
                              // the file
vectors_in_buffer = MAX_VECTORS; // reset
 
} // end main loop
 
cout << "\n\n\n\n\n\n\n\n\n\n\n";
cout << "------------------------\n";
cout << "    done:   results in file output.dat\n";
cout << "            training: last vector only\n";
cout << "            not training: full cycle\n\n";
if (backp.get_training_value()==1)
       {
       backp.write_weights(weights_file_ptr);
       backp.write_outputs(output_file_ptr);
       avg_error_per_cycle = (float)sqrt((double)total_error/total_cycles);
       error_last_cycle = (float)sqrt((double)error_last_cycle);
 
       cout << "      weights saved in file weights.dat\n";
       cout << "\n";
       cout << "-->average error per cycle = " << avg_error_per_cycle << " <-\n";
       cout << "-->error last cycle= " << error_last_cycle << " <-\n";
       cout << "->error last cycle per pattern= " << avgerr_per_pattern << " <-\n";
       }
 
cout << "------>total cycles = " << total_cycles << " <--\n";
cout << "------>total patterns = " << total_patterns << " <---\n";
cout << "-------------------------\n";
// close all files
fclose(data_file_ptr);
fclose(weights_file_ptr);
fclose(output_file_ptr);
 
}

The backprop.cpp file implements the simulator controls. First, data is accepted from the user for network parameters. Assuming Training mode is used, the training file is opened and data is read from the file to fill the IO buffer. Then the main loop is executed where the network processes pattern by pattern to complete a cycle, which is one pass through the entire training data set. (The IO buffer is refilled as required during this process.) After executing one cycle, the file pointer is reset to the beginning of the file and another cycle begins. The simulator continues with cycles until one of the two fundamental criteria is met:

1.  The maximum cycle count specified by the user is reached.

2.  The average error per pattern for the latest cycle is less than the error tolerance specified by the user.

When either of these occurs, the simulator stops, reports the error achieved, saves the weights in the weights.dat file, and writes one output vector to the output.dat file.
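 
For reference, the average error per pattern that drives the second criterion is the root mean square of the per-pattern errors accumulated over a cycle, exactly as computed in the listing:
 
avgerr_per_pattern = sqrt( (sum of error_last_pattern squared over the cycle) / patterns_per_cycle )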

In Test mode, exactly one cycle is processed by the network and outputs are written to the output.dat file. At the beginning of the simulation in Test mode, the network is set up with weights from the weights.dat file. To simplify the program, the user is requested to enter the number of layers and size of layers, although you could have the program figure this out from the weights file.

Compiling and Running the Backpropagation Simulator

Compiling the backprop.cpp file compiles the entire simulator, since layer.cpp is included in backprop.cpp. Once you have created an executable (using 80X87 floating-point hardware if available), you type backprop to run the simulator and see the following screen (each user entry appears on its own line after the prompts):

C++ Neural Networks and Fuzzy Logic
       Backpropagation simulator
               version 1
Please enter 1 for TRAINING on, or 0 for off:
 
Use training to change weights according to your
expected outputs. Your training.dat file should contain
a set of inputs and expected outputs. The number of
inputs determines the size of the first (input) layer
while the number of outputs determines the size of the
last (output) layer :
 
1
 
-> Training mode is *ON*. weights will be saved
in the file weights.dat at the end of the
current set of input (training) data
 Please enter in the error_tolerance
 -- between 0.001 to 100.0, try 0.1 to start --
 
and the learning_parameter, beta
 -- between 0.01 to 1.0, try 0.5 to start --
 
 separate entries by a space
 example: 0.1 0.5 sets defaults mentioned :
 
0.2 0.25
 
Please enter the maximum cycles for the simulation
A cycle is one pass through the data set.
Try a value of 10 to start with
 
300
 
Please enter in the number of layers for your network.
You can have a minimum of three to a maximum of five.
three implies one hidden layer; five implies three hidden layers:
 
3
 
Enter in the layer sizes separated by spaces.
For a network with three neurons in the input layer,
two neurons in a hidden layer, and four neurons in the
output layer, you would enter: 3 2 4.
You can have up to three hidden layers for five maximum entries :
 
2 2 1
 
1        0.353248
2        0.352684
3        0.352113
4        0.351536
5        0.350954
...
299      0.0582381
300      0.0577085
------------------------
         done:   results in file output.dat
                 training: last vector only
                 not training: full cycle
 
                 weights saved in file weights.dat
-->average error per cycle = 0.20268 <--
-->error last cycle = 0.0577085 <--
->error last cycle per pattern= 0.0577085 <--
------>total cycles = 300 <--
------>total patterns = 300 <--

The cycle number and the average error per pattern are displayed as the simulation progresses (not all values are shown). You can monitor these to make sure the simulator is converging on a solution. If the error does not seem to decrease beyond a certain point, but instead drifts or blows up, you should restart the simulator so that the random weight initialization provides a new starting point. You could also try decreasing the learning rate parameter; learning may be slower, but this may allow a better minimum to be found.


This example uses just one pattern in the training set, with two inputs and one output. The results for the (single) last pattern, taken from the file output.dat, are as follows:

for input vector:
0.400000  -0.400000
output vector is:
0.842291
expected output vector is:
0.900000
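 
For completeness, the training.dat file that produced this run would hold this single pattern; assuming the layout is simply the input values followed by the expected output values on one line, it would contain:
 
0.4 -0.4 0.9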

The match is pretty good, as can be expected, since the optimization is easy for the network; there is only one pattern to worry about. Let’s look at the final set of weights for this simulation in weights.dat. These weights were obtained by applying the learning law over 300 cycles of weight updates:

     1 0.175039 0.435039
     1 -1.319244 -0.559244
     2 0.358281
     2 2.421172
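 
The learning law referred to here is the generalized delta rule. As a rough sketch (this is not the simulator's actual update_weights() member function; the function and parameter names are illustrative), the update for one weight matrix looks like the following, where beta is the learning rate parameter:
 
// Sketch of a generalized delta rule update for one weight matrix,
// assuming the error (delta) terms for the layer have already been
// computed during the backward error propagation phase.
// weights[i][j] connects neuron i of the previous layer to neuron j
// of the current layer.
void update_layer_weights(float beta,
       const float *prev_outputs, int num_inputs,
       const float *deltas, int num_outputs,
       float **weights)
{
for (int i=0; i<num_inputs; i++)
       for (int j=0; j<num_outputs; j++)
              weights[i][j] += beta*deltas[j]*prev_outputs[i];
}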

We’ll leave the backpropagation simulator for now and return to it in a later chapter for further exploration. You can experiment with the simulator in a number of different ways:

  Try a different number of layers and layer sizes for a given problem.

  Try different learning rate parameters and see their effect on convergence and training time.

  Try a very large learning rate parameter (it should normally be between 0 and 1); then try a value over 1 and note the result.

Summary

In this chapter, you learned about one of the most powerful neural network algorithms, backpropagation. The network has no feedback connections; instead, output errors are propagated backward in an appropriate way to adjust the connections into the hidden layer(s) and from the input layer. The algorithm uses the so-called generalized delta rule and trains the network with exemplar pairs of patterns. It is difficult to determine how many hidden-layer neurons should be provided, and the number of hidden layers can be more than one. In general, the size of the hidden layer(s) is related to the features or distinguishing characteristics that should be discerned from the data. Our example in this chapter is a simple case with a single hidden layer. The outputs of the output neurons, and therefore of the network, are vectors with components between 0 and 1, since the thresholding function is the sigmoid function. These values can be scaled, if necessary, to obtain values in another interval.
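 
As a small illustration of that last point, a sigmoid output y in (0, 1) can be mapped to another interval (lo, hi) with a simple linear rescaling (the function below is illustrative, not part of the simulator):
 
// map a value y in (0, 1) to the interval (lo, hi)
inline float rescale(float y, float lo, float hi)
{
return lo + y*(hi - lo);
}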

Our example does not relate to any particular function to be computed by the network; the inputs and outputs were chosen at random. What this tells you is that, even if you do not know the functional relationship between two sets of vectors, the feedforward backpropagation network can learn the mapping and produce an output for any vector in the domain, without the functional equation ever being found explicitly. For all we know, that function could be nonlinear as well.

There is one important fact you need to remember about the backpropagation algorithm: its steepest descent procedure in training does not guarantee finding the global (overall) minimum; it may find only a local minimum of the energy surface.