Backpropagation II

Enhancing the Simulator

In Chapter 7, Backpropagation, you developed a backpropagation simulator. In this chapter, you will put it to use with examples and also add some new features to the simulator: a term called momentum, and the capability of adding noise to the inputs during simulation. There are many variations of the algorithm that try to alleviate two problems with backpropagation. First, like other neural networks, there is a strong possibility that the solution found with backpropagation is not a global error minimum, but a local one. You may need to shake the weights a little by some means to get out of the local minimum, and possibly arrive at a lower minimum. The second problem with backpropagation is speed. The algorithm is very slow at learning. There are many proposals for speeding up the search process. Neural networks are inherently parallel processing architectures and are suited for simulation on parallel processing hardware. While there are a few plug-in neural net or digital signal processing boards available in the market, the low-cost simulation platform of choice remains the personal computer. Speed enhancements to the training algorithm are therefore very necessary.

Another Example of Using Backpropagation

Before modifying the simulator to add features, let's look at the same problem we analyzed with the Kohonen map in Chapter 12. As you recall, we would like to be able to distinguish alphabetic characters by assigning them to different bins. For backpropagation, we apply the inputs and train the network with the anticipated responses. Here is the input file that we used for distinguishing five different characters, A, X, H, B, and I:

 
0 0 1 0 0  0 1 0 1 0  1 0 0 0 1  1 0 0 0 1  1 1 1 1 1  1 0 0 0 1  1 0 0 0 1
1 0 0 0 1  0 1 0 1 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 1 0 1 0  1 0 0 0 1
1 0 0 0 1  1 0 0 0 1  1 0 0 0 1  1 1 1 1 1  1 0 0 0 1  1 0 0 0 1  1 0 0 0 1
1 1 1 1 1  1 0 0 0 1  1 0 0 0 1  1 1 1 1 1  1 0 0 0 1  1 0 0 0 1  1 1 1 1 1
0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0

Each line of the file holds the 5x7 dot-matrix representation (35 values) of one character. Now we need to name each of the output categories. We can assign a simple 3-bit representation as follows:

A    000
X    010
H    100
B    101
I    111
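To check a 35-value line by eye, it helps to print it as a 7-row by 5-column grid. The short program below is our own helper, not part of the simulator (the function name show_char is ours); it prints '#' for 1 and '.' for 0, using the letter A from the input file above.

#include <iostream>
using namespace std;

// print one 35-value pattern as a 7-row by 5-column character grid;
// a value of 1 prints as '#', a value of 0 prints as '.'
void show_char(const int pattern[35])
{
for (int row=0; row<7; row++)
       {
       for (int col=0; col<5; col++)
              cout << (pattern[row*5 + col] ? '#' : '.');
       cout << "\n";
       }
}

int main()
{
// the letter A, copied from the first line of the input file
int A[35] = { 0,0,1,0,0,  0,1,0,1,0,  1,0,0,0,1,  1,0,0,0,1,
              1,1,1,1,1,  1,0,0,0,1,  1,0,0,0,1 };
show_char(A);
return 0;
}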

Let’s train the network to recognize these characters. The training.dat file looks like the following.

 
0 0 1 0 0  0 1 0 1 0  1 0 0 0 1  1 0 0 0 1  1 1 1 1 1  1 0 0 0 1  1 0 0 0 1  0 0 0
1 0 0 0 1  0 1 0 1 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 1 0 1 0  1 0 0 0 1  0 1 0
1 0 0 0 1  1 0 0 0 1  1 0 0 0 1  1 1 1 1 1  1 0 0 0 1  1 0 0 0 1  1 0 0 0 1  1 0 0
1 1 1 1 1  1 0 0 0 1  1 0 0 0 1  1 1 1 1 1  1 0 0 0 1  1 0 0 0 1  1 1 1 1 1  1 0 1
0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  1 1 1

Now you can start the simulator. Using the parameters (beta = 0.1, tolerance = 0.001, and max_cycles = 1000) and with three layers of size 35 (input), 5 (middle), and 3 (output), you will get a typical result like the following.

---------------------------
         done:   results in file output.dat
                        training: last vector only
                        not training: full cycle
 
                        weights saved in file weights.dat
 
-->average error per cycle = 0.035713<--
-->error last cycle = 0.008223 <--
->error last cycle per pattern= 0.00164455 <--
------>total cycles = 1000 <--
------>total patterns = 5000 <--
---------------------------

The simulator stopped at the 1000 maximum cycles specified in this case. Your results will be different since the weights start at a random point. Note that the tolerance specified was nearly met. Let us see how close the output came to what we wanted. Look at the output.dat file. You can see the match for the last pattern as follows:

for input vector:
0.000000  0.000000  1.000000  0.000000  0.000000  0.000000  0.000000
1.000000  0.000000  0.000000  0.000000  0.000000  1.000000  0.000000
0.000000  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000
0.000000  1.000000  0.000000  0.000000  0.000000  0.000000  1.000000
0.000000  0.000000  0.000000  0.000000  1.000000  0.000000  0.000000
output vector is:
0.999637  0.998721  0.999330
expected output vector is:
1.000000  1.000000  1.000000
-----------

To see the outputs of all the patterns, we need to copy the training.dat file to the test.dat file and rerun the simulator in Test mode. Remember to delete the expected output fields (the last three values of each pattern) once you copy the file.
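If you prefer not to edit the copy by hand, a small throwaway filter like the one below will do it. This is our own helper, not part of the simulator; the counts of 35 inputs and 3 expected outputs per pattern are hard-coded for this example.

#include <cstdio>

int main()
{
// hard-coded for this example: 35 inputs and 3 expected outputs per pattern
const int ins = 35, outs = 3;
FILE * in  = fopen("training.dat", "r");
FILE * out = fopen("test.dat", "w");
if ((in==NULL) || (out==NULL))
       {
       printf("problem opening training.dat or test.dat\n");
       return 1;
       }
float value;
int i;
while (1)
       {
       for (i=0; i<ins+outs; i++)
              {
              if (fscanf(in, "%f", &value) != 1)
                     break;                        // end of file
              if (i < ins)
                     fprintf(out, "%.0f ", value); // keep the inputs only
              }
       if (i < ins+outs)
              break;                               // incomplete pattern or end of file
       fprintf(out, "\n");
       }
fclose(in);
fclose(out);
return 0;
}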

Running the simulator in Test mode (0) shows the following result in the output.dat file:

for input vector:
0.000000  0.000000  1.000000  0.000000  0.000000  0.000000  1.000000
0.000000  1.000000  0.000000  1.000000  0.000000  0.000000  0.000000
1.000000  1.000000  0.000000  0.000000  0.000000  1.000000  1.000000
1.000000  1.000000  1.000000  1.000000  1.000000  0.000000  0.000000
0.000000  1.000000  1.000000  0.000000  0.000000  0.000000  1.000000
output vector is:
0.005010  0.002405  0.000141
-----------
for input vector:
1.000000  0.000000  0.000000  0.000000  1.000000  0.000000  1.000000
0.000000  1.000000  0.000000  0.000000  0.000000  1.000000  0.000000
0.000000  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000
0.000000  1.000000  0.000000  0.000000  0.000000  1.000000  0.000000
1.000000  0.000000  1.000000  0.000000  0.000000  0.000000  1.000000
output vector is:
0.001230  0.997844  0.000663
-----------
for input vector:
1.000000  0.000000  0.000000  0.000000  1.000000  1.000000  0.000000
0.000000  0.000000  1.000000  1.000000  0.000000  0.000000  0.000000
1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
0.000000  0.000000  0.000000  1.000000  1.000000  0.000000  0.000000
0.000000  1.000000  1.000000  0.000000  0.000000  0.000000  1.000000
output vector is:
0.995348  0.000253  0.002677
-----------
for input vector:
1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  0.000000
0.000000  0.000000  1.000000  1.000000  0.000000  0.000000  0.000000
1.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
0.000000  0.000000  0.000000  1.000000  1.000000  0.000000  0.000000
0.000000  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
output vector is:
0.999966  0.000982  0.997594
-----------
for input vector:
0.000000  0.000000  1.000000  0.000000  0.000000  0.000000  0.000000
1.000000  0.000000  0.000000  0.000000  0.000000  1.000000  0.000000
0.000000  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000
0.000000  1.000000  0.000000  0.000000  0.000000  0.000000  1.000000
0.000000  0.000000  0.000000  0.000000  1.000000  0.000000  0.000000
output vector is:
0.999637  0.998721  0.999330
-----------

The training patterns are learned very well. If a larger (looser) tolerance had been used, the learning could have completed in fewer cycles. What happens if we present a foreign character to the network? Let us create a new test.dat file with two entries, for the letters M and J, as follows:

1 0 0 0 1  1 1 0 1 1  1 0 1 0 1  1 0 0 0 1  1 0 0 0 1  1 0 0 0 1  1 0 0 0 1
0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 0 1 0 0  0 1 1 1 1

The results should show each foreign character assigned to the category closest to it. The middle layer of the network acts as a feature detector. Since we specified five middle-layer neurons, we have given the network the freedom to define five features of the input training set to use in categorizing inputs. The results in the output.dat file are shown as follows.

for input vector:
1.000000  0.000000  0.000000  0.000000  1.000000  1.000000  1.000000
0.000000  1.000000  1.000000  1.000000  0.000000  1.000000  0.000000
1.000000  1.000000  0.000000  0.000000  0.000000  1.000000  1.000000
0.000000  0.000000  0.000000  1.000000  1.000000  0.000000  0.000000
0.000000  1.000000  1.000000  0.000000  0.000000  0.000000  1.000000
output vector is:
0.963513  0.000800  0.001231
-----------
for input vector:
0.000000  0.000000  1.000000  0.000000  0.000000  0.000000  0.000000
1.000000  0.000000  0.000000  0.000000  0.000000  1.000000  0.000000
0.000000  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000
0.000000  1.000000  0.000000  0.000000  0.000000  0.000000  1.000000
0.000000  0.000000  0.000000  1.000000  1.000000  1.000000  1.000000
output vector is:
0.999469  0.996339  0.999157
-----------

In the first pattern, an M is categorized as an H, whereas in the second pattern, a J is categorized as an I, as expected. The first case seems reasonable, since H and M share many pixels.
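To turn an output vector into a category automatically, you can threshold each output at 0.5 and pick the 3-bit code from the earlier table that differs from the thresholded bits in the fewest positions. The following sketch is our own helper, not part of the simulator; it classifies the output vector produced above for the foreign character M.

#include <iostream>
using namespace std;

// the five categories and their 3-bit codes from the table above
const char letters[5]  = { 'A', 'X', 'H', 'B', 'I' };
const int  codes[5][3] = { {0,0,0}, {0,1,0}, {1,0,0}, {1,0,1}, {1,1,1} };

// threshold the outputs at 0.5 and return the letter whose code
// differs from the thresholded bits in the fewest positions
char classify(const float outputs[3])
{
int best=0, best_distance=4;
for (int c=0; c<5; c++)
       {
       int distance=0;
       for (int j=0; j<3; j++)
              {
              int bit = (outputs[j] > 0.5) ? 1 : 0;
              if (bit != codes[c][j])
                     distance++;
              }
       if (distance < best_distance)
              {
              best_distance = distance;
              best = c;
              }
       }
return letters[best];
}

int main()
{
// the output vector produced above for the foreign character M
float m_outputs[3] = { 0.963513f, 0.000800f, 0.001231f };
cout << "closest category: " << classify(m_outputs) << "\n"; // prints H
return 0;
}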

Other Experiments to Try

There are many other experiments you could try in order to get a better feel for how to train and use a backpropagation neural network.

  You could use the ASCII 8-bit code to represent each character, and try to train the network. You could also code all of the alphabetic characters and see if it’s possible to distinguish all of them.

  You could garble a character to see if the network still gives the correct output.

  You could try changing the size of the middle layer, and see the effect on training time and generalization ability.

  You could change the tolerance setting to see the difference between an overtrained and undertrained network in generalization capability. That is, given a foreign pattern, is the network able to find the closest match and use that particular category, or does it arrive at a new category altogether?

We will return to the same example after enhancing the simulator with momentum and noise addition capability.

Adding the Momentum Term

A simple change to the training law that sometimes results in much faster training is the addition of a momentum term. The training law for backpropagation as implemented in the simulator is:

Weight change = Beta * output_error * input

Now we add a term to the weight change equation as follows:

Weight change = Beta * output_error * input +
                Alpha*previous_weight_change

The second term in this equation is the momentum term. In the absence of error, the weight change would be a constant multiple of the previous weight change; in other words, the weight change continues in the direction it was already heading. The momentum term is an attempt to keep the weight change process moving, and thereby avoid getting stuck in local minima.
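As a quick illustration of what the extra term does, the toy program below (our own example, not part of the simulator) minimizes the one-dimensional function f(w) = w*w starting from w = 4, once with plain gradient descent and once with the momentum form of the update. While successive gradients point in the same direction, the momentum steps reinforce each other, so the weight covers the distance to the minimum faster; if alpha is made too large, the weight overshoots and oscillates.

#include <iostream>
using namespace std;

int main()
{
// minimize f(w) = w*w, whose gradient is 2*w
// momentum form of the update:
//        delta = -beta*gradient + alpha*previous_delta
const float beta=0.1f, alpha=0.5f;

float w_plain=4.0f, w_mom=4.0f, delta=0.0f;
for (int step=1; step<=10; step++)
       {
       w_plain -= beta*2.0f*w_plain;               // plain update
       delta    = -beta*2.0f*w_mom + alpha*delta;  // momentum update
       w_mom   += delta;
       cout << "step " << step
            << "   plain w = " << w_plain
            << "   momentum w = " << w_mom << "\n";
       }
return 0;
}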

Code Changes

The files affected by this change are layer.cpp, where the update_weights() member function of the output_layer class is modified, and the main backprop.cpp file, which reads in the value for alpha and passes it to the member function. There is some additional storage needed for previous weight changes, and this affects the layer.h file. The momentum term could be implemented in two ways:

1.  Using the weight change for the previous pattern.

2.  Using the weight change accumulated over the previous cycle.

Although both of these implementations are valid, the second is particularly useful, since it adds a term that is significant for all patterns, and hence would contribute to global error reduction. We implement the second choice by accumulating the value of the current cycle weight changes in a vector called cum_deltas. The past cycle weight changes are stored in a vector called past_deltas. These are shown as follows in a portion of the layer.h file.

class output_layer:   public layer
{
protected:
 
       float * weights;
       float * output_errors; // array of errors at output
       float * back_errors; // array of errors back-propagated
       float * expected_values;      // to inputs
       float * cum_deltas;   // for momentum
       float * past_deltas;  // for momentum
 
  friend network;
...

Changes to the layer.cpp File

The implementation file for the layer class changes in the output_layer::update_weights() routine and in the constructor and destructor for output_layer. First, here is the constructor for output_layer; the changes are the lines that allocate, check, and zero the cum_deltas and past_deltas arrays.

output_layer::output_layer(int ins, int outs)
{
int i, j, k;
num_inputs=ins;
num_outputs=outs;
weights = new float[num_inputs*num_outputs];
output_errors = new float[num_outputs];
back_errors = new float[num_inputs];
outputs = new float[num_outputs];
expected_values = new float[num_outputs];
cum_deltas = new float[num_inputs*num_outputs];
past_deltas = new float[num_inputs*num_outputs];
if ((weights==0)||(output_errors==0)||(back_errors==0)
       ||(outputs==0)||(expected_values==0)
       ||(past_deltas==0)||(cum_deltas==0))
       {
       cout << "not enough memory\n";
       cout << "choose a smaller architecture\n";
       exit(1);
       }
// zero cum_deltas and past_deltas matrix
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              {
              cum_deltas[k+j]=0;
              past_deltas[k+j]=0;
              }
       }
}

The destructor simply deletes the new vectors:

output_layer::~output_layer(){
// some compilers may require the array
// size in the delete statement; those
// conforming to Ansi C++ will not
delete [num_outputs*num_inputs] weights;
delete [num_outputs] output_errors;
delete [num_inputs] back_errors;
delete [num_outputs] outputs;
delete [num_outputs*num_inputs] past_deltas;
delete [num_outputs*num_inputs] cum_deltas;
}

Now let’s look at the update_weights() routine changes:

void output_layer::update_weights(const float beta, const float alpha) {
int i, j, k;
float delta;
// learning law: weight_change =
//             beta*output_error*input + alpha*past_delta
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              {
              delta=beta*output_errors[j]*(*(inputs+i))
              +alpha*past_deltas[k+j];
              weights[k+j] += delta;
              cum_deltas[k+j]+=delta; // current cycle
              }
       }
}

The change to the training law amounts to calculating a delta and adding it to the cumulative total of weight changes in cum_deltas. At some point (at the start of a new cycle) you need to set the past_deltas vector to the cum_deltas vector. Where does this occur? Since the layer has no concept of a cycle, this must be done at the network level. A network-level function called update_momentum() is called at the beginning of each cycle, and it in turn calls a layer-level function of the same name. The layer-level function swaps the past_deltas and cum_deltas pointers and reinitializes the cum_deltas vector to zero. We need to return to the layer.h file to see the changes needed to define these two functions.

class output_layer:   public layer
{
protected:
 
       float * weights;
       float * output_errors; // array of errors at output
       float * back_errors; // array of errors back-propagated
       float * expected_values;     // to inputs
       float * cum_deltas;   // for momentum
       float * past_deltas;   // for momentum
 
  friend network;
public:
 
       output_layer(int, int);
       ~output_layer();
       virtual void calc_out();
       void calc_error(float &);
       void randomize_weights();
       void update_weights(const float, const float);
       void update_momentum();
       void list_weights();
       void write_weights(int, FILE *);
       void read_weights(int, FILE *);
       void list_errors();
       void list_outputs();
};
 
class network {
private:
layer *layer_ptr[MAX_LAYERS];
    int number_of_layers;
    int layer_size[MAX_LAYERS];
    float *buffer;
    fpos_t position;
    unsigned training;
public:
  network();
    ~network();
               void set_training(const unsigned &);
               unsigned get_training_value();
               void get_layer_info();
               void set_up_network();
               void randomize_weights();
               void update_weights(const float, const float);
               void update_momentum();
               ...

At both the network and output_layer class levels, a prototype for the update_momentum() member function has been added. The implementations of these functions, from the layer.cpp file, are shown as follows.

void output_layer::update_momentum()
{
// This function is called when a new cycle begins; the past_deltas
// pointer is swapped with the cum_deltas pointer. Then the contents
// pointed to by the cum_deltas pointer is zeroed out.
int i, j, k;
float * temp;
 
// swap
temp = past_deltas;
past_deltas=cum_deltas;
cum_deltas=temp;
 
// zero cum_deltas matrix
// for new cycle
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              cum_deltas[k+j]=0;
       }
}
 
void network::update_momentum()
{
int i;
 
for (i=1; i<number_of_layers; i++)
       ((output_layer *)layer_ptr[i])
               ->update_momentum();
}

Adding Noise During Training

Another approach to breaking out of local minima, as well as to enhancing generalization ability, is to introduce some noise in the inputs during training. A random component, proportional to the input value, is added to each component of the input vector as it is applied to the network; the perturbation is scaled by an overall noise factor, NF, which has a 0 to 1 range. You can add as much noise to the simulation as you want, or none at all, by choosing NF = 0. When you are close to a solution and have reached a satisfactory minimum, you don't want noise at that time to interfere with convergence to the minimum. We implement a noise factor that decreases with the number of cycles, as shown in the following excerpt from the backprop.cpp file.

// update NF
// gradually reduce noise to zero
if (total_cycles>0.7*max_cycles)
                     new_NF = 0;
else if (total_cycles>0.5*max_cycles)
                     new_NF = 0.25*NF;
else if (total_cycles>0.3*max_cycles)
                     new_NF = 0.50*NF;
else if (total_cycles>0.1*max_cycles)
                     new_NF = 0.75*NF;
 
backp.set_NF(new_NF);

The noise factor is reduced at regular intervals. The new value is passed in with the network class function set_NF(float), which hands it to the input layer's noise_factor member variable. The noise is added to the inputs in the input_layer member function calc_out().

Another reason for using noise is to prevent memorization by the network. You are effectively presenting a different input pattern with each cycle so it becomes hard for the network to memorize patterns.

One Other Change—Starting Training from a Saved Weight File

Shortly, we will look at the complete listings for the backpropagation simulator. There is one other enhancement to discuss. It is often useful in long simulations to be able to start from a known point, that is, from an already saved set of weights. This is a simple change to the backprop.cpp program, and it is well worth the effort. As a side benefit, this feature lets you run a simulation with a large beta value for, say, 500 cycles, save the weights, and then start a new simulation with a smaller beta value for another 500 or more cycles. You can also take preset breaks in long simulations, which you will encounter in Chapter 14. At this point, let's look at the complete listings for the updated layer.h and layer.cpp files in Listings 13.1 and 13.2:

Listing 13.1 layer.h file updated to include noise and momentum

// layer.h            V.Rao, H. Rao
// header file for the layer class hierarchy and
// the network class
 // added noise and momentum
 
#define MAX_LAYERS    5
#define MAX_VECTORS   100
 
class network;
class Kohonen_network;
 
class layer{
protected:
       int num_inputs;
       int num_outputs;
       float *outputs; // pointer to array of outputs
       float *inputs;  // pointer to array of inputs, which are outputs of some other layer
       friend network;
       friend Kohonen_network;  // update for Kohonen model
 
public:
 
       virtual void calc_out()=0;
};
 
class input_layer: public layer{
private:
float noise_factor;
float * orig_outputs;
 
public:
       input_layer(int, int);
       ~input_layer();
       virtual void calc_out();
       void set_NF(float);
       friend network;
};
 
class middle_layer;
 
class output_layer:  public layer {
protected:
       float * weights;
       float * output_errors; // array of errors at output
       float * back_errors;   // array of errors back-propagated
       float * expected_values;        // to inputs
       float * cum_deltas;    // for momentum
       float * past_deltas;   // for momentum
       friend network;
 
public:
       output_layer(int, int);
       ~output_layer();
       virtual void calc_out();
       void calc_error(float &);
       void randomize_weights();
       void update_weights(const float, const float);
       void update_momentum();
       void list_weights();
       void write_weights(int, FILE *);
       void read_weights(int, FILE *);
       void list_errors();
       void list_outputs();
};
 
class middle_layer:   public output_layer
{
 
private:
 
public:
    middle_layer(int, int);
    ~middle_layer();
       void calc_error();
};
 
class network
 
{
 
private:
 
layer *layer_ptr[MAX_LAYERS];
    int number_of_layers;
    int layer_size[MAX_LAYERS];
    float *buffer;
    fpos_t position;
    unsigned training;
 
public:
    network();
    ~network();
               void set_training(const unsigned &);
               unsigned get_training_value();
               void get_layer_info();
               void set_up_network();
               void randomize_weights();
               void update_weights(const float, const float);
               void update_momentum();
               void write_weights(FILE *);
               void read_weights(FILE *);
               void list_weights();
               void write_outputs(FILE *);
               void list_outputs();
               void list_errors();
               void forward_prop();
               void backward_prop(float &);
               int fill_IObuffer(FILE *);
               void set_up_pattern(int);
               void set_NF(float);
 
};

Listing 13.2 layer.cpp file updated to include noise and momentum

// layer.cpp          V.Rao, H.Rao
// added momentum and noise
 
// compile for floating point hardware if available
#include <stdio.h>
#include <iostream.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include "layer.h"
 
inline float squash(float input)
// squashing function
// use sigmoid -- can customize to something
// else if desired; can add a bias term too
//
{
if (input < -50)
       return 0.0;
else   if (input > 50)
              return 1.0;
       else return (float)(1/(1+exp(-(double)input)));
 
}
 
inline float randomweight(unsigned init)
{
int num;
// random number generator
// will return a floating point
// value between -1 and 1
 
if (init==1)  // seed the generator
       srand ((unsigned)time(NULL));
 
num=rand() % 100;
 
return 2*(float(num/100.00))-1;
}
 
// the next function is needed for Turbo C++
// and Borland C++ to link in the appropriate
// functions for fscanf floating point formats:
static void force_fpf()
{
       float x, *y;
       y=&x;
       x=*y;
}
 
// ---------------------
//                            input layer
//---------------------
input_layer::input_layer(int i, int o)
{
 
num_inputs=i;
num_outputs=o;
 
outputs = new float[num_outputs];
orig_outputs = new float[num_outputs];
if ((outputs==0)||(orig_outputs==0))
        {
        cout << "not enough memory\n";
        cout << "choose a smaller architecture\n";
        exit(1);
        }
 
noise_factor=0;
 
}
 
input_layer::~input_layer()
{
delete [num_outputs] outputs;
delete [num_outputs] orig_outputs;
}
 
void input_layer::calc_out()
{
//add noise to inputs
// randomweight returns a random number
// between -1 and 1
 
int i;
for (i=0; i<num_outputs; i++)
       outputs[i] =orig_outputs[i]*
              (1+noise_factor*randomweight(0));
 
}
 
void input_layer::set_NF(float noise_fact)
{
noise_factor=noise_fact;
}
 
// ---------------------
//                            output layer
//---------------------
 
output_layer::output_layer(int ins, int outs)
{
int i, j, k;
num_inputs=ins;
num_outputs=outs;
weights = new float[num_inputs*num_outputs];
output_errors = new float[num_outputs];
back_errors = new float[num_inputs];
outputs = new float[num_outputs];
expected_values = new float[num_outputs];
cum_deltas = new float[num_inputs*num_outputs];
past_deltas = new float[num_inputs*num_outputs];
if ((weights==0)||(output_errors==0)||(back_errors==0)
       ||(outputs==0)||(expected_values==0)
       ||(past_deltas==0)||(cum_deltas==0))
       {
       cout << "not enough memory\n";
       cout << "choose a smaller architecture\n";
       exit(1);
       }
 
// zero cum_deltas and past_deltas matrix
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              {
              cum_deltas[k+j]=0;
              past_deltas[k+j]=0;
              }
       }
}
 
output_layer::~output_layer()
{
// some compilers may require the array
// size in the delete statement; those
// conforming to Ansi C++ will not
delete [num_outputs*num_inputs] weights;
delete [num_outputs] output_errors;
delete [num_inputs] back_errors;
delete [num_outputs] outputs;
delete [num_outputs*num_inputs] past_deltas;
delete [num_outputs*num_inputs] cum_deltas;
 
}
 
void output_layer::calc_out()
{
int i,j,k;
float accumulator=0.0;
 
for (j=0; j<num_outputs; j++)
       {
 
       for (i=0; i<num_inputs; i++)
 
              {
              k=i*num_outputs;
              if (weights[k+j]*weights[k+j] > 1000000.0)
                     {
                      cout << "weights are blowing up\n";
                      cout << "try a smaller learning constant\n";
                      cout << "e.g. beta=0.02    aborting...\n";
                      exit(1);
                     }
              outputs[j]=weights[k+j]*(*(inputs+i));
              accumulator+=outputs[j];
              }
       // use the sigmoid squash function
       outputs[j]=squash(accumulator);
       accumulator=0;
       }
 
}
 
 
void output_layer::calc_error(float & error)
{
int i, j, k;
float accumulator=0;
float total_error=0;
for (j=0; j<num_outputs; j++)
    {
                  output_errors[j] = expected_values[j]-outputs[j];
                  total_error+=output_errors[j];
                  }
 
error=total_error;
 
for (i=0; i<num_inputs; i++)
{
k=i*num_outputs;
for (j=0; j<num_outputs; j++)
       {
               back_errors[i]=
                      weights[k+j]*output_errors[j];
               accumulator+=back_errors[i];
               }
       back_errors[i]=accumulator;
       accumulator=0;
       // now multiply by derivative of
       // sigmoid squashing function, which is
       // just the input*(1-input)
       back_errors[i]*=(*(inputs+i))*(1-(*(inputs+i)));
       }
 
}
 
void output_layer::randomize_weights()
{
int i, j, k;
const unsigned first_time=1;
 
const unsigned not_first_time=0;
float discard;
 
discard=randomweight(first_time);
 
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              weights[k+j]=randomweight(not_first_time);
       }
}
 
void output_layer::update_weights(const float beta,
                                     const float alpha)
{
int i, j, k;
float delta;
 
// learning law: weight_change =
//             beta*output_error*input + alpha*past_delta
 
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              {
              delta=beta*output_errors[j]*(*(inputs+i))
                     +alpha*past_deltas[k+j];
              weights[k+j] += delta;
              cum_deltas[k+j]+=delta; // current cycle
              }
 
       }
 
}
 
void output_layer::update_momentum()
{
// This function is called when a
// new cycle begins; the past_deltas
// pointer is swapped with the
// cum_deltas pointer. Then the contents
// pointed to by the cum_deltas pointer
// is zeroed out.
int i, j, k;
float * temp;
 
// swap
temp = past_deltas;
past_deltas=cum_deltas;
cum_deltas=temp;
 
// zero cum_deltas matrix
// for new cycle
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              cum_deltas[k+j]=0;
       }
}
 
void output_layer::list_weights()
{
int i, j, k;
 
for (i=0; i< num_inputs; i++)
       {
       k=i*num_outputs;
       for (j=0; j< num_outputs; j++)
              cout << "weight["<<i<<","<<
                         j<<"] is: "<<weights[k+j];
       }
 
}
 
void output_layer::list_errors()
{
int i, j;
 
for (i=0; i< num_inputs; i++)
       cout << "backerror["<<i<<
                  "] is : "<<back_errors[i]<<"\n";
for (j=0; j< num_outputs; j++)
       cout << "outputerrors["<<j<<
                            "] is: "<<output_errors[j]<<"\n";
 
}
 
void output_layer::write_weights(int layer_no,
               FILE * weights_file_ptr)
{
int i, j, k;
 
// assume file is already open and ready for
// writing
 
// prepend the layer_no to all lines of data
// format:
//             layer_no   weight[0,0] weight[0,1] ...
//             layer_no   weight[1,0] weight[1,1] ...
//             ...
 
for (i=0; i< num_inputs; i++)
       {
       fprintf(weights_file_ptr,"%i ",layer_no);
       k=i*num_outputs;
    for (j=0; j< num_outputs; j++)
       {
       fprintf(weights_file_ptr,"%f ",
                      weights[k+j]);
       }
    fprintf(weights_file_ptr,"\n");
    }
 
}
 
void output_layer::read_weights(int layer_no,
               FILE * weights_file_ptr)
{
int i, j, k;
 
// assume file is already open and ready for
// reading
 
// look for the prepended layer_no
// format:
//             layer_no       weight[0,0] weight[0,1] ...
//             layer_no       weight[1,0] weight[1,1] ...
//             ...
while (1)
 
        {
 
        fscanf(weights_file_ptr,"%i",&j);
        if ((j==layer_no)|| (feof(weights_file_ptr)))
               break;
        else
               {
               while (fgetc(weights_file_ptr) != '\n')
                      {;}// get rest of line
               }
        }
 
if (!(feof(weights_file_ptr)))
        {
        // continue getting first line
        i=0;
        for (j=0; j< num_outputs; j++)
                          {
 
                          fscanf(weights_file_ptr,"%f",
                                         &weights[j]); // i*num_outputs = 0
                          }
        fscanf(weights_file_ptr,"\n");
 
        // now get the other lines
        for (i=1; i< num_inputs; i++)
               {
               fscanf(weights_file_ptr,
                      "%i",&layer_no);
               k=i*num_outputs;
               for (j=0; j< num_outputs; j++)
                      {
                      fscanf(weights_file_ptr,"%f",
                             &weights[k+j]);
                      }
               }
        fscanf(weights_file_ptr,"\n");
        }
 
else cout << "end of file reached\n";
 
}
 
void output_layer::list_outputs()
{
int j;
 
for (j=0; j< num_outputs; j++)
        {
        cout << "outputs["<<j
               <<"] is: "<<outputs[j]<<"\n";
        }
 
}
 
// ---------------------
//                           middle layer
//---------------------
 
middle_layer::middle_layer(int i, int o):
        output_layer(i,o)
{
}
 
middle_layer::~middle_layer()
{
delete [num_outputs*num_inputs] weights;
delete [num_outputs] output_errors;
delete [num_inputs] back_errors;
delete [num_outputs] outputs;
}
 
void middle_layer::calc_error()
{
int i, j, k;
float accumulator=0;
 
for (i=0; i<num_inputs; i++)
        {
        k=i*num_outputs;
        for (j=0; j<num_outputs; j++)
               {
               back_errors[i]=
                      weights[k+j]*(*(output_errors+j));
               accumulator+=back_errors[i];
               }
        back_errors[i]=accumulator;
        accumulator=0;
        // now multiply by derivative of
        // sigmoid squashing function, which is
        // just the input*(1-input)
        back_errors[i]*=(*(inputs+i))*(1-(*(inputs+i)));
        }
 
}
 
network::network()
{
position=0L;
}
 
network::~network()
{
int i,j,k;
i=layer_ptr[0]->num_outputs;// inputs
j=layer_ptr[number_of_layers-1]->num_outputs; //outputs
k=MAX_VECTORS;
 
delete [(i+j)*k]buffer;
}
 
void network::set_training(const unsigned & value)
{
training=value;
}
 
unsigned network::get_training_value()
{
return training;
}
 
void network::get_layer_info()
{
int i;
 
//---------------------------------------------
//
//      Get layer sizes for the network
//
//---------------------------------------------
cout << " Please enter in the number of layers for your network.\n";
cout << " You can have a minimum of 3 to a maximum of 5. \n";
cout << " 3 implies 1 hidden layer; 5 implies 3 hidden layers : \n\n";
cin >> number_of_layers;
cout << " Enter in the layer sizes separated by spaces.\n";
cout << " For a network with 3 neurons in the input layer,\n";
cout << " 2 neurons in a hidden layer, and 4 neurons in the\n";
cout << " output layer, you would enter: 3 2 4 .\n";
cout << " You can have up to 3 hidden layers, for five maximum entries:\n\n";
for (i=0; i<number_of_layers; i++)
        {
        cin >> layer_size[i];
        }
// ---------------------------------------------------
// size of layers:
//    input_layer            layer_size[0]
//    output_layer           layer_size[number_of_layers-1]
//    middle_layers          layer_size[1]
//    optional: layer_size[number_of_layers-3]
//    optional: layer_size[number_of_layers-2]
// ---------------------------------------------------
}
void network::set_up_network()
{
int i,j,k;
//-----------------------------------------------------
// Construct the layers
//
//-----------------------------------------------------
layer_ptr[0] = new input_layer(0,layer_size[0]);
for (i=0;i<(number_of_layers-1);i++)
        {
        layer_ptr[i+1] =
        new middle_layer(layer_size[i],layer_size[i+1]);
        }
 
layer_ptr[number_of_layers-1] = new
output_layer(layer_size[number_of_layers-2],
        layer_size[number_of_layers-1]);
 
for (i=0;i<(number_of_layers-1);i++)
        {
        if (layer_ptr[i] == 0)
               {
               cout << "insufficient memory\n";
               cout << "use a smaller architecture\n";
               exit(1);
               }
        }
//-----------------------------------------------------
// Connect the layers
//
//-----------------------------------------------------
// set inputs to previous layer outputs for all layers,
//             except the input layer
 
for (i=1; i< number_of_layers; i++)
        layer_ptr[i]->inputs = layer_ptr[i-1]->outputs;
 
// for back_propagation, set output_errors to next layer
//             back_errors for all layers except the output
//             layer and input layer
 
for (i=1; i< number_of_layers -1; i++)
        ((output_layer *)layer_ptr[i])->output_errors =
               ((output_layer *)layer_ptr[i+1])->back_errors;
 
// define the IObuffer that caches data from
// the datafile
i=layer_ptr[0]->num_outputs;// inputs
j=layer_ptr[number_of_layers-1]->num_outputs; //outputs
k=MAX_VECTORS;
 
buffer=new
        float[(i+j)*k];
if (buffer==0)
        {
        cout << "insufficient memory for buffer\n";
        exit(1);
        }
}
 
void network::randomize_weights()
{
int i;
 
for (i=1; i<number_of_layers; i++)
        ((output_layer *)layer_ptr[i])
                ->randomize_weights();
}
 
void network::update_weights(const float beta, const float alpha)
{
int i;
 
for (i=1; i<number_of_layers; i++)
        ((output_layer *)layer_ptr[i])
               ->update_weights(beta,alpha);
}
 
void network::update_momentum()
{
int i;
for (i=1; i<number_of_layers; i++)
        ((output_layer *)layer_ptr[i])
               ->update_momentum();
}
 
void network::write_weights(FILE * weights_file_ptr)
{
int i;
 
for (i=1; i<number_of_layers; i++)
        ((output_layer *)layer_ptr[i])
               ->write_weights(i,weights_file_ptr);
}
 
void network::read_weights(FILE * weights_file_ptr)
{
int i;
 
for (i=1; i<number_of_layers; i++)
        ((output_layer *)layer_ptr[i])
               ->read_weights(i,weights_file_ptr);
}
 
void network::list_weights()
{
int i;
 
for (i=1; i<number_of_layers; i++)
        {
        cout << "layer number : " <<i<< "\n";
        ((output_layer *)layer_ptr[i])
               ->list_weights();
        }
}
 
void network::list_outputs()
{
int i;
 
for (i=1; i<number_of_layers; i++)
        {
        cout << "layer number : " <<i<< "\n";
        ((output_layer *)layer_ptr[i])
               ->list_outputs();
        }
}
 
void network::write_outputs(FILE *outfile)
{
int i, ins, outs;
ins=layer_ptr[0]->num_outputs;
outs=layer_ptr[number_of_layers-1]->num_outputs;
float temp;
 
fprintf(outfile,"for input vector:\n");
 
for (i=0; i<ins; i++)
        {
        temp=layer_ptr[0]->outputs[i];
        fprintf(outfile,"%f  ",temp);
        }
 
fprintf(outfile,"\noutput vector is:\n");
 
for (i=0; i<outs; i++)
        {
        temp=layer_ptr[number_of_layers-1]->
        outputs[i];
        fprintf(outfile,"%f  ",temp);
 
        }
 
if (training==1)
{
fprintf(outfile,"\nexpected output vector is:\n");
 
for (i=0; i<outs; i++)
        {
        temp=((output_layer *)(layer_ptr[number_of_layers-1]))->
        expected_values[i];
        fprintf(outfile,"%f  ",temp);
 
        }
}
 
fprintf(outfile,"\n-----------\n");
 
}
 
void network::list_errors()
{
int i;
 
for (i=1; i<number_of_layers; i++)
        {
        cout << "layer number : " <<i<< "\n";
        ((output_layer *)layer_ptr[i])
               ->list_errors();
        }
}
 
int network::fill_IObuffer(FILE * inputfile)
{
// this routine fills memory with
// an array of input, output vectors
// up to a maximum capacity of
// MAX_INPUT_VECTORS_IN_ARRAY
// the return value is the number of read
// vectors
 
int i, k, count, veclength;
 
int ins, outs;
 
ins=layer_ptr[0]->num_outputs;
 
outs=layer_ptr[number_of_layers-1]->num_outputs;
 
if (training==1)
        veclength=ins+outs;
else
        veclength=ins;
 
count=0;
while  ((count<MAX_VECTORS)&&
               (!feof(inputfile)))
        {
        k=count*(veclength);
        for (i=0; i<veclength; i++)
               {
               fscanf(inputfile,"%f",&buffer[k+i]);
               }
        fscanf(inputfile,"\n");
        count++;
        }
 
if (!(ferror(inputfile)))
        return count;
else return -1; // error condition
 
}
 
void network::set_up_pattern(int buffer_index)
{
// read one vector into the network
int i, k;
int ins, outs;
 
ins=layer_ptr[0]->num_outputs;
outs=layer_ptr[number_of_layers-1]->num_outputs;
if (training==1)
        k=buffer_index*(ins+outs);
else
        k=buffer_index*ins;
 
for (i=0; i<ins; i++)
        ((input_layer*)layer_ptr[0])
                      ->orig_outputs[i]=buffer[k+i];
if (training==1)
{
        for (i=0; i<outs; i++)
               ((output_layer *)layer_ptr[number_of_layers-1])->
                      expected_values[i]=buffer[k+i+ins];
}
 
}
 
void network::forward_prop()
{
int i;
for (i=0; i<number_of_layers; i++)
        {
        layer_ptr[i]->calc_out(); //polymorphic
                               // function
        }
}
 
void network::backward_prop(float & toterror)
{
int i;
// error for the output layer
((output_layer*)layer_ptr[number_of_layers-1])->
                      calc_error(toterror);
 
// error for the middle layer(s)
for (i=number_of_layers-2; i>0; i--)
        {
        ((middle_layer*)layer_ptr[i])->
                      calc_error();
 
        }
 
}
 
void network::set_NF(float noise_fact)
{
((input_layer*)layer_ptr[0])->set_NF(noise_fact);
}

The New and Final backprop.cpp File

The last file to present is the backprop.cpp file, shown in the following listing.

Implementation file for the backpropagation simulator, with noise and momentum (backprop.cpp)

// backprop.cpp      V. Rao, H. Rao
#include "layer.cpp"
 
#define TRAINING_FILE   "training.dat"
#define WEIGHTS_FILE    "weights.dat"
#define OUTPUT_FILE     "output.dat"
#define TEST_FILE       "test.dat"
 
void main()
{
 
float error_tolerance=0.1;
float total_error=0.0;
float avg_error_per_cycle=0.0;
float error_last_cycle=0.0;
float avgerr_per_pattern=0.0; // for the latest cycle
float error_last_pattern=0.0;
float learning_parameter=0.02;
float alpha; // momentum parameter
float NF; // noise factor
float new_NF;
 
unsigned temp, startup, start_weights;
long int vectors_in_buffer;
long int max_cycles;
long int patterns_per_cycle=0;
 
long int total_cycles, total_patterns;
int i;
 
// create a network object
network backp;
 
FILE * training_file_ptr, * weights_file_ptr, * output_file_ptr;
FILE * test_file_ptr, * data_file_ptr;
 
// open output file for writing
if ((output_file_ptr=fopen(OUTPUT_FILE,"w"))==NULL)
               {
               cout << "problem opening output file\n";
               exit(1);
               }
 
// enter the training mode : 1=training on     0=training off
cout << "----------------------------------------------------\n";
cout << " C++ Neural Networks and Fuzzy Logic \n";
cout << "    Backpropagation simulator \n";
cout << "      version 2 \n";
cout << "----------------------------------------------------\n";
cout << "Please enter 1 for TRAINING on, or 0 for off: \n\n";
cout << "Use training to change weights according to your\n";
cout << "expected outputs. Your training.dat file should contain\n";
cout << "a set of inputs and expected outputs. The number of\n";
cout << "inputs determines the size of the first (input) layer\n";
cout << "while the number of outputs determines the size of the\n";
cout << "last (output) layer :\n\n";
 
cin >> temp;
backp.set_training(temp);
 
if (backp.get_training_value() == 1)
        {
        cout << "--> Training mode is *ON*. weights will be saved\n";
        cout << "in the file weights.dat at the end of the\n";
        cout << "current set of input (training) data\n";
        }
else
        {
        cout << "--> Training mode is *OFF*. weights will be loaded\n";
        cout << "from the file weights.dat and the current\n";
        cout << "(test) data set will be used. For the test\n";
        cout << "data set, the test.dat file should contain\n";
        cout << "only inputs, and no expected outputs.\n";
        }
 
if (backp.get_training_value()==1)
        {
        // -------------------------------------------
        //    Read in values for the error_tolerance,
        //    and the learning_parameter
        // -------------------------------------------
        cout << " Please enter in the error_tolerance\n";
        cout << " --- between 0.001 to 100.0, try 0.1 to start ---\n";
        cout << "\n";
        cout << "and the learning_parameter, beta\n";
        cout << " --- between 0.01 to 1.0, try 0.5 to start ---\n\n";
        cout << " separate entries by a space\n";
        cout << " example: 0.1 0.5 sets defaults mentioned :\n\n";
        cin >> error_tolerance >> learning_parameter;
 
        // -------------------------------------------
        //    Read in values for the momentum
        //    parameter, alpha (0-1.0)
        //    and the noise factor, NF (0-1.0)
        // -------------------------------------------
        cout << "Enter values now for the momentum \n";
        cout << "parameter, alpha(0-1.0)\n";
        cout << " and the noise factor, NF (0-1.0)\n";
        cout << "You may enter zero for either of these\n";
        cout << "parameters, to turn off the momentum or\n";
        cout << "noise features.\n";
        cout << "If the noise feature is used, a random\n";
        cout << "component of noise is added to the inputs\n";
        cout << "This is decreased to 0 over the maximum\n";
        cout << "number of cycles specified.\n";
        cout << "enter alpha followed by NF, e.g., 0.3 0.5\n";
 
        cin >> alpha >> NF;
 
        //--------------------------------------------
        // open training file for reading
        //--------------------------------------------
        if ((training_file_ptr=fopen(TRAINING_FILE,"r"))==NULL)
               {
               cout << "problem opening training file\n";
               exit(1);
               }
        data_file_ptr=training_file_ptr; // training on
 
        // Read in the maximum number of cycles
        // each pass through the input data file is a cycle
        cout << "Please enter the maximum cycles for the simulation\n";
        cout << "A cycle is one pass through the data set.\n";
        cout << "Try a value of 10 to start with\n";
 
        cin >> max_cycles;
        cout << "Do you want to read weights from weights.dat to start?\n";
        cout << "Type 1 to read from file, 0 to randomize starting weights\n";
        cin >> start_weights;
 
        }
else
        {
        if ((test_file_ptr=fopen(TEST_FILE,"r"))==NULL)
               {
               cout << "problem opening test file\n";
               exit(1);
               }
 
        data_file_ptr=test_file_ptr; // training off
        }
 
// training: continue looping until the total error is less than
//             the tolerance specified, or the maximum number of
//             cycles is exceeded; use both the forward signal
//             propagation and the backward error propagation phases.
//             If the error tolerance criteria is satisfied, save the
//             weights in a file.
// no training: just proceed through the input data set once in the
//             forward signal propagation phase only. Read the starting
//             weights from a file.
// in both cases report the outputs on the screen
 
// initialize counters
total_cycles=0; // a cycle is once through all the input data
total_patterns=0; // a pattern is one entry in the input data
new_NF=NF;
 
// get layer information
backp.get_layer_info();
 
// set up the network connections
backp.set_up_network();
 
// initialize the weights
if ((backp.get_training_value()==1)&&(start_weights!=1))
        {
        // randomize weights for all layers; there is no
        // weight matrix associated with the input layer
        // weight file will be written after processing
 
        backp.randomize_weights();
        // set up the noise factor value
        backp.set_NF(new_NF);
        }
else
        {
        // read in the weight matrix defined by a
        // prior run of the backpropagation simulator
        // with training on
        if ((weights_file_ptr=fopen(WEIGHTS_FILE,"r"))
                       ==NULL)
               {
               cout << "problem opening weights file\n";
               exit(1);
               }
        backp.read_weights(weights_file_ptr);
        fclose(weights_file_ptr);
        }
 
// main loop
// if training is on, keep going through the input data
//             until the error is acceptable or the maximum number of
//             cycles is exceeded.
// if training is off, go through the input data once. report outputs
// with inputs to file output.dat
 
startup=1;
vectors_in_buffer = MAX_VECTORS; // startup condition
total_error = 0;
 
while (   ((backp.get_training_value()==1)
                         && (avgerr_per_pattern
                                      > error_tolerance)
                         && (total_cycles < max_cycles)
                         && (vectors_in_buffer !=0))
                         || ((backp.get_training_value()==0)
                         && (total_cycles < 1))
                         || ((backp.get_training_value()==1)
                         && (startup==1))
                         )
{
startup=0;
error_last_cycle=0; // reset for each cycle
patterns_per_cycle=0;
 
backp.update_momentum(); // added to reset
                       // momentum matrices
                       // each cycle
 
// process all the vectors in the datafile
// going through one buffer at a time
// pattern by pattern
 
while ((vectors_in_buffer==MAX_VECTORS))
        {
 
        vectors_in_buffer=
               backp.fill_IObuffer(data_file_ptr); // fill buffer
               if (vectors_in_buffer < 0)
                      {
                      cout << "error in reading in vectors, aborting\n";
                      cout << "check that there are no extra linefeeds\n";
                      cout << "in your data file, and that the number\n";
                      cout << "of layers and size of layers match\n";
                      cout << "the parameters provided.\n";
                      exit(1);
                      }
 
               // process vectors
               for (i=0; i<vectors_in_buffer; i++)
                      {
                      // get next pattern
                      backp.set_up_pattern(i);
 
                      total_patterns++;
                      patterns_per_cycle++;
                      // forward propagate
 
                      backp.forward_prop();
 
                      if (backp.get_training_value()==0)
                             backp.write_outputs(output_file_ptr);
 
                      // back_propagate, if appropriate
                      if (backp.get_training_value()==1)
                             {
 
                             backp.backward_prop(error_last_pattern);
                             error_last_cycle +=
                                    error_last_pattern*error_last_pattern;
 
                             avgerr_per_pattern=
               ((float)sqrt((double)error_last_cycle/patterns_per_cycle));
                             // if it's not the last cycle, update weights
                             if ((avgerr_per_pattern
                                    > error_tolerance)
                                    && (total_cycles+1 < max_cycles))
 
                                    backp.update_weights(learning_parameter, alpha);
                             // backp.list_weights(); // can
                             // see change in weights by
                             // using list_weights before and
                             // after back_propagation
                             }
 
                      }
       error_last_pattern = 0;
       }
 
total_error += error_last_cycle;
total_cycles++;
 
// update NF
// gradually reduce noise to zero
if (total_cycles>0.7*max_cycles)
               new_NF = 0;
else   if (total_cycles>0.5*max_cycles)
                      new_NF = 0.25*NF;
               else   if (total_cycles>0.3*max_cycles)
                                    new_NF = 0.50*NF;
                             else   if (total_cycles>0.1*max_cycles)
                                           new_NF = 0.75*NF;
 
backp.set_NF(new_NF);
 
// most character displays are 25 lines
// user will see a corner display of the cycle count
// as it changes
 
cout << "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
cout << total_cycles << "\t" << avgerr_per_pattern << "\n";
 
fseek(data_file_ptr, 0L, SEEK_SET); // reset the file pointer
                             // to the beginning of
                             // the file
vectors_in_buffer = MAX_VECTORS; // reset
 
} // end main loop
 
if (backp.get_training_value()==1)
        {
        if ((weights_file_ptr=fopen(WEIGHTS_FILE,"w"))
                      ==NULL)
               {
               cout << "problem opening weights file\n";
               exit(1);
               }
        }
 
cout << "\n\n\n\n\n\n\n\n\n\n\n";
cout << "---------------------------\n";
cout << "    done:   results in file output.dat\n";
cout << "            training: last vector only\n";
cout << "            not training: full cycle\n\n";
if (backp.get_training_value()==1)
        {
        backp.write_weights(weights_file_ptr);
        backp.write_outputs(output_file_ptr);
        avg_error_per_cycle=(float)sqrt((double)total_error/
        total_cycles);
        error_last_cycle=(float)sqrt((double)error_last_cycle);
        fclose(weights_file_ptr);
 
        cout << "              weights saved in file weights.dat\n";
        cout << "\n";
        cout << "-->average error per cycle = " << avg_error_per_cycle << " <--\n";
        cout << "-->error last cycle = " << error_last_cycle << " <--\n";
        cout << "->error last cycle per pattern= " << avgerr_per_pattern << " <--\n";
 
        }
 
cout << "------>total cycles = " << total_cycles << " <--\n";
cout << "------>total patterns = " << total_patterns << " <--\n";
cout << "---------------------------\n";
// close all files
fclose(data_file_ptr);
fclose(output_file_ptr);
 
}

Trying the Noise and Momentum Features

You can test the version 2 simulator, which you just compiled, with the example that you saw at the beginning of the chapter. You will find that there is a lot of trial and error in finding optimum values for alpha, the noise factor, and beta. This is true also for the middle layer size and the number of middle layers. For some problems, the addition of momentum makes convergence much faster. For other problems, you may not find any noticeable difference. An example run of the five-character recognition problem discussed at the beginning of this chapter gave the following results with beta = 0.1, tolerance = 0.001, alpha = 0.25, NF = 0.1, and the layer sizes kept at 35 5 3.

---------------------------
        done:   results in file output.dat
                training: last vector only
                not training: full cycle
 
                weights saved in file weights.dat
 
-->average error per cycle = 0.02993 <--
-->error last cycle = 0.00498 <--
->error last cycle per pattern= 0.000996 <--
------>total cycles = 242 <--
------>total patterns = 1210 <--
---------------------------

The network was able to converge on a better solution (in terms of error measurement) in one-fourth the number of cycles. You can try varying alpha and NF to see the effect on overall simulation time. You can now start from the same initial starting weights by specifying a value of 1 for the starting weights question. For large values of alpha and beta, the network usually will not converge, and the weights will get unacceptably large (you will receive a message to that effect).

Variations of the Backpropagation Algorithm

Backpropagation is a versatile neural network algorithm that very often leads to success. Its Achilles heel is the slowness at which it converges for certain problems. Many variations of the algorithm exist in the literature to try to improve convergence speed and robustness. Variations have been proposed in the following portions of the algorithm:

  Adaptive parameters. You can set rules that modify alpha, the momentum parameter, and beta, the learning parameter, as the simulation progresses. For example, you can reduce beta whenever a weight change does not reduce the error. You can consider undoing the particular weight change, setting alpha to zero, and redoing the weight change with the new value of beta (a small sketch of such a rule follows this list).

  Use other minimum search routines besides steepest descent. For example, you could use Newton’s method for finding a minimum, although this would be a fairly slow process. Other examples include the use of conjugate gradient methods or Levenberg-Marquardt optimization, both of which would result in very rapid training.

  Use different cost functions. Instead of calculating the error as the expected output minus the actual output, you could define another cost function that you want to minimize.

  Modify the architecture. You could use partially connected layers instead of fully connected layers. Also, you can use a recurrent network, that is, one in which some outputs feed back as inputs.
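As a rough sketch of the first idea (our own toy example, not code from the simulator), the loop below adjusts beta after every weight change while minimizing f(w) = w*w: if the change reduced the error, it is kept and beta is increased slightly; if it increased the error, the change is undone, the momentum contribution is dropped, and beta is cut in half.

#include <iostream>
using namespace std;

int main()
{
// toy adaptive learning rate on f(w) = w*w, whose gradient is 2*w
float w=4.0f, beta=0.9f, alpha=0.5f, delta=0.0f;
float error = w*w;

for (int step=1; step<=15; step++)
       {
       float gradient  = 2.0f*w;
       float new_delta = -beta*gradient + alpha*delta;
       float new_w     = w + new_delta;
       float new_error = new_w*new_w;

       if (new_error < error)        // the change reduced the error: keep it
              {
              w=new_w; delta=new_delta; error=new_error;
              beta *= 1.05f;         // be a little bolder
              }
       else                          // the change did not help: undo it
              {
              delta = 0.0f;          // set the momentum contribution to zero
              beta *= 0.5f;          // and try a smaller learning parameter
              }
       cout << "step " << step << "   w = " << w
            << "   beta = " << beta << "\n";
       }
return 0;
}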

Applications

Backpropagation remains the king of neural network architectures because of its ease of use and wide applicability. A few of the notable applications in the literature will be cited as examples.

  NETTalk. In 1987, Sejnowski and Rosenberg developed a network connected to a speech synthesizer that was able to utter English words, being trained to produce phonemes from English text. The architecture consisted of an input layer window of seven characters. The characters were part of English text that was scrolled by. The network was trained to pronounce the letter at the center of the window. The middle layer had 80 neurons, while the output layer consisted of 26 neurons. With 1024 training patterns and 10 cycles, the network started making intelligible speech, similar to the process of a child learning to talk. After 50 cycles, the network was about 95% accurate. You could purposely damage the network with the removal of neurons, but this did not cause performance to drop off a cliff; instead, the performance degraded gracefully. There was rapid recovery with retraining using fewer neurons also. This shows the fault tolerance of neural networks.

  Sonar target recognition. Neural nets using backpropagation have been used to identify different types of targets using the frequency signature (with a Fast Fourier transform) of the reflected signal.

  Car navigation. Pomerleau developed a neural network that is able to navigate a car based on images obtained from a camera mounted on the car’s roof, and a range finder that coded distances in grayscale. The 30×32 pixel image and the 8×32 range finder image were fed into a hidden layer of size 29 feeding an output layer of 45 neurons. The output neurons were arranged in a straight line with each side representing a turn to a particular direction (right or left), while the center neurons represented “drive straight ahead.” After 1200 road images were trained on the network, the neural network driver was able to negotiate a part of the Carnegie-Mellon campus at a speed of about 3 miles per hour, limited only by the speed of the real-time calculations done on a trained network in the Sun-3 computer in the car.

  Image compression. G.W. Cottrell, P. Munro, and D. Zipser used backpropagation to compress images, achieving an 8:1 compression ratio. They used standard backpropagation with 64 input neurons (8×8 pixels), 16 hidden neurons, and 64 output neurons equal to the inputs. This is called self-supervised backpropagation and represents an autoassociative network. The compressed signal is taken from the hidden layer. The input-to-hidden-layer connections form the compressor, while the hidden-to-output-layer connections form the decompressor.

  Image recognition. Le Cun reported a backpropagation network with three hidden layers that could recognize handwritten postal zip codes. He used a 16×16 array of pixels to represent each handwritten digit and needed to encode 10 outputs, each of which represented a digit from 0 to 9. One interesting aspect of this work is that the hidden layers were not fully connected. The network was set up with blocks of neurons in the first two hidden layers acting as feature detectors for different parts of the previous layer. All the neurons in a block were set up to have the same weights as those from the previous layer. This is called weight sharing (a rough sketch of the idea follows this list). Each block would sample a different part of the previous layer's image. The first hidden layer had 12 blocks of 8×8 neurons, whereas the second hidden layer had 12 blocks of 4×4 neurons. The third hidden layer was fully connected and consisted of 30 neurons. There were 1256 neurons in all. The network was trained on 7300 examples and tested on 2000 cases, with error rates of 1% on the training set and 5% on the test set.
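As a rough illustration of the weight-sharing idea (our own sketch, not Le Cun's network), the fragment below slides a single shared 5x5 weight block over a 16x16 input image. Every position of the block reuses the same 25 weights, which is what makes a block act as one feature detector applied to different parts of the previous layer's image.

#include <iostream>
using namespace std;

const int IMG = 16;   // input image is IMG x IMG
const int K   = 5;    // shared weight block is K x K

// apply one shared K x K weight block at every position of the image;
// all output positions reuse the same weights: this is weight sharing
void shared_feature_map(const float image[IMG][IMG],
                        const float shared_w[K][K],
                        float feature[IMG-K+1][IMG-K+1])
{
for (int r=0; r<=IMG-K; r++)
       for (int c=0; c<=IMG-K; c++)
              {
              float sum=0.0f;
              for (int i=0; i<K; i++)
                     for (int j=0; j<K; j++)
                            sum += shared_w[i][j]*image[r+i][c+j];
              feature[r][c]=sum;    // one feature-detector output
              }
}

int main()
{
float image[IMG][IMG] = {0};
float w[K][K] = {0};
float feature[IMG-K+1][IMG-K+1];

image[8][8] = 1.0f;          // a single bright pixel
w[2][2]     = 1.0f;          // a trivial "center detector"

shared_feature_map(image, w, feature);
cout << "feature[6][6] = " << feature[6][6] << "\n";  // responds at the pixel
return 0;
}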

Summary

You explored the backpropagation algorithm further in this chapter, continuing the discussion in Chapter 7.

  A momentum term was added to the training law and was shown to result in much faster convergence in some cases.

  A noise term was added to inputs to allow training to take place with random noise applied. This noise was made to decrease with the number of cycles, so that final stage learning could be done in a noise-free environment.

  The final version of the backpropagation simulator was constructed and used on the example from Chapter 12. Further application of the simulator will be made in Chapter 14.

  Several applications with the backpropagation algorithm were outlined, showing the wide applicability of this algorithm.