Application to Financial Forecasting

Introduction

In the chapters on backpropagation (Backpropagation and Backpropagation II), the backpropagation simulator was developed. In this chapter, you will use the simulator to tackle a complex problem in financial forecasting. The application of neural networks to financial forecasting and modeling has been very popular over the last few years. Financial journals and magazines frequently mention the use of neural networks, and commercial tools and simulators are quite widespread.

This chapter gives you an overview of typical steps used in creating financial forecasting models. Many of the steps will be simplified, and so the results will not, unfortunately, be good enough for real life application. However, this chapter will hopefully serve as an introduction to the field with some added pointers for further reading and resources for those who want more detailed information.

Who Trades with Neural Networks?

There has been a great deal of interest in neural networks on Wall Street. Bradford Lewis runs two Fidelity funds in part with the use of neural networks. Also, LBS Capital Management (Peoria, Illinois) manages part of its portfolio with neural networks. According to Barron’s (February 27, 1995), LBS’s $150 million fund has beaten the averages by three percentage points a year since 1992. Each weekend, the neural networks are retrained with the latest technical and fundamental data, including P/E ratios, earnings results, and interest rates. Another of LBS’s models, however, has done worse than the S&P 500 for the past five years. In the book Virtual Trading, Jeffrey Katz states that most of the successful neural network systems are proprietary and rarely heard of publicly. Clients who use neural networks usually don’t want anyone else to know what they are doing, for fear of losing their competitive edge. Firms put in many person-years of engineering design, along with a lot of CPU cycles, to achieve practical and profitable results. Let’s look at the process:

Developing a Forecasting Model

There are many steps in building a forecasting model, as listed below.

1.  Decide on what your target is and develop a neural network (following these steps) for each target.

2.  Determine the time frame that you wish to forecast.

3.  Gather information about the problem domain.

4.  Gather the needed data and get a feel for each input’s relationship to the target.

5.  Process the data to highlight features for the network to discern.

6.  Transform the data as appropriate.

7.  Scale and bias the data for the network, as needed.

8.  Reduce the dimensionality of the input data as much as possible.

9.  Design a network architecture (topology, # layers, size of layers, parameters, learning paradigm).

10.  Go through the train/test/redesign loop for a network.

11.  Eliminate correlated inputs as much as possible, while in step 10.

12.  Deploy your network on new data and test it and refine it as necessary.

Once you develop a forecasting model, you then must integrate this into your trading system. A neural network can be designed to predict direction, or magnitude, or maybe just turning points in a particular market or something else. Avner Mandelman of Cereus Investments (Los Altos Hills, California) uses a long-range trained neural network to tell him when the market is making a top or bottom (Barron’s, December 14, 1992).

Now let’s expand on the twelve aspects of model building:

The Target and the Timeframe

What should the output of your neural network forecast? Let’s say you want to predict the stock market. Do you want to predict the S&P 500? Or do you want to predict the direction of the S&P 500? You could predict the volatility of the S&P 500 instead (useful if you’re an options player). Further, like Mr. Mandelman, you might want to predict only tops and bottoms, say, for the Dow Jones Industrial Average. You need to decide on the market or markets and also on your specific objectives.

Another crucial decision is the timeframe you want to forecast over. It is easier to create neural network models for longer term predictions than for shorter term predictions. One reason is market noise, the seemingly random, chaotic variation you see at smaller and smaller timescale resolutions, which dominates short horizons. Another reason is that the macroeconomic forces that fundamentally move markets over long periods move slowly. The U.S. dollar makes multiyear trends, shaped by the economic policies of governments around the world. For a given error tolerance, a one-year or one-month forecast will take less effort with a neural network than a one-day forecast will.

Domain Expertise

So far we’ve talked about the target and the timeframe. Another important aspect of model building is knowledge of the domain. If you want to create an effective predictive model of the weather, then you need to know, or be able to guess, the factors that influence weather. The same holds true for the stock market or any other financial market. In order to create a real, tradable Treasury bond trading system, you need to have a good idea of what really drives the market and what works; in other words, talk to a T-bond trader and encapsulate his or her domain expertise!

Gather the Data

Once you know the factors that influence the target output, you can gather raw data. If you are predicting the S&P 500, then you may consider Treasury yields, 3-month T-bill yields, and earnings as some of the factors. Once you have the data, you can do scatter plots to see if there is some correlation between the input and the target output. If you are not satisfied with the plot, you may consider a different input in its place.

Preprocessing the Data for the Network

Surprising as it may sound, you are most likely going to spend about 90% of your time, as a neural network developer, in massaging and transforming data into meaningful form for training your network. We actually defined three substeps in this area of preprocessing in our master list:

  Highlight features

  Transform

  Scale and bias

Highlighting Features in the Input Data

You should present the neural network, as much as possible, with an easy way to find patterns in your data. For time series data, like stock market prices over time, you may consider presenting quantities like rate of change and acceleration (the first and second derivatives of your input). Another way to highlight data is to magnify certain occurrences. For example, if you consider central bank intervention an important qualifier for foreign exchange rates, then you may include as an input to your network a value of 1 or 0 to indicate the presence or absence of central bank intervention. Now if you further consider the activity of the U.S. Federal Reserve to be important by itself, then you may wish to highlight that by separating it out as another 1/0 input. Using 1/0 coding to separate composite effects is called thermometer encoding.

There is a whole body of study of market behavior called Technical Analysis, from which you may also wish to derive technical studies of your data. There is a wide assortment of mathematical technical studies that you can perform on your data (see references), such as moving averages to smooth the data. There are also pattern recognition studies you can use, like the “double-top” formation, which purportedly signals a high probability of a significant decline. To be able to recognize such a pattern, you may wish to present a mathematical function that aids in the identification of the double-top.

You may want to de-emphasize unwanted noise in your input data. If you see a spike in your data, you can lessen its effect by passing the data through a moving average filter, for example. You should be careful about introducing excessive lag in the resulting data, though.

Transform the Data If Appropriate

For time series data, you may consider using a Fourier transform to move to the frequency-phase plane. This will uncover periodic cyclic information if it exists. The Fourier transform will decompose the input discrete data series into a series of frequency spikes that measure the relevance of each frequency component. If the stock market indeed follows the so-called January effect, where prices typically make a run up, then you would expect a strong yearly component in the frequency spectrum. Mark Jurik suggests sampling data with intervals that catch different cycle periods, in his paper on neural network data preparation (see references ).

You can use other signal processing techniques such as filtering. Besides the frequency domain, you can also consider moving to other spaces, such as with using the wavelet transform. You may also analyze the chaotic component of the data with chaos measures. It’s beyond the scope of this book to discuss these techniques. (Refer to the Resources section of this chapter for more information.) If you are developing short-term trading neural network systems, these techniques may play a significant role in your preprocessing effort. All of these techniques will provide new ways of looking at your data, for possible features to detect in other domains.

Scale Your Data

Neurons like to see data in a particular input range to be most effective. Presenting data like the S&P 500, which has varied from 200 to 550 over the years, will not be useful, since the middle-layer neurons use a sigmoid activation function that squashes large inputs toward 0 or +1. In other words, you should present data in a range that does not saturate, or overwhelm, the network neurons. Choosing inputs from –1 to 1 or 0 to 1 is a good idea. By the same token, you should normalize the expected values for the outputs to the 0 to 1 sigmoidal range.

It is important to pay attention to the number of input values in the data set that are close to zero. Since the weight change law is proportional to the input value, an input close to zero means that the corresponding weight will not participate in learning! To avoid such situations, you can add a constant bias to your data to move the data closer to 0.5, where the neurons respond very well.
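As a rough illustration, a minimal C++ sketch of this kind of scaling might look like the following; the helper names are hypothetical and this is not part of the simulator code:

     #include <algorithm>
     #include <vector>

     // Scale a column of raw values into the 0-to-1 range so the sigmoid
     // layer is not saturated. Hypothetical helper, not simulator code.
     std::vector<double> scaleToUnitRange(const std::vector<double>& raw) {
         double lo = *std::min_element(raw.begin(), raw.end());
         double hi = *std::max_element(raw.begin(), raw.end());
         std::vector<double> scaled;
         scaled.reserve(raw.size());
         for (double v : raw)
             scaled.push_back(hi == lo ? 0.5 : (v - lo) / (hi - lo));
         return scaled;
     }

     // Optionally shift values that sit near zero toward 0.5, where the
     // neurons respond well, clamping to stay inside the sigmoid range.
     void addBias(std::vector<double>& scaled, double bias) {
         for (double& v : scaled)
             v = std::min(1.0, v + bias);
     }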

Reduce Dimensionality

You should try to eliminate inputs wherever possible. This will reduce the dimensionality of the problem and make it easier for your neural network to generalize. Suppose that you have three inputs, x, y, and z, and one output, o. Now suppose that you find that all of your inputs are restricted to one plane. You could redefine axes such that you have x’ and y’ for that plane and map your inputs to the new coordinates. This reduces the number of inputs to your problem to 2 instead of 3, without any loss of information. This is illustrated in the figure below:

Reducing dimensionality from three to two dimensions.

Generalization versus Memorization

If your overall goal is beyond pattern classification, you need to track your network’s ability to generalize. Not only should you look at the overall error with the training set that you define, but you should also set aside some examples as part of a test set (and do not train with them), with which you can see whether or not the network is able to predict correctly. If the network responds poorly to your test set, you know that you have overtrained, or you can say the network has “memorized” the training patterns. If you look at the arbitrary curve-fitting analogy in the figure below,

Generalization (G) versus overfitting (O) of data.

you see curves for a generalized fit, labeled G, and an overfit, labeled O. In the case of the overfit, any data point outside of the training data results in highly erroneous prediction. Your test data will certainly show you large error in the case of an overfitted model.

Another way to consider this issue is in terms of Degrees Of Freedom (DOF). For the polynomial:

y = a0 + a1x + a2x^2 + ... + anx^n

the DOF equals the number of coefficients a0, a1, ..., an, which is n + 1. So for the equation of a line (y = a0 + a1x), the DOF would be 2. For a parabola, this would be 3, and so on. The objective of not overfitting data can be restated as an objective of obtaining the function with the least DOF that fits the data adequately. For neural network models, the larger the number of trainable weights (which is a function of the number of inputs and the architecture), the larger the DOF. Be careful with having too many (unimportant) inputs. You may find terrific results with your training data, but extremely poor results with your test data.

Eliminate Correlated Inputs Where Possible

You have seen that getting to the minimum number of inputs for a given problem is important in terms of minimizing DOF and simplifying your model. Another way to reduce dimensionality is to look for correlated inputs and to carefully eliminate redundancy. For example, you may find that the Swiss franc and German mark are highly correlated over a certain time period of interest. You may wish to eliminate one of these inputs to reduce dimensionality. You have to be careful in this process though. You may find that a seemingly redundant piece of information is actually very important. Mark Jurik, of Jurik Consulting, in his paper on data preprocessing, suggests that one of the best ways to determine if an input is really needed is to construct neural network models with and without the input and choose the model with the best error on training and test data. Although very iterative, you can try eliminating as many inputs as possible this way and be assured that you haven’t eliminated a variable that really made a difference.

Another approach is sensitivity analysis, where you vary one input a little, while holding all others constant, and note the effect on the output. If the effect is small, you eliminate that input. This approach is flawed because in the real world all the inputs are not constant. Jurik’s approach is more time consuming but will lead to a better model.
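A crude sensitivity loop might look like the sketch below, assuming the trained network is available as a callable that maps an input vector to an output; the names are hypothetical and this is not the simulator's actual interface:

     #include <cmath>
     #include <cstddef>
     #include <functional>
     #include <vector>

     // Perturb one input at a time, hold the others constant, and record
     // how much the trained network's output moves.
     std::vector<double> sensitivity(
             const std::function<double(const std::vector<double>&)>& net,
             const std::vector<double>& baseInput, double delta = 0.01) {
         std::vector<double> effect(baseInput.size());
         double baseOutput = net(baseInput);
         for (std::size_t i = 0; i < baseInput.size(); ++i) {
             std::vector<double> perturbed = baseInput;
             perturbed[i] += delta;
             // A small effect marks this input as a candidate for removal.
             effect[i] = std::fabs(net(perturbed) - baseOutput);
         }
         return effect;
     }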

The process of decorrelation, or eliminating correlated inputs, can also use a linear algebra technique called principal component analysis. The result of principal component analysis is a minimum set of variables that contain the maximum information. For further information on principal component analysis, you should consult a statistics reference or research two related methods: the Karhunen-Loève transform and the Hotelling transform.

Design a Network Architecture

Now it’s time to actually design the neural network. For the backpropagation feed-forward neural network we have designed, this means making the following choices:

1.  The number of hidden layers.

2.  The size of hidden layers.

3.  The learning constant, beta.

4.  The momentum parameter, alpha.

5.  The form of the squashing function (does not have to be the sigmoid).

6.  The starting point, that is, initial weight matrix.

7.  The addition of noise.

Some of the parameters listed can be made to vary with the number of cycles executed, similar to the current implementation of noise. For example, you can start with a learning constant beta that is large and reduce it as learning progresses. This allows rapid learning at the beginning of the process and may reduce the overall simulation time.
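For instance, a cycle-dependent learning constant could be sketched as follows; the linear schedule and the sample values are illustrative only:

     // One possible schedule for a learning constant that starts large and
     // shrinks as training proceeds.
     double learningConstant(double betaStart, double betaEnd,
                             int cycle, int maxCycles) {
         double fraction = static_cast<double>(cycle) / maxCycles;
         return betaStart + (betaEnd - betaStart) * fraction;
     }

     // Example: learningConstant(0.5, 0.1, cycle, 500) takes beta from 0.5
     // down to 0.1 over 500 cycles.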

The Train/Test/Redesign Loop

Much of the process of determining the best parameters for a given application is trial and error. You need to spend a great deal of time evaluating different options to find the best fit for your problem. You may literally create hundreds, if not thousands, of networks, either manually or automatically, to search for the best solution. Many commercial neural network programs use genetic algorithms to help arrive at an optimum network automatically. A genetic algorithm makes up possible solutions to a problem from a set of starting genes. Analogous to biological evolution, the algorithm combines genetic solutions with a predefined set of operators to create new generations of solutions, which survive or perish depending on their ability to solve the problem. The key benefit of genetic algorithms (GAs) is the ability to traverse an enormous search space for a possibly optimum solution. You would program a GA to search for the number of hidden layers and other network parameters, and gradually evolve a neural network solution. Some vendors use a GA only to assign the starting set of weights to the network, instead of randomizing them, so that you start off near a good solution.

Now let’s review the steps:

1.  Split your data. First, divide your data set into three pieces: a training set, a test set, and a blind test set (a minimal split sketch follows this list). Use about 80% of your data records for your training set, 10% for your test set, and 10% for your blind test set.

2.  Train and test. Next, start with a network topology and train your network on your training set data. When you have reached a satisfactory minimum error, save your weights and apply your trained network to the test data, noting the error. Now restart the process with the same network topology but a different set of initial weights and see if you can achieve a better error on the training and test sets. Reasoning: you may have found a local minimum on your first attempt, and randomizing the initial weights will start you off toward a different, possibly better solution.

3.  Eliminate correlated inputs. You may optionally try at this point to see if you can eliminate correlated inputs, as mentioned before, by iteratively removing each input and noting the best error you can achieve on the training and test sets for each of these cases. Choose the case that leads to the best error and eliminate the input (if any) that achieved it. You can repeat this whole process again to try to eliminate another input variable.

4.  Iteratively train and test. Now you can try other network parameters and repeat the train and test process to achieve a better result.

5.  Deploy your network. You now can use the blind test data set to see how your optimized network performs. If the error is not satisfactory, then you need to re-enter the design phase or the train and test phase.

6.  Revisit your network design when conditions change. You need to retrain your network when you have reason to think that you have new information relevant to the problem you are modeling. If you have a neural network that tries to predict the weekly change in the S&P 500, then you likely will need to retrain your network at least once a month, if not once a week. If you find that the network no longer generalizes well with the new information, you need to re-enter the design phase.
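As promised in step 1, here is a minimal sketch of the 80/10/10 split; in practice this chapter simply stores the sets in separate .dat files, and the Fact structure here is hypothetical:

     #include <cstddef>
     #include <vector>

     // A fact is one training example: an input vector and its target.
     struct Fact {
         std::vector<double> inputs;
         double target;
     };

     // Split the facts roughly 80/10/10 into training, test, and blind
     // test sets.
     void splitFacts(const std::vector<Fact>& all,
                     std::vector<Fact>& train,
                     std::vector<Fact>& test,
                     std::vector<Fact>& blind) {
         std::size_t nTrain = all.size() * 8 / 10;
         std::size_t nTest  = all.size() / 10;
         train.assign(all.begin(), all.begin() + nTrain);
         test.assign(all.begin() + nTrain, all.begin() + nTrain + nTest);
         blind.assign(all.begin() + nTrain + nTest, all.end());
     }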

If this sounds like a lot of work, it is! Now, let’s try our luck at forecasting by going through a subset of the steps outlined:

Forecasting the S&P 500

The S&P 500 index is a widely followed stock average, like the Dow Jones Industrial Average (DJIA). It has a broader representation of the stock market since this average is based on 500 stocks, whereas the DJIA is based on only 30. The problem to be approached in this chapter is to predict the S&P 500 index, given a variety of indicators and data for prior weeks.

Choosing the Right Outputs and Objective

Our objective is to forecast the S&P 500 ten weeks from now. Although the objective may be to predict the level of the S&P 500, it is important to simplify the job of the network by asking for a change in the level rather than for the absolute level of the index. What you want to do is give the network the ability to fit the problem at hand conveniently in the output space of the output layer. Practically speaking, you know that the output from the network cannot be outside of the 0 to 1 range, since we have used a sigmoid activation function. You could take the S&P 500 index and scale the absolute price level to this range, for example. However, you will likely end up with very small numbers that have a small range of variability. The difference from week to week, on the other hand, has a much smaller overall range, and when these differences are scaled to the 0 to 1 range, you get much more variability.

The output we choose is the change in the S&P 500 from the current week to 10 weeks from now as a percentage of the current week’s value.

Choosing the Right Inputs

The inputs to the network need to be weekly changes of indicators that have some relevance to the S&P 500 index. This is a complex forecasting problem, and we can only guess at some of the relationships. This is one of the inherent strengths of using neural nets for forecasting; if a relationship is weak, the network will learn to ignore it automatically. Keep in mind, though, that you still want to minimize the DOF, as mentioned before. In this example, we choose a data set that represents the state of the financial markets and the economy. The inputs chosen are listed below:

  Previous price action in the S&P 500 index, including the close or final value of the index

  Breadth indicators for the stock market, including the number of advancing issues and declining issues for the stocks in the New York Stock Exchange (NYSE)

  Other technical indicators, including the number of new highs and new lows achieved in the week for the NYSE market. This gives some indication about the strength of an uptrend or downtrend.

  Interest rates, including short-term interest rates in the Three-Month Treasury Bill Yield, and long-term rates in the 30-Year Treasury Bond Yield.

Other possible inputs could have been government statistics like the Consumer Price Index, housing starts, and the unemployment rate. These were not chosen because long- and short-term interest rates tend to encompass this information already.

You are encouraged to experiment with other inputs and ideas. All of the data mentioned can be obtained in the public domain, such as from financial publications (Barron’s, Investor’s Business Daily, Wall Street Journal) and from ftp sites on the Internet for the Department of Commerce and the Federal Reserve, as well as from commercial vendors (see the Resource Guide at the end of the chapter). New sources crop up on the Internet all the time; a sampling of World Wide Web addresses for commercial and noncommercial sources appears in the Resource Guide.

Choosing a Network Architecture

The input and output layers are fixed by the number of inputs and outputs we are using. In our case, the output is a single number, the expected change in the S&P 500 index 10 weeks from now. The input layer size will be dictated by the number of inputs we have after preprocessing. You will see more on this soon. The number of hidden layers can be either one or two. It is best to choose the smallest number of neurons possible for a given problem to allow for generalization. If there are too many neurons, you will tend to get memorization of patterns. We will use one hidden layer. The size of the first hidden layer is generally recommended as between one-half and three times the size of the input layer. If a second hidden layer is present, you may have between three and ten times the number of output neurons. The best way to determine the optimum sizes is by trial and error.


NOTE:  You should try to make sure that there are enough training examples for your trainable weights. In other words, your architecture may be dictated by the number of input training examples, or facts, you have. In an ideal world, you would want about 10 or more facts for each weight. For a 10-10-1 architecture, there are 10x10 + 10x1 = 110 weights, so you should aim for about 1,100 facts. The smaller the ratio of facts to weights, the more likely you will be undertraining your network, which will lead to very poor generalization capability.
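A quick way to apply this rule of thumb is to count the weights for a candidate architecture, as in the following sketch (bias weights are ignored to match the count in the note above):

     #include <cstddef>
     #include <vector>

     // Count the trainable weights of a fully connected feed-forward
     // layout, e.g. {10, 10, 1} gives 10x10 + 10x1 = 110.
     int countWeights(const std::vector<int>& layerSizes) {
         int weights = 0;
         for (std::size_t i = 0; i + 1 < layerSizes.size(); ++i)
             weights += layerSizes[i] * layerSizes[i + 1];
         return weights;
     }

     // Rough rule of thumb: about 10 facts per weight, so a 10-10-1
     // network calls for roughly 1,100 facts.
     int recommendedFacts(const std::vector<int>& layerSizes) {
         return 10 * countWeights(layerSizes);
     }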


Preprocessing Data

We now begin the preprocessing effort. As mentioned before, this will likely be where you, the neural network designer, will spend most of your time.

A View of the Raw Data

Let’s look at the raw data for the problem we want to solve. There are a couple of ways we can start preprocessing the data to reduce the number of inputs and enhance the variability of the data:

  Use Advances/Declines ratio instead of each value separately.

  Use New Highs/New Lows ratio instead of each value separately.

We are left with the following indicators:

1.  Three-Month Treasury Bill Yield

2.  30-Year Treasury Bond Yield

3.  NYSE Advancing/Declining issues

4.  NYSE New Highs/New Lows

5.  S&P 500 closing price

Raw data for the period from January 4, 1980 to October 28, 1983 is taken as the training period, for a total of 200 weeks of data. The following 50 weeks are kept in reserve as a test period to see if the predictions are valid outside of the training interval. The last date of this period is October 19, 1984. Let’s look at the raw data now. (The disk available with this book contains data covering the period from January 1980 to December 1992.)

The figures for this section show:

  The S&P 500 Index for the period of interest.

  Long-term and short-term interest rates (the 30-year Treasury bond yield and the 3-month T-bill yield).

  Breadth indicators on the NYSE: Advancing/Declining issues and New Highs/New Lows.

A sample of a few lines looks like the following data in Table 14.1. Note that the order of parameters is the same as listed above.

Raw Data

Date       3Mo TBills   30Yr TBonds   NYSE-Adv/Dec   NYSE-NewH/NewL   SP-Close
1/4/80     12.11        9.64          4.209459       2.764706         106.52
1/11/80    11.94        9.73          1.649573       21.28571         109.92
1/18/80    11.9         9.8           0.881335       4.210526         111.07
1/25/80    12.19        9.93          0.793269       3.606061         113.61
2/1/80     12.04        10.2          1.16293        2.088235         115.12
2/8/80     12.09        10.48         1.338415       2.936508         117.95
2/15/80    12.31        10.96         0.338053       0.134615         115.41
2/22/80    13.16        11.25         0.32381        0.109091         115.04
2/29/80    13.7         12.14         1.676895       0.179245         113.66
3/7/80     15.14        12.1          0.282591       0                106.9
3/14/80    15.38        12.01         0.690286       0.011628         105.43
3/21/80    15.05        11.73         0.486267       0.027933         102.31
3/28/80    16.53        11.67         5.247191       0.011628         100.68
4/3/80     15.04        12.06         0.983562       0.117647         102.15
4/11/80    14.42        11.81         1.565854       0.310345         103.79
4/18/80    13.82        11.23         1.113287       0.146341         100.55
4/25/80    12.73        10.59         0.849807       0.473684         105.16
5/2/80     10.79        10.42         1.147465       1.857143         105.58
5/9/80     9.73         10.15         0.513052       0.473684         104.72
5/16/80    8.6          9.7           1.342444       6.75             107.35
5/23/80    8.95         9.87          3.110825       26               110.62

Highlight Features in the Data

For each of the five inputs, we want to use a function to highlight rate of change types of features. We will use the following function (as originally proposed by Jurik) for this purpose.

ROC(n) = (input(t) - BA(t - n)) / (input(t)+ BA(t - n))

where input(t) is the input’s current value and BA(t - n) is a five-unit block average of adjacent values centered around the value n periods ago.

Now we need to decide how many of these features we need. Since we are making a prediction 10 weeks into the future, we will take data as far back as 10 weeks also. This will be ROC(10). We will also use one other rate of change, ROC(3). We have now added 5*2 = 10 inputs to our network, for a total of 15. All of the preprocessing can be done with a spreadsheet.
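A small sketch of the block average and ROC calculations might look like the following; edge handling for the first and last rows is left out, and the helper names are hypothetical:

     #include <cstddef>
     #include <vector>

     // Five-period block average centered on index i, as in the worked
     // example that follows: BA for 1/18/80 averages 1/4/80 through 2/1/80.
     double blockAverage(const std::vector<double>& series, std::size_t i) {
         double sum = 0.0;
         for (std::size_t k = i - 2; k <= i + 2; ++k)
             sum += series[k];
         return sum / 5.0;
     }

     // ROC(n) at time t: (input(t) - BA(t - n)) / (input(t) + BA(t - n)).
     double rocIndicator(const std::vector<double>& series,
                         std::size_t t, std::size_t n) {
         double ba = blockAverage(series, t - n);
         return (series[t] - ba) / (series[t] + ba);
     }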

Here’s what we get (Table 14.2) after doing the block averages. Example: BA3MoBills for 1/18/80 = (3MoBills(1/4/80) + 3MoBills(1/11/80) + 3MoBills(1/18/80) + 3MoBills(1/25/80) + 3MoBills(2/1/80))/5.

Data after Doing Block Averages

Date       3MoBills   LngBonds   NYSE-Adv/Dec   NYSE-NewH/NewL   SP-Close
1/4/80     12.11      9.64       4.209459       2.764706         106.52
1/11/80    11.94      9.73       1.649573       21.28571         109.92
1/18/80    11.9       9.8        0.881335       4.210526         111.07
1/25/80    12.19      9.93       0.793269       3.606061         113.61
2/1/80     12.04      10.2       1.16293        2.088235         115.12
2/8/80     12.09      10.48      1.338415       2.936508         117.95
2/15/80    12.31      10.96      0.338053       0.134615         115.41
2/22/80    13.16      11.25      0.32381        0.109091         115.04
2/29/80    13.7       12.14      1.676895       0.179245         113.66
3/7/80     15.14      12.1       0.282591       0                106.9
3/14/80    15.38      12.01      0.690286       0.011628         105.43
3/21/80    15.05      11.73      0.486267       0.027933         102.31
3/28/80    16.53      11.67      5.247191       0.011628         100.68
4/3/80     15.04      12.06      0.983562       0.117647         102.15
4/11/80    14.42      11.81      1.565854       0.310345         103.79
4/18/80    13.82      11.23      1.113287       0.146341         100.55
4/25/80    12.73      10.59      0.849807       0.473684         105.16
5/2/80     10.79      10.42      1.147465       1.857143         105.58
5/9/80     9.73       10.15      0.513052       0.473684         104.72
5/16/80    8.6        9.7        1.342444       6.75             107.35
5/23/80    8.95       9.87       3.110825       26               110.62

Date       BA3MoB     BALngBnd   BAA/D      BAH/L      BAClose
1/4/80
1/11/80
1/18/80    12.036     9.86       1.739313   6.791048   111.248
1/25/80    12.032     10.028     1.165104   6.825408   113.534
2/1/80     12.106     10.274     0.9028     2.595189   114.632
2/8/80     12.358     10.564     0.791295   1.774902   115.426
2/15/80    12.66      11.006     0.968021   1.089539   115.436
2/22/80    13.28      11.386     0.791953   0.671892   113.792
2/29/80    13.938     11.692     0.662327   0.086916   111.288
3/7/80     14.486     11.846     0.69197    0.065579   108.668
3/14/80    15.16      11.93      1.676646   0.046087   105.796
3/21/80    15.428     11.914     1.537979   0.033767   103.494
3/28/80    15.284     11.856     1.794632   0.095836   102.872
4/3/80     14.972     11.7       1.879232   0.122779   101.896
4/11/80    14.508     11.472     1.95194    0.211929   102.466
4/18/80    13.36      11.222     1.131995   0.581032   103.446
4/25/80    12.298     10.84      1.037893   0.652239   103.96
5/2/80     11.134     10.418     0.993211   1.94017    104.672
5/9/80     10.16      10.146     1.392719   7.110902   106.686
5/16/80    7.614      8.028      1.222757   7.016165   85.654
5/23/80    5.456      5.944      0.993264   6.644737   64.538

Now let’s look at the rest of this table, which is made up of the new 10 values of ROC indicators (Table 14.3).

Added Rate of Change (ROC) Indicators

Date       ROC3_3Mo   ROC3_Bond   ROC3_A/D   ROC3_H/L   ROC3_SPC
1/4/80
1/11/80
1/18/80
1/25/80
2/1/80
2/8/80     0.002238   0.030482    -0.13026   -0.39625   0.029241
2/15/80    0.011421   0.044406    -0.55021   -0.96132   0.008194
2/22/80    0.041716   0.045345    -0.47202   -0.91932   0.001776
2/29/80    0.0515     0.069415    0.358805   -0.81655   -0.00771
3/7/80     0.089209   0.047347    -0.54808   -1         -0.03839
3/14/80    0.073273   0.026671    -0.06859   -0.96598   -0.03814
3/21/80    0.038361   0.001622    -0.15328   -0.51357   -0.04203
3/28/80    0.065901   -0.00748    0.766981   -0.69879   -0.03816
4/3/80     -0.00397   0.005419    -0.26054   0.437052   -0.01753
4/11/80    -0.03377   -0.00438    0.008981   0.437052   -0.01753
4/18/80    -0.0503    -0.02712    -0.23431   0.803743   0.001428
4/25/80    -0.08093   -0.0498     -0.37721   0.58831    0.015764
5/2/80     -0.14697   -0.04805    -0.25956   0.795146   0.014968
5/9/80     -0.15721   -0.05016    -0.37625   -0.10178   0.00612
5/16/80    -0.17695   -0.0555     0.127944   0.823772   0.016043
5/23/80    -0.10874   -0.02701    0.515983   0.86112    0.027628

Date       ROC10_3Mo  ROC10_Bnd   ROC10_AD   ROC10_HL   ROC10_SP
3/28/80    0.15732    0.084069    0.502093   -0.99658   -0.04987
4/3/80     0.111111   0.091996    -0.08449   -0.96611   -0.05278
4/11/80    0.087235   0.069553    0.268589   -0.78638   -0.04964
4/18/80    0.055848   0.030559    0.169062   -0.84766   -0.06888
4/25/80    0.002757   -0.01926    -0.06503   -0.39396   -0.04658
5/2/80     -0.10345   -0.0443     0.183309   0.468658   -0.03743
5/9/80     -0.17779   -0.0706     -0.127     0.689919   -0.03041
5/16/80    -0.25496   -0.0996     0.319735   0.980756   -0.0061
5/23/80    -0.25757   -0.0945     0.299569   0.996461   0.02229


NOTE:  Note that you don’t get completed rows until 3/28/80, since we have a ROC indicator dependent on a Block Average value 10 weeks before it. The first block average value is generated on 1/18/80, two weeks after the start of the data set. All of this means that you will need to discard the first 12 rows of the data set to get complete rows, also called complete facts.


Normalizing the Range

We now have values in the original five data columns that have a very large range. We would like to reduce the range by some method. We use the following function:

new value = (old value - Mean)/ (Maximum Range)

This expresses the distance from the mean for a value in a column as a fraction of the maximum range for that column. You should record the values of the maximum range and the mean, so that you can un-normalize the data when you get a result.
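A minimal sketch of this normalization, keeping the mean and maximum range for later un-normalization, might look like this; it is a hypothetical helper, since this step is done in a spreadsheet here:

     #include <vector>

     // Normalize a column in place: distance from the column mean as a
     // fraction of the column's maximum range (max - min).
     void normalizeColumn(std::vector<double>& column,
                          double& mean, double& maxRange) {
         double sum = 0.0, lo = column.front(), hi = column.front();
         for (double v : column) {
             sum += v;
             if (v < lo) lo = v;
             if (v > hi) hi = v;
         }
         mean = sum / column.size();
         maxRange = hi - lo;
         for (double& v : column)
             v = (v - mean) / maxRange;
     }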

The Target

We’ve taken care of all our inputs, which number 15. The final piece of information is the target. The objective as stated at the beginning of this exercise is to predict the percentage change 10 weeks into the future. We need to time shift the S&P 500 close 10 weeks back, and then calculate the value as a percentage change as follows:

Result = 100 X ((S&P 10 weeks ahead) - (S&P this week))/(S&P this week).

This gives us a value that varies between -14.8 and +33.7. This is not yet in the form we need. As you recall, the output comes from a sigmoid function that is restricted to the range 0 to +1. We will first add 14.8 to all values and then scale them by a factor of 0.02. This results in a scaled target that varies from 0 to approximately 1.

scaled target = (result + 14.8) X 0.02
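In code, the target preparation described above amounts to two small functions that simply restate the formulas:

     // The percentage change in the S&P 500 from this week to 10 weeks ahead.
     double percentChange(double spNow, double spTenWeeksAhead) {
         return 100.0 * (spTenWeeksAhead - spNow) / spNow;
     }

     // Shift by 14.8 and scale by 0.02 so the target falls inside 0 to 1.
     double scaledTarget(double result) {
         return (result + 14.8) * 0.02;
     }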

The final data file, with the scaled target shown along with the scaled original columns of data, is shown in the table below (Table 14.4):

Normalized Ranges for Original Columns and Scaled Target

Date       S_3MOBill   S_LngBnd   S_A/D      S_H/L      S_SPC      Result     Scaled Target
3/28/80    0.534853    -0.01616   0.765273   -0.07089   -0.51328   12.43544   0.544709
4/3/80     0.391308    0.055271   -0.06356   -0.07046   -0.49236   12.88302   0.55366
4/11/80    0.331578    0.009483   0.049635   -0.06969   -0.46901   9.89498    0.4939
4/18/80    0.273774    -0.09674   -0.03834   -0.07035   -0.51513   15.36549   0.60331
4/25/80    0.168765    -0.21396   -0.08956   -0.06903   -0.44951   11.71548   0.53031
5/2/80     -0.01813    -0.2451    -0.0317    -0.06345   -0.44353   11.61205   0.528241
5/9/80     -0.12025    -0.29455   -0.15503   -0.06903   -0.45577   16.53934   0.626787
5/16/80    -0.22912    -0.37696   0.006205   -0.04372   -0.41833   12.51048   0.54621
5/23/80    -0.1954     -0.34583   0.349971   0.033901   -0.37179   9.573314   0.487466

Storing Data in Different Files

You need to place the first 200 lines in a training.dat file (provided for you on the accompanying diskette) and the subsequent 40 lines of data in another file, test.dat, for use in testing. You will read more about this shortly. There is also more data on the diskette, in raw form, for you to do further experiments.

Training and Testing

With the training data available, we set up a simulation. The number of inputs is 15, and the number of outputs is 1. A total of three layers are used, with a middle layer of size 5. This number should be made as small as possible while still giving acceptable results. The optimum sizes and number of layers can only be found by much trial and error. After each run, you can look at the error from the training set and from the test set.

Using the Simulator to Calculate Error

You obtain the error for the test set by running the simulator in Training mode (you need to temporarily copy the test data, with expected outputs, to the training.dat file) for one cycle with weights loaded from the weights file. Since this is the last and only cycle, weights do not get modified, and you can get a reading of the average error. Refer to Chapter 13 for more information on the simulator’s Test and Training modes. This approach was taken with five runs of the simulator of 500 cycles each. Table 14.5 summarizes the results along with the parameters used. The error gets better and better with each run up to run #4. For run #5, the training set error decreases, but the test set error increases, indicating the onset of memorization. Run #4 is used for the final network results, showing a test set RMS error of 13.9% and a training set error of 6.9%.

Results of Training the Backpropagation Simulator for Predicting the Percentage Change in the S&P 500 Index

Run#   Tolerance   Beta   Alpha   NF       max cycles   cycles run   training set error   test set error
1      0.001       0.5    0.001   0.0005   500          500          0.150938             0.25429
2      0.001       0.4    0.001   0.0005   500          500          0.114948             0.185828
3      0.001       0.3    0       0        500          500          0.0936422            0.148541
4      0.001       0.2    0       0        500          500          0.068976             0.139230
5      0.001       0.1    0       0        500          500          0.0621412            0.143430


NOTE:  If you find that the test set error does not decrease much, whereas the training set error continues to make substantial progress, then memorization is starting to set in (run #5 in this example). It is important to monitor the test set(s) while you are training to make sure that good, generalized learning is occurring rather than memorizing or overfitting the data. In the case shown, the test set error continued to improve until run #5, where it degraded. You need to revisit the 12-step forecasting model design process to make any further improvements beyond what was achieved.


To see the exact correlation, you can copy any period you’d like, with the expected value output fields deleted, to the test.dat file. Then you run the simulator in Test mode and get the output value from the simulator for each input vector. You can then compare this with the expected value in your training set or test set.

Now that you’re done, you need to un-normalize the data back to get the answer in terms of the change in the S&P 500 index. What you’ve accomplished is a way in which you can get data from a financial newspaper, like Barron’s or Investor’s Business Daily, and feed the current week’s data into your trained neural network to get a prediction of what the S&P 500 index is likely to do ten weeks from now.

Here are the steps to un-normalize (a brief sketch in code follows the list):

1.  Take the predicted scaled target value and calculate the result value as Result = (Scaled target / 0.02) - 14.8

2.  Take the result from step 1 (which is the percentage change 10 weeks from now) and calculate the projected value as Projected S&P 10 weeks from now = (This week’s S&P value) x (1 + Result/100)
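In code, these two steps might look like the following sketch:

     // Step 1: recover the percentage change from the network's output.
     double resultFromScaledTarget(double scaledTarget) {
         return scaledTarget / 0.02 - 14.8;
     }

     // Step 2: project the S&P 500 level ten weeks out.
     double projectedSP500(double thisWeeksSP, double scaledTarget) {
         double result = resultFromScaledTarget(scaledTarget);
         return thisWeeksSP * (1.0 + result / 100.0);
     }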

Only the Beginning

This is only a very brief illustration (not meant for trading!) of what you can do with neural networks in financial forecasting. You need to further analyze the data, provide more predictive indicators, and optimize/redesign your neural network architecture to get better generalization and lower error. You need to present many, many more test cases representing different market conditions to have a robust predictor that can be traded with. A graph of the expected and predicted output for the test set and the training set is shown in the figure below:

Comparison of predicted versus actual for the training and test data sets.

Here, the normalized values are used for the output. Note that the error is about 13.9% on average over the test set and 6.9% over the training set. You can see that the test set did well in the beginning, but showed great divergence in the last few weeks.

Technical Analysis and Neural Network Preprocessing

We cannot overstate the importance of preprocessing in developing a forecasting model. There is a large body of information related to the study of financial market behavior called Technical Analysis. You can use the mathematical studies defined by Technical Analysis to preprocess your input data to reveal predictive features. We will present a sampling of Technical Analysis studies that can be used, with formulae and graphs.

Moving Averages

Moving averages are used very widely to capture the underlying trend of a price move. Moving averages are simple filters that average data over a moving window. Popular choices include 5-, 10-, and 20-period moving averages. The formula is shown below for a simple moving average, SMA:

     SMAt = ( Pt + Pt-1 + ... + Pt-n ) / n
     where  n = the number of time periods back
        Pt-n = price at n time periods back

An exponential moving average is a weighted moving average that places more weight on the most recent data. The formula for this indicator, EMA is as follows:

     EMAt = (1 - a)Pt + a ( EMAt-1)
     where  a = smoothing constant  (typical 0.10)
        Pt= price at time t
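A sketch of these two averages, following the formulas above, might look like this:

     #include <cstddef>
     #include <vector>

     // Simple moving average of the last n prices ending at index t
     // (assumes t >= n - 1).
     double sma(const std::vector<double>& price, std::size_t t, std::size_t n) {
         double sum = 0.0;
         for (std::size_t k = 0; k < n; ++k)
             sum += price[t - k];
         return sum / n;
     }

     // Exponential moving average: EMA(t) = (1 - a) * P(t) + a * EMA(t - 1),
     // seeded with the first price.
     std::vector<double> ema(const std::vector<double>& price, double a = 0.10) {
         std::vector<double> out(price.size());
         out[0] = price[0];
         for (std::size_t t = 1; t < price.size(); ++t)
             out[t] = (1.0 - a) * price[t] + a * out[t - 1];
         return out;
     }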

Momentum and Rate of Change

Momentum is really velocity, or rate of price change with time. The formula for this is

     Mt =  ( Pt  -  Pt-a  )
     where  a = lookback parameter
     for a 5-day momentum value, a = 5

The Rate of Change indicator is actually a ratio. It is the current price divided by the price some interval, a, ago, scaled by a constant. Specifically,

     ROC = Pt / Pt-a  x 1000

Relative Strength Index

The Relative Strength Index (RSI) is the strength of up closes versus down closes over a certain time interval. It is calculated over a time interval T as :

     RSI = 100 - [ 100 / (1 + RS) ]
     where  RS = average of x days’ up closes / average of x days’ down closes

A typical time interval, T, is 14 days. The assumption behind the RSI is that higher up closes relative to down closes indicate a strong market, and the opposite indicates a weak market.
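A sketch of the RSI calculation, following the formula above, might look like this (the window handling is simplified and assumes t is at least T):

     #include <cstddef>
     #include <vector>

     // Relative Strength Index over the last T closes. RS is the average
     // up-close move divided by the average down-close move.
     double rsi(const std::vector<double>& close, std::size_t t, std::size_t T = 14) {
         double upSum = 0.0, downSum = 0.0;
         for (std::size_t k = t - T + 1; k <= t; ++k) {
             double change = close[k] - close[k - 1];
             if (change > 0.0) upSum += change;
             else              downSum -= change;
         }
         if (downSum == 0.0) return 100.0;   // no down closes in the window
         double rs = (upSum / T) / (downSum / T);
         return 100.0 - 100.0 / (1.0 + rs);
     }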

Percentage R

This indicator measures where in a recent range of prices today’s price falls. The indicator assumes that prices regress to their mean. A low %R indicates that prices are hitting the ceiling of a range, while a high %R indicates that prices are at their low in a range. The formula is:

     %R = 100 x (HighX - P)/(HighX - LowX)
     where  HighX is the highest price over the price interval of interest
        LowX is the lowest price over the price interval of interest
        P is the current price

Herrick Payoff Index

This indicator makes use of other market data that is available besides price information. It uses the volume of the security, which, for a stock, is the number of shares traded during a particular interval. It also uses the open interest, which is the value of the total number of open trades at a particular time. For a commodity future, this is the number of open short and long positions. This study attempts to measure the flow of money into and out of a market. The formula is as follows (note that a tick is the smallest permissible move in a given market):

     Let MP = mean price over a particular interval
     OI = the larger of yesterday’s or today’s open interest

then

     K = [ (MPt - MPt-1) x dollar value of 1 tick move x volume ]
         x [ 1 +/- 2/OI ]
     HPIt = HPIt-1 + [ 0.1 x (K - HPIt-1) ] / 100,000

MACD

The MACD (moving average convergence divergence) indicator is the difference between two moving averages, and it tells you when short-term overbought or oversold conditions exist in the market. The formula is as follows:

     Let OSC = EMA1 - EMA2,
     where  EMA1 is for one smoothing constant and time period,
            for example 0.15 and 12 weeks
        EMA2 is for another smoothing constant and time period,
            for example 0.075 and 26 weeks

then

    MACDt = MACDt-1  + K x ( OSCt - MACDt-1 )
    where K is a smoothing constant, for example, 0.2

The final formula effectively does another exponential smoothing on the difference of the two moving averages, for example, over a 9-week period.
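A sketch of the MACD line, reusing the ema() sketch from the Moving Averages section to produce the two EMA series, might look like this:

     #include <cstddef>
     #include <vector>

     // MACD line: smooth the oscillator (EMA1 - EMA2) again with constant K.
     // The two EMA series can be produced with the ema() sketch shown earlier,
     // e.g. with smoothing constants 0.15 and 0.075.
     std::vector<double> macd(const std::vector<double>& ema1,
                              const std::vector<double>& ema2, double K = 0.2) {
         std::vector<double> out(ema1.size());
         out[0] = ema1[0] - ema2[0];
         for (std::size_t t = 1; t < ema1.size(); ++t) {
             double osc = ema1[t] - ema2[t];
             out[t] = out[t - 1] + K * (osc - out[t - 1]);
         }
         return out;
     }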

Stochastics

This indicator has absolutely nothing to do with stochastic processes. The reason for the name is a mystery, but the indicator is composed of two parts: %K and %D, which is a moving average of %K. The crossover of these lines indicates overbought and oversold areas. The formulas follow:

     Raw %K = 100 x (P - LowX )/(HighX - LowX)
     %Kt = [( %Kt-1   x 2  ) + Raw      %Kt ] /3
     %Dt = [( %Dt-1   x 2  ) + %Kt ] /3
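A sketch of the stochastic calculation, following the formulas above, might look like this (it assumes at least one full window of prices is available):

     #include <algorithm>
     #include <cstddef>
     #include <vector>

     struct Stochastic { double k; double d; };

     // One step of the stochastic calculation: raw %K places today's price
     // inside the recent high-low range, then %K and %D smooth it as in the
     // formulas above (assumes t >= window - 1).
     Stochastic nextStochastic(const std::vector<double>& price, std::size_t t,
                               std::size_t window, const Stochastic& prev) {
         auto first = price.begin() + (t - window + 1);
         auto last  = price.begin() + t + 1;
         double hi = *std::max_element(first, last);
         double lo = *std::min_element(first, last);
         double rawK = 100.0 * (price[t] - lo) / (hi - lo);
         double k = (prev.k * 2.0 + rawK) / 3.0;
         double d = (prev.d * 2.0 + k) / 3.0;
         return {k, d};
     }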

On-Balance Volume

The on-balance volume (OBV) indicator was created to try to uncover accumulation and distribution patterns of large players in the stock market. It is a cumulative sum of volume data, specified as follows:

If today’s close is greater than yesterday’s close OBVt = OBVt-1 + 1

If today’s close is less than yesterday’s close OBVt = OBVt-1 - 1

The absolute value of the index is not important; attention is given only to the direction and trend.

Accumulation-Distribution

This indicator does for price what OBV does for volume.

If today’s close is greater than yesterday’s close: ADt = ADt-1 + (Closet - Lowt)

If today’s close is less than yesterday’s close: ADt = ADt-1 + (Hight - Closet)

Now let’s examine how these indicators look.

The first chart shows a bar chart, which is a chart of price data versus time, along with the following indicators:

  Ten-unit moving average

  Ten-unit exponential moving average

  Momentum

  MACD

  Percent R

Five minute bar chart of the S&P 500 Sept 95 Futures contract with several technical indicators displayed.

The time period shown is 5-minute bars for the S&P 500 September 1995 futures contract. The top of each bar indicates the highest value (“high”) for that time interval, the bottom indicates the lowest value (“low”), and the horizontal lines on the bar indicate the initial (“open”) and final (“close”) values for the time interval.

The second chart shows a daily bar chart for Intel Corporation stock for the period from December 1994 to July 1995, with each bar representing a day of activity. The following indicators are also displayed:

  Rate of Change

  Relative Strength

  Stochastics

  Accumulation-Distribution

Daily bar chart of Intel Corporation with several technical indicators displayed.

You have seen a few of the hundreds of technical indicators that have been invented to date. New indicators are being created rapidly as the field of Technical Analysis gains popularity and following. There are also pattern recognition studies, such as formations that resemble flags or pennants as well as more exotic types of studies, like Elliot wave counts. You can refer to books on Technical Analysis (e.g., Murphy) for more information about these and other studies.

Neural preprocessing with Technical Analysis tools as well as with traditional engineering analysis tools such as Fourier series, Wavelets, and Fractals can be very useful in finding predictive patterns for forecasting.

What Others Have Reported

In this final section of the chapter, we outline some case studies documented in periodicals and books, to give you an idea of the successes and failures to date with neural networks in financial forecasting. Keep in mind that the very best (that is, the most profitable) results are usually never reported, so as not to lose a competitive edge! Also, remember that the market inefficiencies exploited yesterday may no longer exist today.

Can a Three-Year-Old Trade Commodities?

Well, Hillary Clinton can certainly trade commodities, but a three-year-old, too? In his paper, “Commodity Trading with a Three Year Old,” J. E. Collard describes a neural network with the supposed intelligence of a three-year-old. The application used a feedforward backpropagation network with a 37-30-1 architecture. The network was trained to buy (“go long”) or sell (“go short”) in the live cattle commodity futures market. The training set consisted of 789 facts for trading days in 1988, 1989, 1990, and 1991. Each input vector consisted of 18 fundamental indicators and six market technical variables (Open, High, Low, Close, Open Interest, Volume). The network could be trained for the correct output on all but 11 of the 789 facts.

The fully trained network was used on 178 subsequent trading days in 1991. The cumulative profit increased from $0 to $1547.50 over this period by trading one live cattle contract. The largest loss in a trade was $601.74 and the largest gain in a trade was $648.30.

Forecasting Treasury Bill and Treasury Note Yields

Milam Aiken designed a feedforward backpropagation network that predicted Treasury bill rates and compared the forecasts he obtained with forecasts made by top U.S. economists. The results showed that the neural network, given the same data, made better predictions (0.18 versus 0.71 absolute error). Aiken examined 250 economic data series for correlation with T-bill rates and used only the series that showed a leading correlation: the Dept. of Commerce Index of Leading Economic Indicators, the Center for International Business Cycle Research (CIBCR) Short Leading Composite Index, and the CIBCR Long Leading Composite Index. Prior data for these three indicators over the past four years (a total of 12 inputs) was used to predict the average annual T-bill rate (one output) for the current year.

Guido Deboeck and Masud Cader designed profitable trading systems for two-year and 10-year Treasury securities. They used feedforward neural networks with a learning algorithm called extended-delta-bar-delta (EDBD), which is a variant of backpropagation. Training samples composed of 100 facts were selected from 1,120 trading days spanning July 1, 1989 to June 30, 1992. The test period consisted of more than 150 trading days from July 1, 1992 to December 30, 1992. Performance on the test set was monitored every N thousand training cycles, and the training procedure was stopped when performance degraded on the test set. (This is the same procedure we used when developing a model for the S&P 500.)

A criterion used to judge model performance was the ratio of the average profit divided by the maximum drawdown, which is the largest unrealized loss seen during the trading period. A portfolio of separate designed trading systems for two-year and 10-year securities gave the following performance: Over a period of 4.5 years, the portfolio had 133 total trades with 65% profitable trades and the maximum drawdown of 64 basis points, or thousands of units for bond yields. The total gain was 677 basis points over that period with a maximum gain in one trade of 52 basis points and maximum loss in one trade of 47 basis points.

The stability and robustness of this system was checked by using over 1000 moving time windows of 3-month, 6-month, and 12-month duration over the 4.5-year interval and noting the standard deviations in profits and maximum drawdown. The maximum drawdown varied from 30 to 48 basis points.

Neural Nets versus Box-Jenkins Time-Series Forecasting

Ramesh Sharda and Rajendra Patil used a standard 12-12-1 feedforward backpropagation network and compared the results with the Box-Jenkins methodology for time-series forecasting. Box-Jenkins forecasting is a traditional time-series forecasting technique. The authors used 75 different time series for evaluation. The results showed that the neural networks achieved a better MAPE (mean absolute percentage error), with a mean MAPE over all 75 time series of 14.67 versus 15.94 for the Box-Jenkins approach.

Neural Nets versus Regression Analysis

Leorey Marquez et al. compared neural network modeling with standard regression analysis. The authors used a feedforward backpropagation network with a structure of 1-6-1. They used three functional forms found in regression analysis:

1.  Y = B0 + B1 X + e

2.  Y = B0 + B1 log(X) + e

3.  Y = B0 + B1/X + e

For each of these forms, 100 pairs of (x,y) data were generated for this “true” model.

The neural network was then trained on these 100 pairs of data. An additional 100 data points were generated to test the forecasting ability of the network. The results showed that the neural network achieved a MAPE within 0.6% of the true model, which is a very good result. The neural network approximated the linear model best. An experiment was also done with intentional mis-specification of some data points. The neural network model did well in these cases also, but comparatively worse for the reciprocal model case.

Hierarchical Neural Network

Here five neural networks are arranged such that four network outputs feed the final network. The four networks are trained to produce the High, Low, short-term trend, and medium-term trend for a particular financial instrument. The final network takes these four outputs as input and produces a turning point indicator.

Hierarchical neural network system to predict turning points.

Each network was trained and tested with 1200 fact days spanning 1988 to 1992 (33% used for testing). Preprocessing was accomplished by using differences of the inputs and with some technical analysis studies:

  Moving averages

  Exponential moving averages

  Stochastic indicators

For the network that produces a predicted High value, the average error ranged between 7.04% and 7.65% for various financial markets over the test period, including Treasury Bonds, Eurodollar, Japanese Yen, and S&P 500 futures contracts.

The Walk-Forward Methodology of Market Prediction

A methodology that is sometimes used in neural network design is walk-forward training and testing. This means that you choose an interval of time (e.g., six months) over which you train a neural network and then test the network over a subsequent interval. You then move the training window and testing window forward one month, for example, and repeat the exercise. You do this for the time period of interest to see your forecasting results. The advantage of this approach is that you maximize the network’s ability to model the recent past in making a prediction. The disadvantage is that the network forgets characteristics of the market that occurred prior to the training window.

Takashi Kimoto et al. used the walk-forward methodology in designing a trading system for Fujitsu and Nikko Securities. They also, like Mendelsohn, use a hierarchical neural network composed of individual feedforward neural networks. Prediction of the TOPIX, which is the Japanese equivalent of the Dow Jones Industrial Average, was performed for 33 months, from January 1987 to September 1989. Four networks in the first level of the hierarchy were trained on price data and economic data. Their results were fed to a final network that generated buy and sell signals. The trading system achieved a result 20% better than a buy-and-hold strategy for the TOPIX.

Dual Confirmation Trading System

Jeremy Konstenius discusses a trading system for the S&P 400 index built with a holographic neural network, which is unlike the feedforward backpropagation neural network. The holographic network uses complex numbers for data input to and output from neurons, which are mathematically more complex than feedforward network neurons. The author uses two trained networks to forecast the next day’s direction based on data for the past 10 days. Each network uses input data that is detrended by subtracting a moving average from the data. Network 1 uses detrended closing values. Network 2 uses detrended high values. If both networks agree, or confirm each other, then a trade is made. There is no trade otherwise.

Network 1 showed an accuracy of 61.9% for the five-month test period (the training period spanned two years prior to the test period), while Network 2 also showed an accuracy of 61.9%. Using the two networks together, Konstenius achieved an accuracy of 65.82%.

A Turning Point Predictor

This neural network approach is discussed by Michitaka Kosaka et al. (1991).

They discuss applying the feedforward backpropagation network to develop buy/sell signals for securities. You would gather time-series data on stock prices, and want to find trends in the data so that changes in the direction of the trend provide you the turning points, which you interpret as signals to buy or sell.

You will need to list the factors that you think have any influence on the price of the security you are studying. You also need to determine how you will measure these factors. You then formulate a nonlinear function combining the factors on your list and however many past prices of your security (your time-series data) you choose.

The function has the form, as Michitaka Kosaka, et al. (1991) put it,

     p(t + h) = F(x(t), x(t -1), ... , f1, f2, ... )
     where
     f1, f2, represent factors on your list,
     x(t) is the price of your stock at time t,
     p(t + h) is the turning point of security price at time t + h, and
     p(t + h) = -1 for a turn from downward to upward,
     p(t + h) = +1 for a turn from upward to downward,
     p(t + h) = 0 for no change and therefore no turn

Here you vary h through the values 1, 2, etc. as you move into the future one day (time period) at a time. Note that the detailed form of the function F is not given. This is for you to set up as you see fit.

You can set up a similar function for x(t + h), the stock price at time t + h, and have a separate network computing it using the backpropagation paradigm. You will then be generating future prices of the stock and the future buy/sell signals hand in hand, in parallel.

Michitaka Kosaka, et al. (1991) report that they used time-series data over five years to identify the network model, and time-series data over one year to evaluate the model’s forecasting performance, with a success rate of 65% for turning points.

The S&P 500 and Sunspot Predictions

Michael Azoff, in his book on time-series forecasting with neural networks (see references), creates neural network systems for predicting the S&P 500 index as well as for predicting chaotic time series, such as sunspot occurrences. Azoff uses feedforward backpropagation networks with a training algorithm called adaptive steepest descent, a variation of the standard algorithm. For the sunspot time series, with an architecture of 6-5-1 and a ratio of training vectors to trainable weights of 5.1, he achieves a training set error of 12.9% and a test set error of 21.4%. This series was composed of yearly sunspot numbers for the years 1706 to 1914. Six years of consecutive annual data were input to the network.

One network Azoff used to forecast the S&P 500 index was a 17-7-1 network. The ratio of training vectors to trainable weights was 6.1. The achieved training set error was 3.29%, and the test set error was 4.67%. Inputs to this network included price data, a volatility indicator, which is a function of the range of price movement, and a random walk indicator, a technical analysis study.

A Critique of Neural Network Time-Series Forecasting for Trading

Michael de la Maza and Deniz Yuret, managers for the Redfire Capital Management Group, suggest that risk-adjusted return, and not mean-squared error, should be the metric to optimize in a neural network application for trading. They also point out that with neural networks, as with statistical methods such as linear regression, data facts that seem unexplainable can’t be ignored even if you want them to be. There is no equivalent of a “don’t care” condition for the output of a neural network. This type of condition may be an important option for trading environments that have no “discoverable regularity,” as the authors put it, and therefore are really not tradable. Some solutions to the two problems posed are given as follows:

  Use an algorithm other than backpropagation, which allows for maximization of risk-adjusted return, such as simulated annealing or genetic algorithms.

  Transform the data input to the network so that minimizing mean-squared error becomes equivalent to maximizing risk-adjusted return.

  Use a hierarchy (see hierarchical neural network earlier in this section) of neural networks, with each network responsible for detecting features or regularities from one component of the data.

Resource Guide for Neural Networks and Fuzzy Logic in Finance

Here is a sampling of resources compiled from trade literature:


NOTE:  We do not take responsibility for any errors or omissions.


 

Summary

This chapter presented a neural network application in financial forecasting. As an example of the steps needed to develop a neural network forecasting model, the change in the Standard & Poor’s 500 stock index was predicted 10 weeks out based on weekly data for five indicators. Some examples of preprocessing of data for the network were shown as well as issues in training.

At the end of the training period, it was seen that memorization was taking place, since the error on the test data degraded while the error on the training set improved. It is important to monitor the error on the test data (without weight changes) while you are training to ensure that generalization ability is maintained. The final network resulted in an average RMS error of 6.9% over the training set and 13.9% over the test set.

This chapter’s example in forecasting highlights the ease of use and wide applicability of the backpropagation algorithm for large, complex problems and data sets. Several examples of research in financial forecasting were also presented, along with a number of ideas and real-life methodologies.

Technical Analysis was briefly discussed with examples of studies that can be useful in preprocessing data for neural networks.

A Resource guide was presented for further information on financial applications of neural networks.