In the chapters Backpropagation and Backpropagation II, the backpropagation simulator was developed. In this chapter, you will use the simulator to tackle a complex problem in financial forecasting. The application of neural networks to financial forecasting and modeling has been very popular over the last few years. Financial journals and magazines frequently mention the use of neural networks, and commercial tools and simulators are quite widespread.
This chapter gives you an overview of typical steps used in creating financial forecasting models. Many of the steps will be simplified, and so the results will not, unfortunately, be good enough for real life application. However, this chapter will hopefully serve as an introduction to the field with some added pointers for further reading and resources for those who want more detailed information.
There has been a great amount of interest in neural networks on Wall Street. Bradford Lewis runs two Fidelity funds in part with the use of neural networks. Also, LBS Capital Management (Peoria, Illinois) manages part of its portfolio with neural networks. According to Barron’s (February 27, 1995), LBS’s $150 million fund has beaten the averages by three percentage points a year since 1992. Each weekend, the neural networks are retrained with the latest technical and fundamental data, including P/E ratios, earnings results, and interest rates. Another of LBS’s models has done worse than the S&P 500 for the past five years, however. In the book Virtual Trading, Jeffrey Katz states that most of the successful neural network systems are proprietary and not publicly heard of. Clients who use neural networks usually don’t want anyone else to know what they are doing, for fear of losing their competitive edge. Firms put in many person-years of engineering design, with a lot of CPU cycles, to achieve practical and profitable results. Let’s look at the process:
There are many steps in building a forecasting model, as listed below.
1. Decide on what your target is and develop a neural network (following these steps) for each target.
2. Determine the time frame that you wish to forecast.
3. Gather information about the problem domain.
4. Gather the needed data and get a feel for each input’s relationship to the target.
5. Process the data to highlight features for the network to discern.
6. Transform the data as appropriate.
7. Scale and bias the data for the network, as needed.
8. Reduce the dimensionality of the input data as much as possible.
9. Design a network architecture (topology, # layers, size of layers, parameters, learning paradigm).
10. Go through the train/test/redesign loop for a network.
11. Eliminate correlated inputs as much as possible, while in step 10.
12. Deploy your network on new data and test it and refine it as necessary.
Once you develop a forecasting model, you then must integrate it into your trading system. A neural network can be designed to predict direction, or magnitude, or perhaps just turning points in a particular market, or something else. Avner Mandelman of Cereus Investments (Los Altos Hills, California) uses a long-range trained neural network to tell him when the market is making a top or bottom (Barron’s, December 14, 1992).
Now let’s expand on the twelve aspects of model building:
What should the output of your neural network forecast? Let’s say you want to predict the stock market. Do you want to predict the S&P 500? Or, do you want to predict the direction of the S&P 500 perhaps? You could predict the volatility of the S&P 500 too (maybe if you’re an options player). Further, like Mr. Mandelman, you could only want to predict tops and bottoms, say, for the Dow Jones Industrial Average. You need to decide on the market or markets and also on your specific objectives.
Another crucial decision is the time frame you want to predict forward. It is easier to create neural network models for longer-term predictions than for shorter-term predictions. One reason is market noise: the seemingly random, chaotic variation you see at smaller and smaller timescale resolutions dominates short horizons. Another reason is that the macroeconomic forces that fundamentally move markets over long periods move slowly. The U.S. dollar makes multiyear trends, shaped by the economic policy of governments around the world. For a given error tolerance, a one-year or one-month forecast will take less effort with a neural network than a one-day forecast will.
So far we’ve talked about the target and the time frame. One other important aspect of model building is knowledge of the domain. If you want to create an effective predictive model of the weather, then you need to know, or be able to guess, the factors that influence weather. The same holds true for the stock market or any other financial market. In order to create a really tradable Treasury bond trading system, you need to have a good idea of what actually drives the market and works, i.e., talk to a T-bond trader and encapsulate his domain expertise!
Once you know the factors that influence the target output, you can gather raw data. If you are predicting the S&P 500, then you may consider Treasury yields, three-month T-bill yields, and earnings as some of the factors. Once you have the data, you can do scatter plots to see if there is some correlation between each input and the target output. If you are not satisfied with the plot, you may consider a different input in its place.
Surprising as it may sound, you are most likely going to spend about 90% of your time, as a neural network developer, in massaging and transforming data into meaningful form for training your network. We actually defined three substeps in this area of preprocessing in our master list:
• Highlight features
• Transform
• Scale and bias
Highlighting Features in the Input Data
You should present the neural network, as much as possible, with an easy way to find patterns in your data. For time series data, like stock market prices over time, you may consider presenting quantities like rate of change and acceleration (the first and second derivatives of your input) as examples. Another way to highlight data is to magnify certain occurrences. For example, if you consider central bank intervention an important qualifier to foreign exchange rates, then you may include as an input to your network a value of 1 or 0, to indicate the presence or absence of central bank intervention. Now if you further consider the activity of the U.S. Federal Reserve bank to be important by itself, then you may wish to highlight that by separating it out as another 1/0 input. Using 1/0 coding to separate composite effects is called thermometer encoding.
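A sketch of these ideas in code, assuming a plain list of weekly closes (the closes are taken from the raw data shown later in the chapter; the intervention dates are purely hypothetical):

```python
# Sketch of feature highlighting: derivative features plus 1/0 event coding.
def first_difference(series):
    """Rate of change: difference of adjacent values."""
    return [b - a for a, b in zip(series, series[1:])]

def second_difference(series):
    """Acceleration: difference of the differences."""
    return first_difference(first_difference(series))

prices = [106.52, 109.92, 111.07, 113.61, 115.12]   # weekly S&P 500 closes
velocity = first_difference(prices)                  # first derivative
acceleration = second_difference(prices)             # second derivative

# Thermometer-style 1/0 coding, one input per effect (dates are hypothetical):
weeks = ["1/4/80", "1/11/80", "1/18/80"]
any_intervention = {"1/11/80"}    # weeks with any central bank intervention
fed_intervention = {"1/11/80"}    # weeks with U.S. Federal Reserve activity
flags = [(1 if w in any_intervention else 0,
          1 if w in fed_intervention else 0) for w in weeks]
```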
There is a whole body of study of market behavior called Technical Analysis, from which you may also wish to present technical studies on your data. There is a wide assortment of mathematical technical studies that you can perform on your data (see references), such as moving averages to smooth data, as an example. There are also pattern recognition studies you can use, like the “double-top” formation, which purportedly results in a high probability of significant decline. To be able to recognize such a pattern, you may wish to present a mathematical function that aids in the identification of the double-top.
You may want to de-emphasize unwanted noise in your input data. If you see a spike in your data, you can lessen its effect by passing it through a moving average filter, for example. You should be careful about introducing excessive lag in the resulting data, though.
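For instance, a simple moving-average filter damps a spike (the window length is an illustrative choice; remember that the filter introduces lag of roughly half the window):

```python
def moving_average(series, window=5):
    """Trailing moving average; introduces a lag of (window - 1) / 2 periods."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

data = [1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 1.0]   # spike at index 2
smoothed = moving_average(data, window=3)
# the spike's influence is spread out and damped: max(smoothed) == 4.0
```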
Transform the Data If Appropriate
For time series data, you may consider using a Fourier transform to move to the frequency-phase plane. This will uncover periodic cyclic information if it exists. The Fourier transform will decompose the input discrete data series into a series of frequency spikes that measure the relevance of each frequency component. If the stock market indeed follows the so-called January effect, where prices typically make a run up, then you would expect a strong yearly component in the frequency spectrum. Mark Jurik suggests sampling data with intervals that catch different cycle periods, in his paper on neural network data preparation (see references).
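A sketch of using the FFT to recover a yearly cycle from a weekly series (the data here are synthetic, fabricated to contain a 52-week cycle plus noise; real market data would be far less clean):

```python
import numpy as np

# Ten years of weekly samples with a buried 52-week cycle (synthetic data).
rng = np.random.default_rng(0)
n = 520
t = np.arange(n)
series = np.sin(2 * np.pi * t / 52) + 0.3 * rng.standard_normal(n)

# Magnitude spectrum of the mean-removed series; the dominant non-DC
# frequency spike reveals the cycle period.
spectrum = np.abs(np.fft.rfft(series - series.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)             # cycles per week
peak = freqs[spectrum[1:].argmax() + 1]       # skip the DC bin
period_weeks = 1.0 / peak                     # recovers the 52-week cycle
```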
You can use other signal processing techniques such as filtering. Besides the frequency domain, you can also consider moving to other spaces, such as with using the wavelet transform. You may also analyze the chaotic component of the data with chaos measures. It’s beyond the scope of this book to discuss these techniques. (Refer to the Resources section of this chapter for more information.) If you are developing shortterm trading neural network systems, these techniques may play a significant role in your preprocessing effort. All of these techniques will provide new ways of looking at your data, for possible features to detect in other domains.
Scale Your Data
Neurons like to see data in a particular input range to be most effective. Data like the S&P 500 index, which has varied from 200 to 550 over the years, will not be useful if presented directly, since the middle layer of neurons has a sigmoid activation function that squashes large-magnitude inputs to either 0 or +1. In other words, you should choose data that fit a range that does not saturate, or overwhelm, the network neurons. Choosing inputs from –1 to 1 or 0 to 1 is a good idea. By the same token, you should normalize the expected values for the outputs to the 0 to 1 sigmoidal range.
It is important to pay attention to the number of input values in the data set that are close to zero. Since the weight change law is proportional to the input value, then a close to zero input will mean that that weight will not participate in learning! To avoid such situations, you can add a constant bias to your data to move the data closer to 0.5, where the neurons respond very well.
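A sketch of min-max scaling followed by a constant shift toward 0.5 (the 0.25 to 0.75 target range is an illustrative choice, not from the text):

```python
def scale_to_unit(series, lo=0.0, hi=1.0):
    """Min-max scale a series into [lo, hi] so neurons stay out of saturation."""
    mn, mx = min(series), max(series)
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in series]

sp500 = [200.0, 350.0, 550.0]
scaled = scale_to_unit(sp500)            # spans [0.0, 1.0]

# Add a constant bias so near-zero inputs move toward 0.5, where the
# neurons respond well and the weight-change law stays active:
biased = [x * 0.5 + 0.25 for x in scaled]   # now spans [0.25, 0.75]
```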
You should try to eliminate inputs wherever possible. This will reduce the dimensionality of the problem and make it easier for your neural network to generalize. Suppose that you have three inputs, x, y, and z, and one output, o. Now suppose that you find that all of your inputs are restricted to one plane. You could redefine axes such that you have x’ and y’ for the new plane and map your inputs to the new coordinates. This changes the number of inputs to your problem to two instead of three, without any loss of information. This is illustrated in the figure below:
Reducing dimensionality from three to two dimensions.
Generalization versus Memorization
If your overall goal is beyond pattern classification, you need to track your network’s ability to generalize. Not only should you look at the overall error with the training set that you define, but you should set aside some training examples as part of a test set (and do not train with them), with which you can see whether or not the network is able to correctly predict. If the network responds poorly to your test set, you know that you have overtrained, or you can say the network has “memorized” the training patterns. Consider the arbitrary curve-fitting analogy in the figure below:

Generalization (G) versus overfitting (O) of data.

Here you see curves for a generalized fit, labeled G, and an overfit, labeled O. In the case of the overfit, any data point outside of the training data results in highly erroneous prediction. Your test data will certainly show you large error in the case of an overfitted model.
Another way to consider this issue is in terms of Degrees Of Freedom (DOF). For the polynomial:
y = a0 + a1x + a2x^2 + ... + anx^n
the DOF equals the number of coefficients a0, a1, ..., an, which is n + 1. So for the equation of a line (y = a0 + a1x), the DOF would be 2. For a parabola, this would be 3, and so on. The objective of not overfitting data can be restated as an objective to obtain the function with the least DOF that fits the data adequately. For neural network models, the larger the number of trainable weights (which is a function of the number of inputs and the architecture), the larger the DOF. Be careful about having too many (unimportant) inputs. You may find terrific results with your training data, but extremely poor results with your test data.
You have seen that getting to the minimum number of inputs for a given problem is important in terms of minimizing DOF and simplifying your model. Another way to reduce dimensionality is to look for correlated inputs and to carefully eliminate redundancy. For example, you may find that the Swiss franc and German mark are highly correlated over a certain time period of interest. You may wish to eliminate one of these inputs to reduce dimensionality. You have to be careful in this process though. You may find that a seemingly redundant piece of information is actually very important. Mark Jurik, of Jurik Consulting, in his paper on data preprocessing, suggests that one of the best ways to determine if an input is really needed is to construct neural network models with and without the input and choose the model with the best error on training and test data. Although very iterative, you can try eliminating as many inputs as possible this way and be assured that you haven’t eliminated a variable that really made a difference.
Another approach is sensitivity analysis, where you vary one input a little, while holding all others constant and note the effect on the output. If the effect is small you eliminate that input. This approach is flawed because in the real world, all the inputs are not constant. Jurik’s approach is more time consuming but will lead to a better model.
The process of decorrelation, or eliminating correlated inputs, can also utilize a linear algebra technique called principal component analysis. The result of principal component analysis is a minimum set of variables that contain the maximum information. For further information on principal component analysis, you should consult a statistics reference or research two methods of principal component analysis: the Karhunen-Loève transform and the Hotelling transform.
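A minimal sketch of principal component analysis on two correlated inputs (the franc/mark series here are simulated with a fixed random seed, not market data):

```python
import numpy as np

# Two highly correlated synthetic inputs, standing in for the
# Swiss franc / German mark example in the text.
rng = np.random.default_rng(1)
franc = rng.standard_normal(200)
mark = 0.95 * franc + 0.05 * rng.standard_normal(200)
X = np.column_stack([franc, mark])

# PCA: eigen-decompose the covariance matrix of the centered data and
# rank components by the fraction of total variance they explain.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = eigvals.argsort()[::-1]
explained = eigvals[order] / eigvals.sum()
# explained[0] is close to 1.0: a single principal component captures
# nearly all the information in the two correlated inputs, so the
# second dimension can be dropped with little loss.
```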
Now it’s time to actually design the neural network. For the backpropagation feedforward neural network we have designed, this means making the following choices:
1. The number of hidden layers.
2. The size of hidden layers.
3. The learning constant, beta (β).
4. The momentum parameter, alpha (α).
5. The form of the squashing function (does not have to be the sigmoid).
6. The starting point, that is, initial weight matrix.
7. The addition of noise.
Some of the parameters listed can be made to vary with the number of cycles executed, similar to the current implementation of noise. For example, you can start with a learning constant β that is large and reduce it as learning progresses. This allows rapid learning at the beginning of the process and may speed up the overall simulation time.
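One way to vary β with the cycle count might look like the following (the exponential form and the constants are assumptions; the text only says to start large and reduce):

```python
def beta_schedule(cycle, beta_start=0.5, beta_end=0.05, half_life=500):
    """Learning constant that starts large and decays toward a floor.

    Exponential decay with a half-life is one plausible schedule; the idea
    is simply rapid early learning followed by finer late adjustments."""
    return beta_end + (beta_start - beta_end) * 0.5 ** (cycle / half_life)

early, late = beta_schedule(0), beta_schedule(5000)   # 0.5 decaying toward 0.05
```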
Much of the process of determining the best parameters for a given application is trial and error. You need to spend a great deal of time evaluating different options to find the best fit for your problem. You may literally create hundreds if not thousands of networks either manually or automatically to search for the best solution. Many commercial neural network programs use genetic algorithms to help to automatically arrive at an optimum network. A genetic algorithm makes up possible solutions to a problem from a set of starting genes. Analogous to biological evolution, the algorithm combines genetic solutions with a predefined set of operators to create new generations of solutions, who survive or perish depending on their ability to solve the problem. The key benefit of genetic algorithms (GA) is the ability to traverse an enormous search space for a possibly optimum solution. You would program a GA to search for the number of hidden layers and other network parameters, and gradually evolve a neural network solution. Some vendors use a GA only to assign a starting set of weights to the network, instead of randomizing the weights to start you off near a good solution.
Now let’s review the steps:
1. Split your data. First, divide your data set into three pieces: a training set, a test set, and a blind test set. Use about 80% of your data records for your training set, 10% for your test set, and 10% for your blind test set.
2. Train and test. Next, start with a network topology and train your network on your training set data. When you have reached a satisfactory minimum error, save your weights and apply your trained network to the test data and note the error. Now restart the process with the same network topology for a different set of initial weights and see if you can achieve a better error on training and test sets. Reasoning: you may have found a local minimum on your first attempt and randomizing the initial weights will start you off to a different, maybe better solution.
3. Eliminate correlated inputs. You may optionally try at this point to see if you can eliminate correlated inputs, as mentioned before, by iteratively removing each input and noting the best error you can achieve on the training and test sets for each of these cases. Choose the case that leads to the best error and eliminate the input (if any) that achieved it. You can repeat this whole process again to try to eliminate another input variable.
4. Iteratively train and test. Now you can try other network parameters and repeat the train and test process to achieve a better result.
5. Deploy your network. You now can use the blind test data set to see how your optimized network performs. If the error is not satisfactory, then you need to reenter the design phase or the train and test phase.
6. Revisit your network design when conditions change. You need to retrain your network when you have reason to think that you have new information relevant to the problem you are modeling. If you have a neural network that tries to predict the weekly change in the S&P 500, then you likely will need to retrain your network at least once a month, if not once a week. If you find that the network no longer generalizes well with the new information, you need to reenter the design phase.
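The 80/10/10 split in step 1 can be sketched as follows (keeping the records in chronological order is an assumption; for time series it ensures the blind set lies in the model's "future"):

```python
def split_facts(rows, train=0.8, test=0.1):
    """Split facts 80/10/10 into training, test, and blind test sets.

    Rows are kept in chronological order rather than shuffled (a design
    choice for time series, not something the text mandates)."""
    n = len(rows)
    n_train = int(n * train)
    n_test = int(n * test)
    return (rows[:n_train],
            rows[n_train:n_train + n_test],
            rows[n_train + n_test:])

train, test, blind = split_facts(list(range(250)))   # 200 / 25 / 25 records
```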
If this sounds like a lot of work, it is! Now, let’s try our luck at forecasting by going through a subset of the steps outlined:
The S&P 500 index is a widely followed stock average, like the Dow Jones Industrial Average (DJIA). It has a broader representation of the stock market since this average is based on 500 stocks, whereas the DJIA is based on only 30. The problem to be approached in this chapter is to predict the S&P 500 index, given a variety of indicators and data for prior weeks.
Our objective is to forecast the S&P 500 ten weeks from now. Whereas the objective may be to predict the level of the S&P 500, it is important to simplify the job of the network by asking for a change in the level rather than for the absolute level of the index. What you want to do is give the network the ability to fit the problem at hand conveniently in the output space of the output layer. Practically speaking, you know that the output from the network cannot be outside of the 0 to 1 range, since we have used a sigmoid activation function. You could take the S&P 500 index and scale this absolute price level to this range, for example. However, you will likely end up with very small numbers that have a small range of variability. The difference from week to week, on the other hand, has a much smaller overall range, and when these differences are scaled to the 0 to 1 range, you have much more variability.
The output we choose is the change in the S&P 500 from the current week to 10 weeks from now as a percentage of the current week’s value.
The inputs to the network need to be weekly changes of indicators that have some relevance to the S&P 500 index. This is a complex forecasting problem, and we can only guess at some of the relationships. This is one of the inherent strengths of using neural nets for forecasting; if a relationship is weak, the network will learn to ignore it automatically. Be cognizant that you do want to minimize the DOF as mentioned before though. In this example, we choose a data set that represents the state of the financial markets and the economy. The inputs chosen are listed as:
• Previous price action in the S&P 500 index, including the close or final value of the index
• Breadth indicators for the stock market, including the number of advancing issues and declining issues for the stocks in the New York Stock Exchange (NYSE)
• Other technical indicators, including the number of new highs and new lows achieved in the week for the NYSE market. This gives some indication about the strength of an uptrend or downtrend.
• Interest rates, including shortterm interest rates in the ThreeMonth Treasury Bill Yield, and longterm rates in the 30Year Treasury Bond Yield.
Other possible inputs could have been government statistics like the Consumer Price Index, Housing starts, and the Unemployment Rate. These were not chosen because long and shortterm interest rates tend to encompass this data already.
You are encouraged to experiment with other inputs and ideas. All of the data mentioned can be obtained in the public domain, such as from financial publications (Barron’s, Investor’s Business Daily, Wall Street Journal) and from ftp sites on the Internet for the Department of Commerce and the Federal Reserve, as well as from commercial vendors (see the Resource Guide at the end of the chapter). There are new sources cropping up on the Internet all the time.
The input and output layers are fixed by the number of inputs and outputs we are using. In our case, the output is a single number, the expected change in the S&P 500 index 10 weeks from now. The input layer size will be dictated by the number of inputs we have after preprocessing. You will see more on this soon. The number of hidden layers can be either one or two. It is best to choose the smallest number of neurons possible for a given problem to allow for generalization. If there are too many neurons, you will tend to get memorization of patterns. We will use one hidden layer. The size of the first hidden layer is generally recommended as between one-half and three times the size of the input layer. If a second hidden layer is present, you may have between three and ten times the number of output neurons. The best way to determine optimum size is by trial and error.
NOTE: You should try to make sure that there are enough training examples for your trainable weights. In other words, your architecture may be dictated by the number of input training examples, or facts, you have. In an ideal world, you would want about 10 or more facts for each weight. For a 10-10-1 architecture, there are 10×10 + 10×1 = 110 weights, so you should aim for about 1100 facts. The smaller the ratio of facts to weights, the more likely you will be undertraining your network, which will lead to very poor generalization capability.
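The note's arithmetic can be captured in a small helper (the function name is illustrative; biases are ignored, matching the 10-10-1 count above):

```python
def weight_count(layers):
    """Trainable weights in a fully connected feed-forward network,
    given layer sizes (bias weights ignored, as in the 10-10-1 example)."""
    return sum(a * b for a, b in zip(layers, layers[1:]))

w = weight_count([10, 10, 1])   # 10*10 + 10*1 = 110 weights
facts_needed = 10 * w           # ~10 facts per weight -> about 1100 facts
```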
We now begin the preprocessing effort. As mentioned before, this will likely be where you, the neural network designer, will spend most of your time.
Let’s look at the raw data for the problem we want to solve. There are a couple of ways we can start preprocessing the data to reduce the number of inputs and enhance the variability of the data:
• Use Advances/Declines ratio instead of each value separately.
• Use New Highs/New Lows ratio instead of each value separately.
We are left with the following indicators:
1. ThreeMonth Treasury Bill Yield
2. 30Year Treasury Bond Yield
3. NYSE Advancing/Declining issues
4. NYSE New Highs/New Lows
5. S&P 500 closing price
Raw data for the period from January 4, 1980 to October 28, 1983 is taken as the training period, for a total of 200 weeks of data. The following 50 weeks are kept on reserve for a test period to see if the predictions are valid outside of the training interval. The last date of this period is October 19, 1984. Let’s look at the raw data now. (You get data on the disk available with this book that covers the period from January, 1980 to December, 1992.)
• One figure shows the S&P 500 stock index (caption: The S&P 500 Index for the period of interest).
• Another shows long-term bonds and short-term three-month T-bill interest rates (caption: Long-term and short-term interest rates).
• A third shows some breadth indicators on the NYSE, the number of advancing stocks/number of declining stocks, as well as the ratio of new highs to new lows on the NYSE (caption: Breadth indicators on the NYSE: Advancing/Declining issues and New Highs/New Lows).
A sample of a few lines looks like the following data in Table 14.1. Note that the order of parameters is the same as listed above.
Table 14.1 Raw Data

Date     3Mo T-Bills  30Yr T-Bonds  NYSE Adv/Dec  NYSE NewH/NewL  SP Close
1/4/80   12.11        9.64          4.209459      2.764706        106.52
1/11/80  11.94        9.73          1.649573      21.28571        109.92
1/18/80  11.9         9.8           0.881335      4.210526        111.07
1/25/80  12.19        9.93          0.793269      3.606061        113.61
2/1/80   12.04        10.2          1.16293       2.088235        115.12
2/8/80   12.09        10.48         1.338415      2.936508        117.95
2/15/80  12.31        10.96         0.338053      0.134615        115.41
2/22/80  13.16        11.25         0.32381       0.109091        115.04
2/29/80  13.7         12.14         1.676895      0.179245        113.66
3/7/80   15.14        12.1          0.282591      0               106.9
3/14/80  15.38        12.01         0.690286      0.011628        105.43
3/21/80  15.05        11.73         0.486267      0.027933        102.31
3/28/80  16.53        11.67         5.247191      0.011628        100.68
4/3/80   15.04        12.06         0.983562      0.117647        102.15
4/11/80  14.42        11.81         1.565854      0.310345        103.79
4/18/80  13.82        11.23         1.113287      0.146341        100.55
4/25/80  12.73        10.59         0.849807      0.473684        105.16
5/2/80   10.79        10.42         1.147465      1.857143        105.58
5/9/80   9.73         10.15         0.513052      0.473684        104.72
5/16/80  8.6          9.7           1.342444      6.75            107.35
5/23/80  8.95         9.87          3.110825      26              110.62
For each of the five inputs, we want to use a function to highlight rate of change types of features. We will use the following function (as originally proposed by Jurik) for this purpose.
ROC(n) = (input(t) - BA(t - n)) / (input(t) + BA(t - n))

where input(t) is the input’s current value and BA(t - n) is a five-unit block average of adjacent values centered around the value n periods ago.
Now we need to decide how many of these features we need. Since we are making a prediction 10 weeks into the future, we will take data as far back as 10 weeks also. This will be ROC(10). We will also use one other rate of change, ROC(3). We have now added 5*2 = 10 inputs to our network, for a total of 15. All of the preprocessing can be done with a spreadsheet.
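A minimal sketch of the block average and ROC computations, checked against the sample values in Tables 14.2 and 14.3 (the function names are illustrative):

```python
def block_average(series, i):
    """Five-unit block average centered on index i; None near the edges."""
    if i < 2 or i > len(series) - 3:
        return None
    return sum(series[i - 2:i + 3]) / 5.0

def roc(series, t, n):
    """ROC(n) = (input(t) - BA(t - n)) / (input(t) + BA(t - n))."""
    ba = block_average(series, t - n)
    if ba is None:
        return None
    return (series[t] - ba) / (series[t] + ba)

# Three-month T-bill yields from Table 14.1, 1/4/80 through 2/29/80:
bills = [12.11, 11.94, 11.9, 12.19, 12.04, 12.09, 12.31, 13.16, 13.7]
ba_jan18 = block_average(bills, 2)    # 12.036, as in Table 14.2
roc3_feb8 = roc(bills, 5, 3)          # 0.002238, as in Table 14.3
```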
Here’s what we get (Table 14.2) after doing the block averages. Example: BA3MoBills for 1/18/80 = (3MoBills(1/4/80) + 3MoBills(1/11/80) + 3MoBills(1/18/80) + 3MoBills(1/25/80) + 3MoBills(2/1/80))/5.
Table 14.2 Data after Doing Block Averages

Date     3MoBills  LngBonds  A/D       H/L       SPClose  BA3MoB  BALngBnd  BAA/D     BAH/L     BAClose
1/4/80   12.11     9.64      4.209459  2.764706  106.52   -       -         -         -         -
1/11/80  11.94     9.73      1.649573  21.28571  109.92   -       -         -         -         -
1/18/80  11.9      9.8       0.881335  4.210526  111.07   12.036  9.86      1.739313  6.791048  111.248
1/25/80  12.19     9.93      0.793269  3.606061  113.61   12.032  10.028    1.165104  6.825408  113.534
2/1/80   12.04     10.2      1.16293   2.088235  115.12   12.106  10.274    0.9028    2.595189  114.632
2/8/80   12.09     10.48     1.338415  2.936508  117.95   12.358  10.564    0.791295  1.774902  115.426
2/15/80  12.31     10.96     0.338053  0.134615  115.41   12.66   11.006    0.968021  1.089539  115.436
2/22/80  13.16     11.25     0.32381   0.109091  115.04   13.28   11.386    0.791953  0.671892  113.792
2/29/80  13.7      12.14     1.676895  0.179245  113.66   13.938  11.692    0.662327  0.086916  111.288
3/7/80   15.14     12.1      0.282591  0         106.9    14.486  11.846    0.69197   0.065579  108.668
3/14/80  15.38     12.01     0.690286  0.011628  105.43   15.16   11.93     1.676646  0.046087  105.796
3/21/80  15.05     11.73     0.486267  0.027933  102.31   15.428  11.914    1.537979  0.033767  103.494
3/28/80  16.53     11.67     5.247191  0.011628  100.68   15.284  11.856    1.794632  0.095836  102.872
4/3/80   15.04     12.06     0.983562  0.117647  102.15   14.972  11.7      1.879232  0.122779  101.896
4/11/80  14.42     11.81     1.565854  0.310345  103.79   14.508  11.472    1.95194   0.211929  102.466
4/18/80  13.82     11.23     1.113287  0.146341  100.55   13.36   11.222    1.131995  0.581032  103.446
4/25/80  12.73     10.59     0.849807  0.473684  105.16   12.298  10.84     1.037893  0.652239  103.96
5/2/80   10.79     10.42     1.147465  1.857143  105.58   11.134  10.418    0.993211  1.94017   104.672
5/9/80   9.73      10.15     0.513052  0.473684  104.72   10.16   10.146    1.392719  7.110902  106.686
5/16/80  8.6       9.7       1.342444  6.75      107.35   7.614   8.028     1.222757  7.016165  85.654
5/23/80  8.95      9.87      3.110825  26        110.62   5.456   5.944     0.993264  6.644737  64.538
Now let’s look at the rest of this table, which is made up of the 10 new ROC indicator values (Table 14.3).
Table 14.3 Added Rate of Change (ROC) Indicators

Date     ROC3_3Mo  ROC3_Bond  ROC3_AD   ROC3_H/L  ROC3_SPC
1/4/80   -         -          -         -         -
1/11/80  -         -          -         -         -
1/18/80  -         -          -         -         -
1/25/80  -         -          -         -         -
2/1/80   -         -          -         -         -
2/8/80   0.002238  0.030482   0.13026   0.39625   0.029241
2/15/80  0.011421  0.044406   0.55021   0.96132   0.008194
2/22/80  0.041716  0.045345   0.47202   0.91932   0.001776
2/29/80  0.0515    0.069415   0.358805  0.81655   0.00771
3/7/80   0.089209  0.047347   0.54808   1         0.03839
3/14/80  0.073273  0.026671   0.06859   0.96598   0.03814
3/21/80  0.038361  0.001622   0.15328   0.51357   0.04203
3/28/80  0.065901  0.00748    0.766981  0.69879   0.03816
4/3/80   0.00397   0.005419   0.26054   0.437052  0.01753
4/11/80  0.03377   0.00438    0.008981  0.437052  0.01753
4/18/80  0.0503    0.02712    0.23431   0.803743  0.001428
4/25/80  0.08093   0.0498     0.37721   0.58831   0.015764
5/2/80   0.14697   0.04805    0.25956   0.795146  0.014968
5/9/80   0.15721   0.05016    0.37625   0.10178   0.00612
5/16/80  0.17695   0.0555     0.127944  0.823772  0.016043
5/23/80  0.10874   0.02701    0.515983  0.86112   0.027628

Date     ROC10_3Mo  ROC10_Bnd  ROC10_AD  ROC10_HL  ROC10_SP
3/28/80  0.15732    0.084069   0.502093  0.99658   0.04987
4/3/80   0.111111   0.091996   0.08449   0.96611   0.05278
4/11/80  0.087235   0.069553   0.268589  0.78638   0.04964
4/18/80  0.055848   0.030559   0.169062  0.84766   0.06888
4/25/80  0.002757   0.01926    0.06503   0.39396   0.04658
5/2/80   0.10345    0.0443     0.183309  0.468658  0.03743
5/9/80   0.17779    0.0706     0.127     0.689919  0.03041
5/16/80  0.25496    0.0996     0.319735  0.980756  0.0061
5/23/80  0.25757    0.0945     0.299569  0.996461  0.02229

NOTE: Note that you don’t get completed rows until 3/28/80, since we have a ROC indicator dependent on a block average value 10 weeks before it. The first block average value is generated 1/18/80, two weeks after the start of the data set. All of this means that you will need to discard the first 12 values in the data set to get complete rows, also called complete facts.
We now have values in the original five data columns that have a very large range. We would like to reduce the range by some method. We use the following function:
new value = (old value - Mean) / (Maximum Range)
This relates the distance from the mean for a value in a column as a fraction of the Maximum range for that column. You should note the value of the Maximum range and Mean, so that you can unnormalize the data when you get a result.
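A sketch of this normalization, keeping the mean and maximum range around for un-normalizing results later (the function names are illustrative):

```python
def normalize_column(values):
    """new value = (old value - mean) / (maximum range).

    Returns the scaled column along with the mean and range, which must be
    saved so results can be un-normalized later."""
    mean = sum(values) / len(values)
    max_range = max(values) - min(values)
    scaled = [(v - mean) / max_range for v in values]
    return scaled, mean, max_range

def unnormalize(v, mean, max_range):
    """Invert the normalization to recover the original units."""
    return v * max_range + mean

col = [12.11, 11.94, 11.9, 12.19, 12.04]        # T-bill yields from Table 14.1
scaled, mean, max_range = normalize_column(col)
restored = [unnormalize(v, mean, max_range) for v in scaled]
```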
We’ve taken care of all our inputs, which number 15. The final piece of information is the target. The objective as stated at the beginning of this exercise is to predict the percentage change 10 weeks into the future. We need to time shift the S&P 500 close 10 weeks back, and then calculate the value as a percentage change as follows:
Result = 100 × ((S&P 10 weeks ahead) - (S&P this week)) / (S&P this week)
This gives us a value that varies between -14.8 and +33.7. This is not in the form we need yet. As you recall, the output comes from a sigmoid function that is restricted to 0 to +1. We will first add 14.8 to all values and then scale them by a factor of 0.02. This will result in a scaled target that varies from 0 to 1.
scaled target = (result + 14.8) X 0.02
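A minimal Python sketch of the target scaling and its inverse (function names are ours); the first data row, with Result 12.43544 and scaled target 0.544709, serves as a check:

```python
def scale_target(result):
    # result: percentage change 10 weeks ahead, ranging from -14.8 to +33.7
    return (result + 14.8) * 0.02

def unscale_target(scaled_target):
    # Inverse mapping, used later to unnormalize the network's output.
    return scaled_target / 0.02 - 14.8
```

For example, `scale_target(12.43544)` reproduces the tabulated value 0.544709 to six decimal places.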
The final data file, with the scaled target alongside the six scaled original columns of data, is shown in the table below.

Normalized Ranges for Original Columns and Scaled Target

Date     S_3MOBill  S_LngBnd  S_A/D     S_H/L     S_SPC    Result     Scaled Target
3/28/80  0.534853   0.01616   0.765273  0.07089   0.51328  12.43544   0.544709
4/3/80   0.391308   0.055271  0.06356   0.07046   0.49236  12.88302   0.55366
4/11/80  0.331578   0.009483  0.049635  0.06969   0.46901  9.89498    0.4939
4/18/80  0.273774   0.09674   0.03834   0.07035   0.51513  15.36549   0.60331
4/25/80  0.168765   0.21396   0.08956   0.06903   0.44951  11.71548   0.53031
5/2/80   0.01813    0.2451    0.0317    0.06345   0.44353  11.61205   0.528241
5/9/80   0.12025    0.29455   0.15503   0.06903   0.45577  16.53934   0.626787
5/16/80  0.22912    0.37696   0.006205  0.04372   0.41833  12.51048   0.54621
5/23/80  0.1954     0.34583   0.349971  0.033901  0.37179  9.573314   0.487466
You need to place the first 200 lines in a training.dat file (provided for you on the accompanying diskette) and the subsequent 40 lines of data in a test.dat file for use in testing. You will read more about this shortly. More data than this is also provided on the diskette, in raw form, for you to do further experiments.
With the training data available, we set up a simulation. The number of inputs is 15, and the number of outputs is 1. A total of three layers are used, with a middle layer of size 5. This number should be made as small as possible while still giving acceptable results. The optimum sizes and number of layers can be found only by much trial and error. After each run, you can look at the error for the training set and for the test set.
You obtain the error for the test set by running the simulator in Training
mode (you need to temporarily copy the test data with expected outputs to the
training.dat file) for one cycle with weights loaded
from the weights file. Since this is the last and only cycle, weights do not get
modified, and you can get a reading of the average error. Refer to Chapter 13
for more information on the simulator’s Test and Training
modes. This approach has been taken with five runs of the simulator for 500
cycles each. Table 14.5 summarizes the results along with the parameters used.
The error improves with each run up to run #4. For run #5, the training set error decreases, but the test set error increases, indicating the onset of memorization. Run #4 is used for the final network results, showing a test set RMS error of 13.9% and a training set error of 6.9%.
Results of Training the Backpropagation Simulator for Predicting the Percentage Change in the S&P 500 Index

Run#  Tolerance  Beta  Alpha  NF      max cycles  cycles run  training set error  test set error
1     0.001      0.5   0.001  0.0005  500         500         0.150938            0.25429
2     0.001      0.4   0.001  0.0005  500         500         0.114948            0.185828
3     0.001      0.3   0      0       500         500         0.0936422           0.148541
4     0.001      0.2   0      0       500         500         0.068976            0.139230
5     0.001      0.1   0      0       500         500         0.0621412           0.143430
NOTE: If you find that the test set error does not decrease much while the training set error continues to make substantial progress, this means that memorization is starting to set in (run #5 in this example). It is important to monitor the test set(s) you are using while you are training, to make sure that good, generalized learning is occurring rather than memorizing or overfitting the data. In the case shown, the test set error continued to improve until run #5, where it degraded. To make any further improvements beyond what was achieved, you need to revisit the 12-step forecasting model design process.
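The run-selection logic described in this note can be sketched as follows, using the error figures from Table 14.5 (the helper function is illustrative, not part of the simulator):

```python
# Training and test set errors from the five runs summarized in Table 14.5.
train_errors = [0.150938, 0.114948, 0.0936422, 0.068976, 0.0621412]
test_errors = [0.25429, 0.185828, 0.148541, 0.139230, 0.143430]

def best_run(test):
    """Pick the run where test set error bottoms out; training beyond this
    point keeps improving the training set but degrades generalization."""
    return min(range(len(test)), key=lambda i: test[i]) + 1  # runs number from 1
```

Applied to the tabulated errors, this selects run #4, even though run #5 has the lowest training set error.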
To see the exact correlation, you can copy any period you’d like, with the expected value output fields deleted, to the test.dat file. Then you run the simulator in Test mode and get the output value from the simulator for each input vector. You can then compare this with the expected value in your training set or test set.
Now that you’re done, you need to unnormalize the data back to get the answer in terms of the change in the S&P 500 index. What you’ve accomplished is a way in which you can get data from a financial newspaper, like Barron’s or Investor’s Business Daily, and feed the current week’s data into your trained neural network to get a prediction of what the S&P 500 index is likely to do ten weeks from now.
Here are the steps to unnormalize:
1. Take the predicted scaled target value and calculate the result value as Result = (Scaled target / 0.02) - 14.8
2. Take the Result from step 1 (which is the percentage change 10 weeks from now) and calculate the projected value as Projected S&P 10 weeks from now = (This week’s S&P value) x (1 + Result/100)
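These two steps can be sketched in Python (the function name is ours):

```python
def project_sp500(scaled_target, this_week_sp):
    # Step 1: invert the target scaling to recover the percentage change.
    result = scaled_target / 0.02 - 14.8
    # Step 2: apply that percentage change to this week's S&P 500 value.
    return this_week_sp * (1 + result / 100)
```

For example, a scaled target of 0.496 corresponds to a Result of +10%, so with this week’s index at 400 the projection is 440.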
This is only a very brief illustration (not meant for trading!) of what you can do with neural networks in financial forecasting. You need to analyze the data further, provide more predictive indicators, and optimize/redesign your neural network architecture to get better generalization and lower error. You need to present many, many more test cases representing different market conditions to have a robust predictor that can be traded with. A graph of the expected and predicted output for the test set and the training set is shown in the figure below.
Comparison of predicted versus actual for the training and test data sets.
Here, the normalized values are used for the output. Note that the error is
about 13.9% on average over the test set and 6.9% over the training set. You
can see that the test set did well in the beginning, but showed great
divergence in the last few weeks.
We cannot overstate the importance of preprocessing in developing a forecasting model. There is a large body of information related to the study of financial market behavior called Technical Analysis. You can use the mathematical studies defined by Technical Analysis to preprocess your input data to reveal predictive features. We will present a sampling of Technical Analysis studies that can be used, with formulae and graphs.
Moving averages are used very widely to capture the underlying trend of a
price move. Moving averages are simple filters that average data over a moving
window. Popular moving averages include 5-, 10-, and 20-period moving averages.
The formula is shown below for a simple moving average, SMA:
SMA_{t} = (P_{t} + P_{t-1} + ... + P_{t-n}) / n
where n = the number of time periods back
P_{t-n} = price at n time periods back
An exponential moving average is a weighted moving average that places more weight on the most recent data. The formula for this indicator, EMA, is as follows:
EMA_{t} = (1 - a)P_{t} + a(EMA_{t-1})
where a = smoothing constant (typically 0.10)
P_{t} = price at time t
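Both averages can be sketched in Python; the functions below are illustrative, and follow the text’s formulas, including the EMA form in which the smoothing constant a weights the previous EMA value:

```python
def sma(prices, n):
    """Simple moving average of the last n prices."""
    return sum(prices[-n:]) / n

def ema(prices, a=0.10):
    """EMA per the text: EMA_t = (1 - a) * P_t + a * EMA_(t-1),
    seeded here with the first price."""
    value = prices[0]
    for p in prices[1:]:
        value = (1 - a) * p + a * value
    return value
```

With a = 0.10, 90% of the weight goes to the newest price, so this EMA tracks recent data closely.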
Momentum is really velocity, or the rate of price change with time. The formula for this is
M_{t} = P_{t} - P_{t-a}
where a = lookback parameter; for a 5-day momentum value, a = 5
The Rate of Change indicator is actually a ratio. It is the current price divided by the price some interval, a, ago, multiplied by a constant. Specifically,
ROC = (P_{t} / P_{t-a}) x 1000
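Hedged sketches of both indicators (function names are ours):

```python
def momentum(prices, a):
    """M_t = P_t - P_(t-a): the price change over a lookback of a periods."""
    return prices[-1] - prices[-1 - a]

def rate_of_change(prices, a):
    """ROC = (P_t / P_(t-a)) x 1000, using the text's constant of 1000."""
    return prices[-1] / prices[-1 - a] * 1000
```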
The Relative Strength Index (RSI) is the strength of up closes versus down closes over a certain time interval. It is calculated over a time interval T as:
RSI = 100 - [100 / (1 + RS)]
where RS = average of x days’ up closes / average of x days’ down closes
A typical time interval, T, is 14 days. The assumption behind RSI is that higher up closes relative to down closes indicate a strong market, and the opposite indicates a weak market.
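A sketch of the RSI calculation under the text’s definition (the handling of a window with no down closes is our assumption):

```python
def rsi(closes, period=14):
    """RSI = 100 - 100 / (1 + RS), RS = avg up closes / avg down closes."""
    ups, downs = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        ups.append(max(change, 0.0))
        downs.append(max(-change, 0.0))
    avg_up = sum(ups[-period:]) / period
    avg_down = sum(downs[-period:]) / period
    if avg_down == 0:
        return 100.0  # all up closes: maximally strong by this measure
    rs = avg_up / avg_down
    return 100 - 100 / (1 + rs)
```

A market with equal up and down closes gives RS = 1 and therefore RSI = 50, the neutral reading.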
This indicator measures where today’s price falls in a recent range of prices. The indicator assumes that prices regress to their mean. A low %R indicates that prices are hitting the ceiling of a range, while a high %R indicates that prices are at their low in a range. The formula is:
%R = 100 x (HighX - P) / (HighX - LowX)
where HighX is the highest price over the price interval of interest,
LowX is the lowest price over the price interval of interest, and
P is the current price
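A sketch of the %R calculation (the caller supplies the highs and lows covering the interval of interest):

```python
def percent_r(highs, lows, close):
    """%R = 100 x (HighX - P) / (HighX - LowX), where HighX and LowX are
    the extremes over the interval covered by the lists passed in."""
    high_x = max(highs)
    low_x = min(lows)
    return 100 * (high_x - close) / (high_x - low_x)
```

A close at the top of the range gives %R = 0 (the “low %R at the ceiling” in the text), and a close at the bottom gives %R = 100.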
This indicator makes use of market data available beyond price information. It uses the volume of the security, which, for a stock, is the number of shares traded during a particular interval. It also uses the open interest, which is the total number of open trades at a particular time; for a commodity future, this is the number of open short and long positions. This study attempts to measure the flow of money into and out of a market. The formula is as follows (note that a tick is the smallest permissible move in a given market):
Let MP = mean price over a particular interval
OI = the larger of yesterday’s or today’s open interest
then
K = [(MP_{t} - MP_{t-1}) x dollar value of 1 tick move x volume] x [1 +/- 2/OI]
HPI_{t} = HPI_{t-1} + [0.1 x (K - HPI_{t-1})] / 100,000
The MACD (moving average convergence divergence) indicator is the difference between two moving averages, and it tells you when short-term overbought or oversold conditions exist in the market. The formula is as follows:
Let OSC = EMA1 - EMA2
where EMA1 is for one smoothing constant and time period, for example, 0.15 and 12 weeks, and EMA2 is for another smoothing constant and time period, for example, 0.075 and 26 weeks
then
MACD_{t} = MACD_{t-1} + K x (OSC_{t} - MACD_{t-1})
where K is a smoothing constant, for example, 0.2
The final formula effectively does another exponential smoothing on the difference of the two moving averages, for example, over a 9-week period.
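A sketch of the MACD calculation; we assume each EMA update places the smoothing constant’s weight on the new price, which is the conventional reading of constants like 0.15 for 12 weeks:

```python
def macd_series(prices, a1=0.15, a2=0.075, k=0.2):
    """OSC = EMA1 - EMA2, then MACD_t = MACD_(t-1) + K x (OSC_t - MACD_(t-1)).
    Smoothing constants follow the text's examples (0.15/12 weeks,
    0.075/26 weeks, K = 0.2)."""
    ema1 = ema2 = prices[0]
    macd, out = 0.0, []
    for p in prices:
        ema1 += a1 * (p - ema1)   # fast EMA
        ema2 += a2 * (p - ema2)   # slow EMA
        macd += k * ((ema1 - ema2) - macd)  # smooth the oscillator
        out.append(macd)
    return out
```

On a flat price series the oscillator stays at zero; on a steadily rising series the fast EMA leads the slow one and MACD turns positive.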
This indicator has absolutely nothing to do with stochastic processes. The reason for the name is a mystery, but the indicator is composed of two parts: %K, and %D, which is a moving average of %K. The crossover of these lines indicates overbought and oversold areas. The formulas follow:
Raw %K = 100 x (P - LowX) / (HighX - LowX)
%K_{t} = [(%K_{t-1} x 2) + Raw %K_{t}] / 3
%D_{t} = [(%D_{t-1} x 2) + %K_{t}] / 3
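Sketches of the two stochastic components (seeding of the initial %K and %D values is left to the caller):

```python
def raw_k(close, highs, lows):
    """Raw %K = 100 x (P - LowX) / (HighX - LowX) over the supplied window."""
    high_x, low_x = max(highs), min(lows)
    return 100 * (close - low_x) / (high_x - low_x)

def smooth(prev, new):
    """The 3-period smoothing used for both %K and %D:
    value_t = (2 x prev + new) / 3."""
    return (2 * prev + new) / 3
```

%K is obtained by smoothing Raw %K, and %D by smoothing %K with the same rule, which is what makes %D the slower of the two lines.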
The on-balance volume (OBV) indicator was created to try to uncover the accumulation and distribution patterns of large players in the stock market. It is a cumulative sum of volume data, specified as follows:
If today’s close is greater than yesterday’s close: OBV_{t} = OBV_{t-1} + 1
If today’s close is less than yesterday’s close: OBV_{t} = OBV_{t-1} - 1
The absolute value of the index is not important; attention is given only to the direction and trend.
This indicator does for price what OBV does for volume.
If today’s close is greater than yesterday’s close: AD_{t} = AD_{t-1} + (Close_{t} - Low_{t})
If today’s close is less than yesterday’s close: AD_{t} = AD_{t-1} - (High_{t} - Close_{t})
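Sketches of both cumulative indicators; note our assumption that the accumulation-distribution update subtracts (High - Close) on down closes, by analogy with OBV, since the printed formula appears to have lost its minus sign:

```python
def obv_series(closes):
    """On-balance volume in the text's simplified form: +1 for an up close,
    -1 for a down close (classic OBV adds or subtracts the day's volume)."""
    obv = [0]
    for prev, cur in zip(closes, closes[1:]):
        step = 1 if cur > prev else (-1 if cur < prev else 0)
        obv.append(obv[-1] + step)
    return obv

def ad_series(closes, highs, lows):
    """Accumulation-distribution: add (close - low) on up closes,
    subtract (high - close) on down closes (our sign assumption)."""
    ad = [0.0]
    for t in range(1, len(closes)):
        if closes[t] > closes[t - 1]:
            ad.append(ad[-1] + (closes[t] - lows[t]))
        elif closes[t] < closes[t - 1]:
            ad.append(ad[-1] - (highs[t] - closes[t]))
        else:
            ad.append(ad[-1])
    return ad
```

As the text notes, only the direction and trend of these series matter, not their absolute levels.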
Now let’s examine how these indicators look. The first chart is a bar chart, which is a chart of price data versus time, shown along with the following indicators:
• Ten-unit moving average
• Ten-unit exponential moving average
• Momentum
• MACD
• Percent R
Five-minute bar chart of the S&P 500 Sept 95 Futures contract with several technical indicators displayed.
The time period shown is 5 minute bars for the S&P 500 September 1995 Futures contract. The top of each bar indicates the highest value (“high”) for that time interval, the bottom indicates the lowest value(“low”), and the horizontal lines on the bar indicate the initial (“open”) and final (“close”) values for the time interval.
The next chart is a daily bar chart for Intel Corporation stock for the period from December 1994 to July 1995, with each bar representing a day of activity. The following indicators are also displayed:
• Rate of Change
• Relative Strength
• Stochastics
• Accumulation-Distribution
Daily bar chart of Intel Corporation with several technical indicators displayed.
You have seen a few of the hundreds of technical indicators that have been invented to date. New indicators are being created rapidly as the field of Technical Analysis gains popularity and following. There are also pattern recognition studies, such as formations that resemble flags or pennants, as well as more exotic types of studies, like Elliott wave counts. You can refer to books on Technical Analysis (e.g., Murphy) for more information about these and other studies.
Neural preprocessing with Technical Analysis tools as well as with traditional engineering analysis tools such as Fourier series, Wavelets, and Fractals can be very useful in finding predictive patterns for forecasting.
In this final section of the chapter, we outline some case studies documented in periodicals and books, to give you an idea of the successes and failures to date with neural networks in financial forecasting. Keep in mind that the very best (i.e., most profitable) results are usually never reported, so as not to lose a competitive edge! Also, remember that the market inefficiencies exploited yesterday may no longer exist today.
Well, Hillary Clinton can certainly trade commodities, but a three-year-old, too? In his paper, “Commodity Trading with a Three Year Old,” J. E. Collard describes a neural network with the supposed intelligence of a three-year-old. The application used a feedforward backpropagation network with a 37-30-1 architecture. The network was trained to buy (“go long”) or sell (“go short”) in the live cattle commodity futures market. The training set consisted of 789 facts for trading days in 1988, 1989, 1990, and 1991. Each input vector consisted of 18 fundamental indicators and six market technical variables (Open, High, Low, Close, Open Interest, Volume). The network could be trained for the correct output on all but 11 of the 789 facts.
The fully trained network was used on 178 subsequent trading days in 1991. The cumulative profit increased from $0 to $1547.50 over this period by trading one live cattle contract. The largest loss in a trade was $601.74 and the largest gain in a trade was $648.30.
Milam Aiken designed a feedforward backpropagation network that predicted Treasury Bill rates, and compared the forecast he obtained with forecasts made by top U.S. economists. The results showed that the neural network, given the same data, made better predictions (0.18 versus 0.71 absolute error). Aiken examined 250 economic data series for correlation with T-Bills and used only the series that showed leading correlation: the Dept. of Commerce Index of Leading Economic Indicators, the Center for International Business Cycle Research (CIBCR) Short Leading Composite Index, and the CIBCR Long Leading Composite Index. Prior data for these three indicators over the past four years (12 inputs in total) was used to predict the average annual T-Bill rate (one output) for the current year.
Guido Deboeck and Masud Cader designed profitable trading systems for two-year and 10-year treasury securities. They used feedforward neural networks with a learning algorithm called extended-delta-bar-delta (EDBD), which is a variant of backpropagation. Training samples composed of 100 facts were selected from 1120 trading days spanning July 1, 1989 to June 30, 1992. The test period consisted of more than 150 trading days from July 1, 1992 to December 30, 1992. Performance on the test set was monitored every N thousand training cycles, and the training procedure was stopped when performance on the test set degraded. (This is the same procedure we used when developing a model for the S&P 500.)
A criterion used to judge model performance was the ratio of the average profit to the maximum drawdown, which is the largest unrealized loss seen during the trading period. A portfolio of the separately designed trading systems for two-year and 10-year securities gave the following performance: over a period of 4.5 years, the portfolio had 133 total trades, with 65% profitable trades and a maximum drawdown of 64 basis points (hundredths of a percentage point, the unit used for bond yields). The total gain was 677 basis points over that period, with a maximum gain in one trade of 52 basis points and a maximum loss in one trade of 47 basis points.
The stability and robustness of this system were checked by using over 1000 moving time windows of 3-month, 6-month, and 12-month duration over the 4.5-year interval and noting the standard deviations in profits and maximum drawdown. The maximum drawdown varied from 30 to 48 basis points.
Ramesh Sharda and Rajendra Patil used a standard 12-12-1 feedforward backpropagation network and compared the results with the Box-Jenkins methodology, a traditional technique for time-series forecasting. The authors used 75 different time series for evaluation. The results showed that the neural networks achieved a better MAPE (mean absolute percentage error), with a mean over all 75 time series of 14.67 versus 15.94 for the Box-Jenkins approach.
Leorey Marquez et al. compared neural network modeling with standard regression analysis. The authors used a feedforward backpropagation network with a 1-6-1 structure. They used three functional forms found in regression analysis:
1. Y = B0 + B1 X + e
2. Y = B0 + B1 log(X) + e
3. Y = B0 + B1/X + e
For each of these forms, 100 pairs of (x, y) data were generated from the “true” model. The neural network was then trained on these 100 pairs. An additional 100 data points were generated to test the forecasting ability of the network. The results showed that the neural network achieved a MAPE within 0.6% of the true model, which is a very good result. The neural network approximated the linear model best. An experiment was also done with intentional misspecification of some data points. The neural network model did well in these cases too, though comparatively worse for the reciprocal model.
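The data generation step can be sketched as follows; the coefficients, x range, and noise level here are illustrative, not taken from the paper:

```python
import math
import random

def generate_pairs(form, b0, b1, n=100, noise=0.1, seed=0):
    """Generate (x, y) pairs from one of the three 'true' models,
    with additive noise e."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)   # keep x > 0 for the log and 1/x forms
        e = rng.gauss(0.0, noise)
        if form == "linear":
            y = b0 + b1 * x + e
        elif form == "log":
            y = b0 + b1 * math.log(x) + e
        else:  # reciprocal form
            y = b0 + b1 / x + e
        pairs.append((x, y))
    return pairs
```

Training on one such sample and testing on a freshly generated one mirrors the train/test split described above.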
Here, five neural networks are arranged such that four network outputs feed the final network. The four networks are trained to produce the High, Low, short-term trend, and medium-term trend for a particular financial instrument. The final network takes these four outputs as input and produces a turning point indicator.
Hierarchical neural network system to predict turning points.
Each network was trained and tested with 1200 fact days spanning 1988 to 1992 (33% used for testing). Preprocessing was accomplished by using differences of the inputs and some technical analysis studies:
• Moving averages
• Exponential moving averages
• Stochastic indicators
For the network that produces a predicted High value, the average error ranged between 7.04% and 7.65% for various financial markets over the test period, including Treasury Bonds, Eurodollar, Japanese Yen, and S&P 500 futures contracts.
A methodology that is sometimes used in neural network design is walk-forward training and testing. This means that you choose an interval of time (e.g., six months) over which you train a neural network, and test the network over a subsequent interval. You then move the training window and testing window forward, by one month for example, and repeat the exercise. You do this over the time period of interest to see your forecasting results. The advantage of this approach is that you maximize the network’s ability to model the recent past in making a prediction. The disadvantage is that the network forgets characteristics of the market that occurred prior to the training window.
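The windowing scheme can be sketched as simple index arithmetic (the window lengths below are illustrative):

```python
def walk_forward_windows(n_periods, train_len=6, test_len=1, step=1):
    """Generate (train, test) index windows that roll forward through time:
    e.g. train on months 0-5, test on month 6, then slide by one month."""
    windows = []
    start = 0
    while start + train_len + test_len <= n_periods:
        train = (start, start + train_len)
        test = (start + train_len, start + train_len + test_len)
        windows.append((train, test))
        start += step
    return windows
```

Each tuple holds half-open (start, end) indices; a fresh network is trained and tested on each successive window.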
Takashi Kimoto et al. used the walk-forward methodology in designing a trading system for Fujitsu and Nikko Securities. Like Mendelsohn, they used a hierarchical neural network composed of individual feedforward neural networks. Prediction of the TOPIX, which is the Japanese equivalent of the Dow Jones Industrial Average, was performed for 33 months from January 1987 to September 1989. Four networks in the first level of the hierarchy were trained on price data and economic data. Their results were fed to a final network that generated buy and sell signals. The trading system achieved a result 20% better than a buy-and-hold strategy for the TOPIX.
Jeremy Konstenius discusses a trading system for the S&P 400 index using a holographic neural network, which is unlike the feedforward backpropagation neural network. The holographic network uses complex numbers for data input to and output from neurons, which are mathematically more complex than feedforward network neurons. The author uses two trained networks to forecast the next day’s direction based on data for the past 10 days. Each network uses input data that is detrended by subtracting a moving average from the data. Network 1 uses detrended closing values; Network 2 uses detrended High values. If both networks agree, or confirm each other, a trade is made. Otherwise there is no trade.
Network 1 showed an accuracy of 61.9% for the fivemonth test period (the training period spanned two years prior to the test period), while Network 2 also showed an accuracy of 61.9%. Using the two networks together, Konstenius achieved an accuracy of 65.82%.
This neural network approach is discussed by Michitaka Kosaka et al. (1991).
They discuss applying the feedforward backpropagation network to develop buy/sell signals for securities. You would gather timeseries data on stock prices, and want to find trends in the data so that changes in the direction of the trend provide you the turning points, which you interpret as signals to buy or sell.
You will need to list the factors that you think have any influence on the price of the security you are studying, and determine how to measure these factors. You then formulate a nonlinear function combining the factors on your list and however many past prices of your security (your time-series data).
The function has the form, as Michitaka Kosaka, et al. (1991) put it,
p(t + h) = F(x(t), x(t - 1), ..., f_{1}, f_{2}, ...)
where
f_{1}, f_{2}, ... represent factors on your list,
x(t) is the price of your stock at time t,
p(t + h) is the turning point of the security price at time t + h, and
p(t + h) = -1 for a turn from downward to upward,
p(t + h) = +1 for a turn from upward to downward,
p(t + h) = 0 for no change and therefore no turn
Here you vary h through the values 1, 2, etc. as you move into the future one day (time period) at a time. Note that the detailed form of the function F is not given. This is for you to set up as you see fit.
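A minimal labeling sketch consistent with these definitions (we read the first label as -1, assuming a dropped minus sign in the text; the function name is ours):

```python
def turning_point(prices, t):
    """Label: -1 for a downward-to-upward turn at t, +1 for an
    upward-to-downward turn, 0 for no turn."""
    if prices[t - 1] > prices[t] < prices[t + 1]:
        return -1  # local minimum: trend turns from down to up
    if prices[t - 1] < prices[t] > prices[t + 1]:
        return +1  # local maximum: trend turns from up to down
    return 0
```

Labels like these would serve as training targets; the predictive function F itself is left for you to design.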
You can set up a similar function for x(t + h), the stock price at time t + h, and have a separate network compute it using the backpropagation paradigm. You will then be generating future prices of the stock and future buy/sell signals hand in hand, in parallel.
Michitaka Kosaka, et al. (1991) report that they used timeseries data over five years to identify the network model, and timeseries data over one year to evaluate the model’s forecasting performance, with a success rate of 65% for turning points.
Michael Azoff, in his book on time-series forecasting with neural networks (see references), creates neural network systems for predicting the S&P 500 index as well as chaotic time series, such as sunspot occurrences. Azoff uses feedforward backpropagation networks with a training algorithm called adaptive steepest descent, a variation of the standard algorithm. For the sunspot time series, with a 6-5-1 architecture and a ratio of training vectors to trainable weights of 5.1, he achieves a training set error of 12.9% and a test set error of 21.4%. This series was composed of yearly sunspot numbers for the years 1706 to 1914. Six consecutive years of annual data were input to the network.
One network Azoff used to forecast the S&P 500 index was a 17-7-1 network. The ratio of training vectors to trainable weights was 6.1. The training set error achieved was 3.29%, and the test set error was 4.67%. Inputs to this network included price data, a volatility indicator (a function of the range of price movement), and a random walk indicator, a technical analysis study.
Michael de la Maza and Deniz Yuret, managers of the Redfire Capital Management Group, suggest that risk-adjusted return, and not mean-squared error, should be the metric to optimize in a neural network application for trading. They also point out that with neural networks, as with statistical methods such as linear regression, data facts that seem unexplainable can’t be ignored even if you want them to be. There is no equivalent of a “don’t care” condition for the output of a neural network. This type of condition may be an important option for trading environments that have no “discoverable regularity,” as the authors put it, and therefore are really not tradable. Some solutions to the two problems posed are as follows:
• Use an algorithm other than backpropagation that allows for maximization of risk-adjusted return, such as simulated annealing or a genetic algorithm.
• Transform the data input to the network so that minimizing mean-squared error becomes equivalent to maximizing risk-adjusted return.
• Use a hierarchy (see hierarchical neural network earlier in this section) of neural networks, with each network responsible for detecting features or regularities from one component of the data.
Here is a sampling of resources compiled from trade literature:
NOTE: We do not take responsibility for any errors or omissions.
This chapter presented a neural network application in financial forecasting. As an example of the steps needed to develop a neural network forecasting model, the change in the Standard & Poor’s 500 stock index was predicted 10 weeks out based on weekly data for five indicators. Some examples of preprocessing of data for the network were shown as well as issues in training.
At the end of the training period, it was seen that memorization was taking place, since the error on the test data degraded while the error on the training set improved. It is important to monitor the error on the test data (without weight changes) while you are training, to ensure that generalization ability is maintained. The final network resulted in an average RMS error of 6.9% over the training set and 13.9% over the test set.
This chapter’s example in forecasting highlights the ease of use and wide applicability of the backpropagation algorithm for large, complex problems and data sets. Several examples of research in financial forecasting were presented, along with a number of ideas and real-life methodologies.
Technical Analysis was briefly discussed with examples of studies that can be useful in preprocessing data for neural networks.
A Resource guide was presented for further information on financial applications of neural networks.