PABLO F. VERDES
Instituto de Fisica Rosario 
Bv 27 de Febrero 210 Bis 
2000 Rosario - Argentina 
TE  (54 341) 4853222 ext 123 
FAX  (54 341) 4853222 ext 128 
E-MAIL  verdes[AT]ifir.edu.ar



Neural Networks and Time-Series Prediction

Neural networks are parallel computational structures of highly interconnected simple processors, called neurons, which simulate to some extent the structure and functioning of the brain (Rumelhart and McClelland, 1986). We are mainly interested in the so-called "feedforward" neural networks. These architectures have a group of neurons, the input layer, which is fed by external stimuli. The input units send these stimuli to hidden neurons (not connected to the environment), grouped in one or more internal layers. The hidden units process the information they receive and pass their results to the last group of neurons, the output layer, which produces the final response to the external stimuli.

The units are connected through information channels, whose strengths ("weights") have to be determined to properly relate inputs to desired outputs. This process is known as network training, and the first algorithm to perform this weight adjustment is called the "backpropagation rule" (Rumelhart, Hinton, and Williams, 1986). It is essentially a gradient-descent algorithm that reduces the error between actual and desired network outputs, modifying the weights while going backward from the output-layer to the input-layer connections. The computations carried out inside each neuron amount to: i) computing the weighted sum of its incoming inputs, ii) sending this sum through a (biased) sigmoid function, and iii) forwarding the sigmoid function output to the next layer of neurons. These calculations are generally performed synchronously by all the neurons in a given layer, so that the stimulus response delay depends on the number of layers. Despite the simplicity of the calculations involved, a feedforward neural network with a sufficient number of hidden units can approximate arbitrarily well any continuous mapping of n-dimensional inputs to m-dimensional outputs (Cybenko, 1988, 1989). Although the optimal network architecture for a given task is in general not known, the nature of the problem often gives hints in this respect.
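
The three steps carried out by each neuron can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the logistic function stands in for the sigmoid, the 3-4-2 layer sizes and random weights are made up, and the names `sigmoid`, `forward` and `layers` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise (one common choice of sigmoid)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Propagate input x through a list of (weights, biases) pairs, layer by layer."""
    activation = np.asarray(x, dtype=float)
    for weights, biases in layers:
        # i) weighted sum of the incoming signals, ii) biased sigmoid,
        # iii) the result becomes the input of the next layer.
        activation = sigmoid(weights @ activation + biases)
    return activation

# Hypothetical 3-4-2 network with random weights (illustration only, untrained).
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), rng.normal(size=4)),  # input layer -> hidden layer
    (rng.normal(size=(2, 4)), rng.normal(size=2)),  # hidden layer -> output layer
]
output = forward([0.5, -1.0, 2.0], layers)
```

All neurons of a layer are evaluated at once by the matrix product, which mirrors the synchronous, layer-by-layer operation described above. With this choice of sigmoid each output component lies strictly between 0 and 1.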

Typically, solving a problem by using neural networks requires: a) choosing a suitable network architecture (number of layers and neurons in each layer), b) selecting a large, representative set of training patterns (input-output pairs carrying enough information on the data set), c) training the network to relate the inputs to the corresponding outputs, by modifying weights according to the backpropagation rule. As mentioned above, this corresponds to minimizing a quadratic cost function of the error between the desired outputs ("targets") and the actual outputs produced by the network in response to the inputs. Every (input, target) pair has to be presented many times to the network for it to learn the mapping (approximately). The length of the training process is usually measured in terms of single presentations of the whole training set ("epochs"). A frequent pitfall in this part of the process is getting trapped in high local minima of the error function.
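
Steps a) to c) can be sketched as follows. This is a minimal illustration, not the training code used in the cited works: it assumes a single hidden layer, logistic sigmoids, weight updates computed once per epoch over the whole training set, and a toy XOR data set. The names `train`, `n_hidden` and `lr`, and all numerical settings, are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, T, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    """Gradient descent on a quadratic cost for a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    n = X.shape[0]
    errors = []
    for _ in range(epochs):  # one epoch = one presentation of the whole set
        # Forward pass: hidden activations, then network outputs.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        # Quadratic cost between outputs and targets, averaged over patterns.
        errors.append(0.5 * np.mean(np.sum((Y - T) ** 2, axis=1)))
        # Backward pass: error signals flow from the output layer to the hidden layer.
        delta_out = (Y - T) * Y * (1.0 - Y)
        delta_hid = (delta_out @ W2.T) * H * (1.0 - H)
        # Gradient-descent updates of weights and biases.
        W2 -= lr * (H.T @ delta_out) / n
        b2 -= lr * delta_out.mean(axis=0)
        W1 -= lr * (X.T @ delta_hid) / n
        b1 -= lr * delta_hid.mean(axis=0)
    return (W1, b1, W2, b2), errors

# Toy training set (XOR), chosen only because it is not linearly separable.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])
params, errors = train(X, T)
```

The list `errors` holds the cost at each epoch. Depending on the random initial weights, the cost may level off well above zero, which is the local-minimum pitfall mentioned above.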

A successful training experiment generally allows the network to capture the essential relationships between inputs and outputs. In such cases, the trained net shows remarkable generalization capabilities, being able to correctly relate inputs to outputs not included in the training set. However, the architecture used is crucial: smaller-than-needed nets do not learn the examples, while bigger-than-required ones usually overfit the data, learning undesirable (noisy) features which degrade their generalization performance. In the latter case, a way to cope with the problem is to hold back a small set of input-output examples that is never presented to the network during training. The network's generalization performance can then be monitored on this set while it learns the training set. This is called "cross-validation".
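
The held-out monitoring just described can be sketched as follows. The helper names `holdout_split` and `stopping_epoch`, the 20% fraction, the `patience` rule for deciding when to stop, and the error curve are all assumptions made for illustration. The curve is synthetic, not a measured result.

```python
import numpy as np

def holdout_split(X, T, val_fraction=0.2, seed=0):
    """Randomly set aside a fraction of the patterns; they are never used for weight updates."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = max(1, int(round(val_fraction * len(X))))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (X[train_idx], T[train_idx]), (X[val_idx], T[val_idx])

def stopping_epoch(val_errors, patience=3):
    """Return the epoch with the lowest held-out error, ending the scan once the
    error has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Synthetic held-out error curve (made up): it improves, then degrades as the
# network starts fitting noise in the training set.
val_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
chosen = stopping_epoch(val_errors)  # epoch 3, where the held-out error is lowest
```

In practice the weights saved at the chosen epoch would be kept as the final model. A random split is the simplest choice; for time-ordered data one might instead hold back the most recent part of the record.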

Among the applications of neural networks, the modeling of highly complex systems stands out, with the aim of analyzing and predicting the behavior of such systems (Navone and Ceccatto, 1994). In general these studies are based on historical records of variables associated with the system, organized in sequences commonly called time series.

Time series therefore constitute a special kind of data set, in which past values influence future values. This influence depends on the intrinsic dynamics of the phenomenon represented by the time series, which governs its evolution with a certain degree of determinism. These dynamics yield series characterized by trends, cycles and nonstationary behavior. Predictive models must be able to account for these characteristics and, consequently, must reproduce the nonlinearities present in the phenomenon under study. Linear regression techniques, the basis of traditional statistical predictive models, cannot characterize the intrinsic nonlinearity of most time series associated with the behavior of complex systems (Casdagli and Eubank, 1992). Neural networks, however, allow the development of nonlinear models through training processes that extract and represent the dynamics.
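
To train a feedforward network on a time series, the record is commonly turned into (input, target) patterns in which a window of past values is used to predict a later one. The sketch below assumes this sliding-window construction; the function name `make_patterns` and the parameters `n_lags` and `horizon` are hypothetical.

```python
import numpy as np

def make_patterns(series, n_lags, horizon=1):
    """Turn a 1-D series into (input, target) pairs: each input holds n_lags
    consecutive values, and the target is the value `horizon` steps after the last one."""
    series = np.asarray(series, dtype=float)
    n_patterns = len(series) - n_lags - horizon + 1
    if n_patterns <= 0:
        raise ValueError("series too short for the requested window")
    inputs = np.stack([series[i:i + n_lags] for i in range(n_patterns)])
    first_target = n_lags + horizon - 1
    targets = series[first_target:first_target + n_patterns]
    return inputs, targets

series = [1, 2, 3, 4, 5, 6]
inputs, targets = make_patterns(series, n_lags=3)
# inputs  holds the windows (1, 2, 3), (2, 3, 4), (3, 4, 5)
# targets holds the next values 4, 5, 6
```

The number of lagged values then fixes the size of the input layer, and the choice of how many lags to use is part of the architecture selection discussed earlier.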

Although "restricted" nonlinear regression techniques have been developed (Tong, 1990), the "unrestricted" nonlinear properties of neural networks make them a privileged methodology for predicting time series associated with chaotic systems. Placed in the context of traditional statistical methods, neural networks can be considered a "multivariate technique of nonparametric and nonlinear inference, that is made from the data and with no need to establish a previous model" (Azoff, 1994). Calling the technique multivariate refers to the fact that the input data for the network can involve different variables, whose causal interdependencies and mutual influences are exploited in predicting the future behavior of the time sequence. Calling it nonparametric, independent of a previous model, means that no assumptions need to be made about the relations under study.

References

Rumelhart D E, McClelland J L (1986) Parallel distributed processing, MIT Press, London

Rumelhart D E, Hinton G E, Williams R J (1986) Learning representations by back-propagating errors, Nature 323: 533-536

Cybenko G (1988) Continuous valued neural networks with two hidden layers are sufficient, Report, Department of Computer Science, Tufts University, Medford

Cybenko G (1989) Approximation by superpositions of a sigmoidal function, Math Control Signal Systems 2: 303-314

Navone H D, Ceccatto H A (1994) Predicting Indian Monsoon Rainfall: A Neural Network Approach, Climate Dynamics 10: 305

Casdagli M, Eubank S, eds. (1992) Nonlinear Modeling and Forecasting, Santa Fe Institute Proceedings vol XII, Addison-Wesley, Redwood City, California

Lapedes A, Farber R (1987) Nonlinear Signal Processing Using Neural Networks, Prediction and System Modeling, Los Alamos Report LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos, NM

Tong H (1990) Nonlinear Time Series: A Dynamical System Approach, Oxford University Press, Oxford

Azoff M (1994) Neural network time series forecasting of financial markets, John Wiley & Sons, England

