Uncertainty Analysis of Monthly Streamflow Forecasting

Abstract
Streamflow forecasting is an important factor in water resources planning and management. In this study, a Feed Forward Artificial Neural Network (FFANN) was used for monthly streamflow forecasting. Three scenarios were considered for modeling. Principal Component Analysis (PCA) was used to reduce the input data and the complexity of the model architecture. Twelve statistical criteria were used to evaluate the model performance. In addition, to quantify the accuracy of the forecasts, uncertainty analysis was conducted using Monte Carlo simulation. Results indicated that the model is in general capable of forecasting monthly streamflow time series satisfactorily. However, the model underestimates extreme values. The uncertainty analysis also shows that the model forecasted monthly streamflow properly in the first two scenarios, while in the third scenario most of the forecasted values lie above the upper bound of the confidence interval.

Streamflow forecasting is a key component of sustainable development and of addressing environmental issues. It has been an important research subject since the middle of the 20th century. Different approaches such as regression (Sun et al. 2014; Rehman and Saleem, 2014; Dehghani et al. 2014), conceptual (Jain and Srinivasulu, 2006; Xu et al. 1996) and intelligent (He et al. 2014; Liu et al. 2014; Sudheer et al. 2014) models are used for streamflow forecasting. Artificial intelligence models, especially Artificial Neural Networks (ANNs), have been applied to streamflow forecasting in several studies. An ANN is a nonlinear black-box statistical approach (Kalteh, 2013). ANNs are suitable for dealing with the intrinsic characteristics commonly present in hydrological processes (Fajardo Toro, 2013). An ANN is appropriate for problems in which the input is high dimensional, the data are possibly noisy, and knowing the weights is not important. The literature of the last two decades shows a high interest in using ANNs for forecasting hydrological processes, and different ANN architectures have been used for this purpose. Most studies have used feedforward error backpropagation (Karunanithi et al., 1994; Kisi, 2004). The standard backpropagation algorithm (SBPA) has some problems, including very slow training convergence and easy entrapment in local minima (Haykin, 1999). The Levenberg-Marquardt algorithm has been proposed as a training function to overcome these problems.
One of the problems in ANN design is the presence of complex structures, which lead to networks with heavy architectures. In this regard, Coulibaly et al. (2000) utilized the Stop Training Algorithm (STA) to address this problem. Several factors can be identified that lead to networks with simple architectures. Input selection is a crucial step in ANN implementation. A lack of pertinent inputs impairs the network's ability to map the input to a close estimate of the observed streamflow. If the number of weights in an ANN exceeds the number of training samples to some extent, overfitting may occur (Haykin, 1999). With a high number of input variables, the probability of correlation between the input variables increases and the ANN can hardly find the optimal model. Therefore, where possible, it is recommended to reduce the number of input variables, even though this causes some information to be omitted. Principal Component Analysis (PCA) is a suitable method for data reduction (Dehghani et al. 2014; Noori et al. 2011). PCA has been used widely in different environmental issues.
Forecasting is associated with uncertainty: the forecasted values will not occur exactly, and the observed values oscillate around the predicted ones. Investigating the uncertainty associated with the forecasted values is therefore an important issue in forecasting environmental processes. Different methods have been used for uncertainty analysis in the past decades (Dehghani et al. 2014; Zhao et al. 2011; Viola et al. 2009). Monte Carlo simulation is one of the most popular methods in uncertainty analysis. Uncertainty analysis and the assignment of confidence intervals enable water resources decision makers to better understand the state of water resources in the near future and to make decisions based on this information.
In this paper, in view of the above, monthly streamflow is forecasted via ANN, and the Monte Carlo simulation method is used to investigate the uncertainty of the forecasted values. In sections 2 and 3, the study area and methodology are described, respectively. Model performance and discussion are presented in section 4, and the conclusions are drawn in section 5.

Study area and data
The Great Karun Basin is located in the southwest of Iran (Fig. 1). The basin covers an area of 67,112 km² at the mouth of the Persian Gulf. This basin produces over 25% of the total surface water resources in Iran and significantly affects the agricultural, social and environmental aspects of human life in this region.
Given the high surface water potential of the basin, which supplies water to various users and generates hydropower, hydrologic studies and streamflow forecasting are vital to efficient water planning and management. This study focuses on the Dez River subbasin within the Great Karun. Fig. 2 shows the study area and the hydrometric station network in the selected area. The tributaries of the Dez network were selected because the data at some downstream stations may have been affected by upstream water withdrawals, whereas water use is negligible in the tributary rivers. As a result, the part of the Dez river system up to the Sepiddasht hydrometric station was designated as the study area.
A total of seven hydrometric stations were studied in this research. Referring to Fig. 2, the stations are Rahimabad, Dorudtire, Sepiddasht, Chamchit, Moruk, Daretakht and Dorudmarbere. All the stations have data from 1955 to 2009, for a total of 648 months of streamflow data. Table 1 presents the monthly streamflow statistics for all hydrometric stations. The flow coefficient of variation ranges between 1 and 1.95. This is a typical characteristic of streamflow in basins with a Mediterranean climate and makes forecasting a challenging task.

Methodology

Artificial neural networks
The customary ANN architecture is composed of three layers of neurons: an input layer, a hidden layer and an output layer (Haykin, 1999). A neuron's response is obtained by applying an activation function to the weighted sum of all its inputs. A feed-forward network was adopted for this study since feed-forward ANNs have been shown to have computational superiority in comparison to other paradigms (Hornik et al., 1989). The network was trained by the back-propagation algorithm through the split-validation procedure. The available data were divided into three sets: a training set, a validation set, and a test set. The training set is used to fit the ANN model weights, the validation set to select the model variant that provides the best level of generalization, and the test set to evaluate the chosen model against the remaining data. The number of neurons, between 2 and 6, was chosen by trial and error. All input and output variables were standardized to the [0.1, 0.9] scale as follows (Rajurkar et al., 2004):
\[ X_n = 0.1 + 0.8\,\frac{X - X_{\min}}{X_{\max} - X_{\min}} \]
where $X$ is the input variable, $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the input variable, and $X_n$ is the standardized value.
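The scaling to [0.1, 0.9] described above can be sketched as follows. This is a minimal illustration, assuming the usual linear min-max mapping onto that range; the function names and the sample flows are hypothetical and do not come from the study.

```python
import numpy as np

def scale_to_range(x, lo=0.1, hi=0.9):
    """Linearly map x onto [lo, hi] using its own minimum and maximum."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("a constant series cannot be scaled")
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)

def unscale(x_n, x_min, x_max, lo=0.1, hi=0.9):
    """Invert the mapping, e.g. to express forecasts in flow units again."""
    x_n = np.asarray(x_n, dtype=float)
    return x_min + (x_n - lo) * (x_max - x_min) / (hi - lo)

# Hypothetical monthly flows (m^3/s), for illustration only.
flows = np.array([2.0, 5.0, 11.0, 20.0])
scaled = scale_to_range(flows)
restored = unscale(scaled, flows.min(), flows.max())
```

In practice the minimum and maximum would normally be taken from the training data only, so that no information from the test period leaks into the scaling; the source does not state which convention was used.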
For a network with one hidden layer, the total number of weights to be determined essentially accounts for all the connections between neurons in adjacent layers. Increasing the number of neurons in the hidden layer increases the number of connections and weights to be fitted. This number cannot be increased without limit, because one may reach a situation where the number of connections to be fitted is larger than the number of data pairs available for training. Although the neural network can still be trained, the problem is then mathematically underdetermined: it is not possible to determine more fitting parameters than there are data points.
In this study, a model based on a feedforward neural network with a single hidden layer is used. The back-propagation (BP) algorithm is used to train the network. The BP algorithm is essentially a gradient descent technique that minimizes the network error function (Haykin, 1999).
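The comparison between the number of weights and the number of training pairs can be checked with a few lines of code. The sketch below assumes a fully connected network with one hidden layer and, as an assumption not stated in the text, one bias term per hidden and output neuron. The input count of six is hypothetical (the six stations other than Sepiddasht); the training length follows from the 648 months of data minus the 120 test and 100 validation months.

```python
def count_weights(n_inputs, n_hidden, n_outputs=1, with_bias=True):
    """Number of fitted parameters in a one-hidden-layer feedforward network."""
    b = 1 if with_bias else 0  # assumed: one bias per hidden/output neuron
    return (n_inputs + b) * n_hidden + (n_hidden + b) * n_outputs

n_train = 648 - 120 - 100  # months left for training

# Hidden layer sizes 2 to 6, as explored by trial and error in the study.
for n_hidden in range(2, 7):
    n_weights = count_weights(n_inputs=6, n_hidden=n_hidden)
    status = "ok" if n_weights < n_train else "underdetermined"
    print(f"hidden={n_hidden}: {n_weights} weights vs {n_train} samples ({status})")
```

Under these assumptions, even the largest network considered has far fewer weights than training samples.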

Principal component analysis
Principal Component Analysis (PCA) is a method for identifying patterns in data. It is a powerful tool for reducing the dimensionality of data, especially when the variables are highly correlated. Input variables are transformed into principal components (PCs) that are independent, i.e. the information in the input variables is represented in the PCs with minimum loss (Helena et al., 2000; Noori et al., 2011). The PCs are specified by the equation below:
\[ Z_i = a_{i1} X_1 + a_{i2} X_2 + \dots + a_{in} X_n \]
where $Z_i$ represents the $i$-th PC, $a_i$ is the related eigenvector (with components $a_{ij}$) and $X_j$ are the $n$ input variables. This information is obtained by solving equation (3
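The transformation of correlated inputs into PCs can be sketched with an eigendecomposition of the correlation matrix. This is a minimal illustration, not the authors' implementation: the choice of the correlation matrix (rather than the covariance matrix) is an assumption, and the synthetic three-column input is hypothetical.

```python
import numpy as np

def principal_components(X):
    """Return PC scores, eigenvectors and explained-variance ratios of X."""
    X = np.asarray(X, dtype=float)
    # Standardize each input variable (column).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)           # correlation matrix of inputs
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                       # column i holds PC i
    explained = eigvals / eigvals.sum()
    return scores, eigvecs, explained

# Hypothetical, strongly correlated "station" series for illustration.
rng = np.random.default_rng(0)
base = rng.gamma(2.0, 10.0, size=200)
X = np.column_stack([k * base + rng.normal(0.0, 1.0, 200) for k in (1.0, 0.8, 1.5)])

scores, eigvecs, explained = principal_components(X)
first_pc = scores[:, 0]  # a single-PC input, as in the third scenario
```

Keeping only the first column of `scores` reduces the network input to one variable, at the cost of the variance carried by the remaining PCs.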

Model evaluation
As there is no single evaluation criterion, it is important to apply a multi-criteria assessment of ANN skill (Dawson et

Uncertainty analysis
To determine the uncertainty in the streamflow forecast, the ANN modeling procedure was implemented in a Monte Carlo framework as introduced by Marce et al. (2004). Monte Carlo simulation involves repeated generation of random parameters from their probability distributions, followed by computation of the statistics of the output. In this research, bootstrapping was used for resampling. The input database was randomly resampled without replacement 1000 times, maintaining the ratio between the calibration (training and validation) and test sets. The 95% confidence interval of estimation is reported here because it provides more information than other statistical values about the range of predictions associated with the model (Noori et al., 2010c). The 95% confidence intervals are determined by finding the 2.5th and 97.5th percentiles of the constructed distribution (Noori et al., 2009).
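The resampling procedure described above can be sketched as follows. To keep the example short and fast, a least-squares linear model stands in for the ANN; this substitution, the function names and the synthetic data are all assumptions made for illustration. The calibration fraction of 528/648 reflects the 120 test months out of 648 reported in the study.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit with intercept (stand-in for training the ANN)."""
    A = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return np.column_stack([x, np.ones(len(x))]) @ coef

def monte_carlo_bands(x, y, x_new, n_runs=1000, calib_fraction=528 / 648, seed=0):
    """95% bands from models refitted on random calibration subsets."""
    rng = np.random.default_rng(seed)
    n_calib = int(round(calib_fraction * len(y)))
    preds = np.empty((n_runs, len(x_new)))
    for k in range(n_runs):
        # Draw the calibration subset without replacement; the rest is test.
        idx = rng.permutation(len(y))[:n_calib]
        coef = fit_linear(x[idx], y[idx])
        preds[k] = predict_linear(coef, x_new)
    # 2.5th and 97.5th percentiles of the simulated forecasts.
    lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)
    return lower, upper

# Hypothetical scaled inputs and target, for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(0.1, 0.9, size=(300, 2))
y = 0.6 * x[:, 0] + 0.3 * x[:, 1] + rng.normal(0.0, 0.05, 300)
lower, upper = monte_carlo_bands(x, y, x[:5], n_runs=200)
```

Replacing `fit_linear` and `predict_linear` with ANN training and prediction would give the setup described in the text, at a much higher computational cost per run.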

Results and discussion
For streamflow forecasting, three scenarios were considered (Table 2). In the first scenario, monthly streamflow was forecasted using the Rahimabad and Moruk streamflows as input. In the second scenario, the streamflow at Sepiddasht station was forecasted using all hydrometric stations upstream of Sepiddasht.
In the third scenario, PCA was applied to the inputs of the second scenario to reduce the dimensionality of the data. Results indicated that the first PC reproduces 84% of the variance of the data. So, the first PC was selected as the input in the third scenario.
For ANN modeling, the streamflow time series was divided into three parts. The last 120 months of river discharge were assigned to testing, 100 months to validation and the rest of the data to training; the model was then applied to the time series. Figs. 3 to 5 show the ANN modeling of streamflow in the test phase. For further investigation, 12 statistical criteria were calculated for the test phase (Table 4). The bold and italic values indicate better performance.
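The division of the series into training, validation and test parts can be sketched as a chronological split. The source states only that the last 120 months are used for testing and 100 months for validation; placing the validation block immediately before the test block is an assumption made here, and the function name is hypothetical.

```python
import numpy as np

def chronological_split(series, n_val=100, n_test=120):
    """Split a time series into train / validation / test blocks in time order."""
    series = np.asarray(series)
    n_train = len(series) - n_val - n_test
    if n_train <= 0:
        raise ValueError("series too short for the requested split")
    train = series[:n_train]
    val = series[n_train:n_train + n_val]   # assumed to precede the test block
    test = series[n_train + n_val:]         # last n_test months
    return train, val, test

# Month indices 0..647 stand in for the 648 months of streamflow data.
months = np.arange(648)
train, val, test = chronological_split(months)
print(len(train), len(val), len(test))
```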
The first seven criteria concern modeling error estimation. Based on these criteria, the model had the best performance at Dorudtire station. For a perfect model these seven metrics would be zero. RAE expresses the total absolute error relative to what the total absolute error would have been if the forecast had simply been the mean of the observed values (Dawson et al. 2007). The RAE value is better in the first and second scenarios. The four remaining metrics, R, IoAd, CE and PI, have their best values in the first scenario.
A practical way of quantifying the accuracy of the forecast is to estimate the confidence interval of prediction. The wider the interval, the lower the accuracy of the forecast, and vice versa. Monte Carlo simulation was conducted to set upper and lower confidence bands for the streamflow forecasts in the different scenarios. The resulting 95% confidence intervals are shown in Figs. 6 to 8.
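Two of the criteria mentioned above can be sketched in code. RAE follows the verbal definition given in the text. For CE, the standard Nash-Sutcliffe form of the coefficient of efficiency is assumed, since the source does not spell out its formula. The observed and forecasted values are hypothetical.

```python
import numpy as np

def relative_absolute_error(obs, pred):
    """Total absolute error relative to that of a mean-of-observations forecast."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return np.abs(obs - pred).sum() / np.abs(obs - obs.mean()).sum()

def coefficient_of_efficiency(obs, pred):
    """Nash-Sutcliffe efficiency (assumed definition of CE); 1 is a perfect fit."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return 1.0 - ((obs - pred) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()

# Hypothetical observed and forecasted flows (m^3/s).
obs = np.array([10.0, 20.0, 30.0, 40.0])
pred = np.array([12.0, 18.0, 33.0, 35.0])

rae = relative_absolute_error(obs, pred)   # 12 / 40
ce = coefficient_of_efficiency(obs, pred)  # 1 - 42 / 500
```

An RAE of zero and a CE of one both correspond to a perfect forecast, which is why the error-type criteria and the efficiency-type criteria are read in opposite directions.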
Results indicated that in the first scenario, all forecasted values lie within the confidence intervals. It can be concluded that the ANN performed satisfactorily in forecasting monthly streamflow in the first scenario. All the forecasted values in the second scenario also lie within the confidence intervals, while in the third scenario 75% of the forecasted values lie outside them, which shows that the model performed poorly in forecasting streamflow. Most of these values lie above the upper bound, which shows that the model is not capable of predicting the upper band properly.
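The share of forecasted values falling outside the confidence band, and on which side, can be computed as in the following sketch. The function name and the numbers are hypothetical and only illustrate the bookkeeping.

```python
import numpy as np

def fraction_outside(values, lower, upper):
    """Return the fractions of values above the upper and below the lower bound."""
    values = np.asarray(values, dtype=float)
    above = values > np.asarray(upper, dtype=float)
    below = values < np.asarray(lower, dtype=float)
    return above.mean(), below.mean()

# Hypothetical forecasts and a constant band, for illustration only.
forecasts = np.array([5.0, 7.0, 9.0, 12.0])
lower = np.full(4, 4.0)
upper = np.full(4, 8.0)

frac_above, frac_below = fraction_outside(forecasts, lower, upper)
print(f"above upper bound: {frac_above:.0%}, below lower bound: {frac_below:.0%}")
```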

Conclusion
In this study, monthly streamflow was forecasted using ANN in three scenarios in the Karun Basin in Iran. Uncertainty analysis was also conducted to estimate the confidence intervals. Results indicated that the model is capable of forecasting monthly streamflow satisfactorily, although in some cases the model over- or underestimates. However, there are some considerations. Based on the statistical criteria, the model performed well in the first and second scenarios, while its performance is poor in the third scenario. It can be concluded that the model is sensitive to the quality of the input and that more information leads to better performance. By using PCs as input, the model loses some information and performs worse than in the other scenarios. Conversely, using PC(s) as input decreases the model complexity. The difference between the first and second scenarios may be due to water withdrawal upstream of the Sepiddasht hydrometric station. Besides the statistical criteria, uncertainty analysis provides a good evaluation of streamflow forecasting. Monte Carlo simulation, which is used in this research, is a powerful tool for uncertainty analysis and performed well in predicting the confidence intervals.

Fig. 7: Confidence intervals for streamflow forecast in the second scenario

Table 4: Statistical criteria corresponding to the test phase of the monthly streamflow forecast in various scenarios

Figs. 3 to 5 indicate that the model performed suitably in the test phase, especially for Dorudtire station. However, the ANN model underestimates, especially at extreme values. The mean, minimum and maximum observed and forecasted values in the test phase are presented in Table 3. Results indicated that the model underestimates the maximum and mean values, while it overestimates the minimum value. In general, the model performed better at Dorudtire station. The model follows the pattern of the observed time series properly in all scenarios.