Estimating Inside Air Temperature of a Glasshouse Using Statistical Models

This work investigated the efficiency of linear regression (LR) and artificial neural network (ANN) models in estimating the inside air temperature (T) of a glasshouse (37°48′20″N, 23°57′48″E) in Lavreotiki. The T data from an urban meteorological station (MS) at 37°58′55″N, 23°32′14″E, Athens, Attica, Greece, about 30 km away from the glasshouse, were used as the predictor variable, taken at the actual time of measurement (ATM) and two hours earlier (ATM-2), depending on the case. Air temperature was monitored in each examined area (glasshouse and MS) for four successive months (July-October), and two-hour averages were used for the estimation. The ANN models performed better than the LR models, as shown by the scatterplots of observed versus estimated inside T data of the glasshouse and by both a higher coefficient of determination (R²) and a lower mean absolute error (MAE). The best ANN model (highest R² and lowest MAE) used the T at ATM and the T at ATM-2 from the MS as predictor variables. These findings may be a first step towards estimating the inside T of a glasshouse in Greece from outside T data of a remote MS, which could noticeably improve the operation of the glasshouse.


INTRODUCTION
Growing plants inside glasshouses is often necessary in order to create marketable plant products out of season. Glasshouses are entirely or partially closed constructions in which the values of meteorological parameters are controlled and regulated, manually or automatically, to fulfil the requirements of the cultivated plant species [1]. One of the key factors that affects glasshouse plant production is inside air temperature (T) [2]. The estimation of this parameter has been reported to be of high importance in helping growers to manage crop production and designers to improve ventilation and heating systems [3].
Therefore, from both a theoretical and a practical point of view, many attempts have been made to estimate the inside T of a glasshouse with a variety of statistical models and with higher or lower degrees of accuracy. Examples include linear autoregressive models with external input (ARX) and autoregressive moving average models with external input [4], ARX combined with neural network architectures [5], and partial least-squares regression and back-propagation neural networks [6].
One robust computational technique, the artificial neural network (ANN) model [7], can be used successfully for the estimation of the inside T of a glasshouse, as has been confirmed by several studies, e.g. Salazar et al. [8] and Alipour and Loghavi [9]. This model offers great potential for complex, non-linear and time-varying input-output mapping [10]. However, the use of ANNs for the estimation of the inside T of glasshouses is limited, especially when the outside T of a remote meteorological station (MS) is used as a predictor variable. To our knowledge, only some preliminary tests (unpublished data) have been made by the authors of the present work.
The present work aims to investigate the hypothesis that ANNs perform satisfactorily in estimating the inside T of a glasshouse from outside T data of a remote MS. Moreover, the performance of selected linear regression (LR) models was evaluated and compared to that of the ANNs. In both cases, we used the scatterplots of observed versus estimated inside T data, along with the coefficient of determination (R²) and the mean absolute error (MAE), to evaluate the results.

MATERIALS AND METHODS
The field experiment was conducted in two places. The first was an MS (37°58′55″N, 23°32′14″E) in the highly populated urban region of the municipality of Athens, and the second was a glasshouse (37°48′20″N, 23°57′48″E) in the municipality of Lavreotiki, in the prefecture of Attica in southeast continental Greece. There was one examined site (S1) at the MS and one examined site (S2) inside the glasshouse, about 30 km away from S1. The S2 site was located in a non-shaded plot with an ornamental plant cultivation, as described by Matsoukis et al. [11]. In brief, this plot had one open vertical side, with the other three vertical sides covered by white opaque plastic sheets of polypropylene (model Velliflor of Vellis A.E. company, Greece). The same type of sheet was used to cover the ground surface to prevent weed emergence.
Air temperature data at S2 were monitored every 10 min by a sensor (model 809 L 0-100, Wilh. Lambrecht GmbH, Germany; accuracy ±0.3 °C at 0 °C) and recorded by a datalogger (model 903; Wilh. Lambrecht GmbH, Germany). The sensor was placed at a height equal to the top of the plant canopies, for four successive months (July-October).
Air temperature averages were calculated on a two-hour basis. Simultaneously, and on the same time basis, T averages were calculated from the recorded T data of the MS [12] for the same period. The T averages of both the glasshouse (S2) and the MS (S1) were used for the estimation of the inside T of the glasshouse, with the aid of simple linear regression (SLR) and multiple linear regression (MLR) models, as well as ANN models, as determined by many preliminary tests. Finally, to estimate the inside T at S2 from the data of S1, four models stood out from the others in terms of higher R² and lower MAE. These models were named A, B, C and D. Model A used an SLR analysis, while model B used an MLR analysis. Models C and D used custom multilayer perceptrons (MLPs), which are among the most commonly used ANN architectures [13].
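The averaging and lagging steps described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function names are hypothetical, the grouping of twelve 10-min readings per two-hour block is inferred from the stated 10-min sampling at S2, and the MS means are assumed to be already available on the same two-hour basis.

```python
def two_hour_means(readings, per_block=12):
    """Average consecutive 10-min readings into two-hour means.

    Assumption: 12 readings per block (120 min / 10 min); any incomplete
    trailing block is dropped.
    """
    n_blocks = len(readings) // per_block
    return [
        sum(readings[i * per_block:(i + 1) * per_block]) / per_block
        for i in range(n_blocks)
    ]


def lagged_samples(ms_means, gh_means, lag=1):
    """Pair each glasshouse mean with the MS mean at ATM and at ATM-2.

    With two-hour means, a lag of one block corresponds to ATM-2.
    Returns a list of ((T_MS at ATM, T_MS at ATM-2), T_glasshouse) tuples.
    """
    n = min(len(ms_means), len(gh_means))
    return [((ms_means[t], ms_means[t - lag]), gh_means[t]) for t in range(lag, n)]
```

The first two-hour block has no earlier MS value, so it is skipped when the lagged pairs are built.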
More specifically, the SLR analysis is defined by the equation:
$$y = a + bx$$
where $y$ is the dependent variable, $x$ the independent variable, and $a$ and $b$ the Y-axis intercept and the slope, respectively. In model A, the inside T of the glasshouse (dependent variable) was estimated using the T of the MS at the actual time of measurement (ATM) as the independent variable. The MLR analysis is defined by the equation:
$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$$
where $x_1, x_2, \ldots, x_n$ are the independent variables, $\alpha$ is the Y-axis intercept and $\beta_1, \beta_2, \ldots, \beta_n$ are the regression coefficients. Each regression coefficient represents the contribution of the respective independent variable to the prediction of the dependent variable. In model B, the T at site S2 (dependent variable) was estimated using the T of S1 at ATM as the first independent variable and the T of S1 two hours earlier (ATM-2) as the second independent variable.
The MLP, one of the most commonly used ANN models for T estimation, was chosen for the present study. A major consideration when building models with MLPs is the determination of the optimal architecture of the network, that is, the number of inputs, the number of layers and the number of nodes per layer. To solve this problem, a trial-and-error method was used, which is the most common strategy: many alternative models are tested and the best performing network is kept. For model C the best architecture was found to be 1-6-1, that is, an input layer of 1 unit (T of S1 at ATM), a hidden layer of 6 units and an output layer of 1 unit (estimated T of S2). Similarly, for model D the best architecture was found to be 2-7-1, that is, an input layer of 2 units (T of S1 at ATM and T of S1 at ATM-2), a hidden layer of 7 units and an output layer of 1 unit (estimated T of S2). For both ANNs, the connections between the layers were feedforward, and their weights and thresholds were determined by the training procedure of the neural network. The training set consisted of half of the data, the selection set of a quarter of the data and the test set of the remaining quarter of the data, randomly assigned [14]. After the networks had been properly trained, only the test data set was used to determine the testing parameters (R², MAE) and to compare the estimation models.
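The model forms described above can be illustrated with a short sketch. It rests on several assumptions that are not stated in the text: the regression coefficients are obtained by ordinary least squares, the hidden units of the MLP use a tanh activation with a linear output unit, and all function names are hypothetical. It shows the random half/quarter/quarter split, the fits for models of type A and B, and a forward pass through a network of the 1-6-1 or 2-7-1 shape. Training of the network weights is not shown.

```python
import math
import random


def split_half_quarter_quarter(n, seed=0):
    """Randomly assign n sample indices to training (1/2), selection (1/4), test (rest)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_sel = n // 2, n // 4
    return idx[:n_train], idx[n_train:n_train + n_sel], idx[n_train + n_sel:]


def fit_slr(x, y):
    """Least-squares fit of y = a + b*x (form of model A). Returns (a, b)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_mlr(x1, x2, y):
    """Least-squares fit of y = alpha + beta1*x1 + beta2*x2 (form of model B).

    Solves the 2x2 centred normal equations directly; x1 and x2 must not be
    perfectly collinear.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    beta1 = (s1y * s22 - s2y * s12) / det
    beta2 = (s2y * s11 - s1y * s12) / det
    return my - beta1 * m1 - beta2 * m2, beta1, beta2


def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer feedforward MLP.

    Assumption: tanh hidden units and a linear output unit. For a 2-7-1
    network, w_hidden holds 7 rows of 2 weights and w_out holds 7 weights.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

In such a setup, the models would be fitted on the training indices only, the selection indices would guide the choice of architecture, and the test indices would be held back for the final comparison.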
To evaluate the performance of the LR and MLP models, two widely used criteria were applied: the R² between observed and estimated T values at site S2 and the MAE of the estimated T values. The MAE is the average of the absolute errors after each model was applied and is, alongside R², a way to examine the overall efficiency of the models. The more efficient the model, the higher the R² and the lower the MAE, which is the desired outcome. When examining the statistical significance of the parameters of the LRs and MLPs, special attention needs to be given to the output P value. This value plays a major role in determining which parameters will be used in the various models, so as to eliminate deficiencies. In the present study, it was ensured that the results were significant at P<0.05.
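The two criteria can be computed as in the sketch below. The function names are hypothetical, and R² is computed here as one minus the ratio of residual to total sum of squares, which is an assumption: the text does not state which formula was used, and the squared correlation between observed and estimated values is a common alternative.

```python
def mean_absolute_error(observed, estimated):
    """Average of the absolute differences between observed and estimated values."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


def r_squared(observed, estimated):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Both functions would be applied to the test set only, so that the comparison between models reflects data not used during fitting.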

RESULTS AND DISCUSSION
The results of applying models A, B, C and D, in terms of scatterplots of observed versus estimated T data for site S2, are shown in Figures 1 and 2. The model with the worst performance was the simplest one, model A, derived from SLR analysis, which had the lowest R² and the highest MAE (Figure 1a) of the applied models. Model B, based on MLR analysis, performed better than model A, as shown by its higher R² and lower MAE (Figure 1b). The additional input of the T of S1 at ATM-2 was the critical factor for this improvement. Similar results have been reported by Chronopoulos et al. [15] with regard to the estimation of T in a canyon in a National Forest in Greece. Model C, based on an MLP with one input parameter (T of S1 at ATM), as can be seen in Figure 2a, showed a slightly […] 2b). It has been reported that the introduction of ATM as an input in ANN models produces better results in an urban area, concerning the estimation of T [13]. The ability of the ANNs to take into account the nonlinear characteristics of the T data produced better results than the LR models, especially when the same data were combined as input with a time lag of two hours. This clear improvement when using a time lag can possibly be explained by the distance between the two sites, combined with the influence of the terrain profile. It should be noted that the terrain profile results in sunrise and sunset times that differ by more than one hour between the two examined sites.
In conclusion, the analysis of the results of the SLR, MLR and MLP models, with or without time lag, clearly showed that the MLP models performed better than the SLR and MLR models, based on the scatterplots of observed versus estimated inside T data for the glasshouse, the higher R² and the lower MAE. Of the two examined MLP models, model D, in which the input parameters were T at ATM and T at ATM-2, performed better, based on the aforementioned scatterplots, the highest R² and the lowest MAE. Therefore, the examined MLP model with a time lag of minus two hours could be beneficial, as a first step, for the estimation of the inside T of a glasshouse in Greece using the outside T of a remote MS. This estimation could be a valuable tool for more efficient operation of the glasshouse.

Fig. 1: Scatterplots of observed versus estimated air temperature (°C) data for models A (a) and B (b). R²: coefficient of determination, MAE: mean absolute error

Fig. 2: Scatterplots of observed versus estimated air temperature (°C) data for models C (a) and D (b). R²: coefficient of determination, MAE: mean absolute error