Model for tomato photosynthetic rate based on neural network with genetic algorithm

A photosynthetic rate model provides a theoretical basis for fine-grained light control and has become the key component determining the effectiveness of light-controlled environments. It is therefore critical to identify an intelligent algorithm that can build an efficient and precise photosynthetic rate model. Because the initial weights of a BP (Back Propagation) neural network are arbitrary random numbers, a regression prediction model built with it can easily become trapped in a locally flat region, and existing neural-network-based photosynthetic rate models suffer from slow convergence and long training times. To address these problems, this study presents a heuristic neural network model of tomato photosynthetic rate based on a genetic algorithm, which improves model performance by optimizing the initial weights. First, a multi-factor nested experiment was conducted to obtain 825 groups of tomato seedling photosynthetic rate data. On this basis, the heuristic neural network model of tomato photosynthetic rate was established through BP network structure construction and data preprocessing. The genetic algorithm was used to optimize the network weights and thresholds, and the LM (Levenberg-Marquardt) method was used for network training. The training performance and precision of the genetic neural network model and the neural network model were then compared. The results show that the genetic neural network prediction model of the photosynthetic rate achieved better training effects and accuracy than the neural network prediction model. The correlation coefficient between the predicted and measured data is 0.987, and the absolute error of the photosynthetic rate is less than ±0.5 μmol/(m²·s).


Introduction 
Light is an indispensable factor in plant growth [1,2]. Because of influences such as covering materials, dust, the solar altitude angle, the solar dip angle and structural shading, the light available from autumn to early spring often fails to meet crop growth requirements. This causes problems such as slow crop growth, leaf shedding, fewer flower buds, abnormal flower color and shape, and low fruit set [3-6]. Supplemental artificial lighting is an important means of environmental control and has become a hot topic in recent research. However, existing lighting technology does not sufficiently account for the microclimate conditions that influence photosynthesis, so the supplemental light supplied may be excessive or insufficient [7,8]. A method for modeling the photosynthetic rate under the combined influence of multiple factors is therefore needed, and a model that incorporates multiple interacting environmental factors should be established. Fine-grained control of the lighting environment built on such a model has become a fundamental requirement for the performance of light environment regulation systems.
Various photosynthetic rate models have been established, including the rectangular hyperbola model, the non-rectangular hyperbola model and the exponential model [9,10]. Building on these studies, photosynthetic electron transport models [11], steady-state photosynthetic rate models [12] and photosynthesis models under different nitrogen levels [13] have been proposed. These studies provide a good theoretical basis for modeling the photosynthetic rate, but the physiological parameters they require as inputs are difficult to obtain during routine testing and production, and their outputs are difficult to apply directly to regulating the light environment of crops.
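For orientation, the three classical light-response forms named above can be sketched as follows. These are the commonly cited textbook forms, not equations taken from this study; the parameter names (alpha for apparent quantum efficiency, p_max for the light-saturated gross rate, r_d for dark respiration, theta for curvature) and the example values are illustrative assumptions. Each form returns the net rate as a function of photon flux density, which is exactly why such models need physiological parameters as inputs.

```python
import math


def rectangular_hyperbola(i, alpha, p_max, r_d):
    """Rectangular hyperbola: Pn = alpha*I*Pmax / (alpha*I + Pmax) - Rd."""
    return alpha * i * p_max / (alpha * i + p_max) - r_d


def non_rectangular_hyperbola(i, alpha, p_max, r_d, theta):
    """Non-rectangular hyperbola: smaller root of
    theta*P^2 - (alpha*I + Pmax)*P + alpha*I*Pmax = 0, minus Rd (requires 0 < theta <= 1)."""
    s = alpha * i + p_max
    disc = s * s - 4.0 * theta * alpha * i * p_max
    return (s - math.sqrt(disc)) / (2.0 * theta) - r_d


def exponential_model(i, alpha, p_max, r_d):
    """Exponential model: Pn = Pmax * (1 - exp(-alpha*I/Pmax)) - Rd."""
    return p_max * (1.0 - math.exp(-alpha * i / p_max)) - r_d


if __name__ == "__main__":
    # Hypothetical parameters for illustration only (PPFD in umol m^-2 s^-1).
    params = dict(alpha=0.05, p_max=20.0, r_d=1.0)
    for ppfd in (0, 100, 500, 1500):
        print(ppfd,
              round(rectangular_hyperbola(ppfd, **params), 2),
              round(non_rectangular_hyperbola(ppfd, theta=0.7, **params), 2),
              round(exponential_model(ppfd, **params), 2))
```

All three curves start at -Rd in darkness and saturate toward Pmax - Rd; they differ only in how sharply the transition between the light-limited and light-saturated regions bends.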
In recent years, researchers have developed a series of photosynthetic rate models that use environmental parameters as variables [14], including a model of the factors influencing tomato photosynthetic rate, a chi-square photosynthetic rate model [15] and a nonlinear dynamic simulation model of tomato growth [16]. These studies used multiple regression and linear fitting for the photosynthetic rate and considered the correlations among environmental factors to improve the adaptability and accuracy of the models. However, they still have limitations, such as low goodness of fit and complex fitting formulas, which make them unsuitable for photosynthetic rate models that fuse multiple environmental factors.
Artificial neural networks have been widely used as an intelligent modeling method for multidimensional modeling of complex systems [17,18], and related research has recently addressed photosynthetic rate models. A simulation model of greenhouse photosynthesis has been constructed based on a neural network [19], and a prediction model for the net photosynthetic rate of tomato leaves at the flowering stage has been established using a BP neural network. These models effectively improved the model fit and explored the use of artificial neural networks for photosynthetic rate modeling [20]. However, when a BP neural network is applied to problems with many samples, problems arise such as slow convergence and long training times. It is therefore important to study intelligent algorithms for modeling photosynthetic rates from large, multi-factor data samples. Genetic neural networks have recently become a focus of research on such large multi-factor data sets [21]. Because a genetic algorithm performs a global, parallel search, it can be used to optimize the learning process of a BP neural network and overcome drawbacks such as slow convergence and the tendency to fall into a local minimum during training [22]. The approach has already been applied to natural gas load forecasting with complex, nonlinear, large data samples [23], where it effectively improved the correlation coefficient and the convergence rate. These studies suggest that neural networks combined with genetic algorithms can handle large and diverse samples, which provides a theoretical basis for the photosynthetic rate modeling in this study.
Motivated by the problems outlined above, this study explored a fitting method for large, multi-factor sample data sets based on an analysis of the internal and external factors that affect photosynthesis. A generally applicable multi-factor modeling method that couples a genetic neural network with crop photosynthesis was then proposed, and a tomato photosynthetic rate model based on the genetic neural network was established. The model was compared with a conventional neural network model in terms of convergence speed and final results, and correlation and error analyses were performed to verify the accuracy of the modeling method, which provides a theoretical foundation for optimal control of the lighting environment.

Experimental materials
The experiment was undertaken in a greenhouse at Northwest A&F University, Yangling, China. The tomato variety tested was "Burr 802". After hot-water treatment, the seeds were sown in 72-cell plug trays filled with an agriculture-specific seedling substrate whose organic matter and humic acid mass fractions were 50% and 20%, respectively, with a pH of 5.5-6.5. Sufficient fertilizer was maintained while the seedlings were raised. When the seedlings had five leaves and one heart leaf, uniform seedlings were selected and transplanted into fertilized mellow soil with an organic matter content of 16.83 g/kg, alkali-hydrolyzable nitrogen of 121.38 mg/kg, available phosphorus of 145.15 mg/kg, available potassium of 121.61 mg/kg and a pH of 6.5. The established seedlings were then used for the test. During the trial, fertilizer and water were managed as usual, and no pesticides or hormones were sprayed.

Analysis of experimental parameters
The photosynthetic rate is significantly influenced by environmental factors, so a plant's photosynthetic capacity changes as the microclimate of the facility changes. Light is one of the factors that directly affect the photosynthetic rate, as it provides the energy for photosynthesis. Net photosynthesis occurs only when the light intensity exceeds the light compensation point, and the photosynthetic rate then increases with light intensity. However, excess light can cause photoinhibition, resulting in a decline in the photosynthetic rate and damage to the crop [24].
Other microclimate factors, such as temperature, influence photosynthesis directly or indirectly and have been shown to affect a crop's Rubisco activase activity, stomatal conductance and chlorophyll content [25,26]. The CO₂ concentration also affects the crop, including its dark reaction rate and dry matter accumulation [27]. Soil water supplies the water needed for photosynthesis: severe water deficit causes water stress, whereas mild water stress does not noticeably reduce the photosynthetic rate [28]. Soil fertility also affects photosynthesis indirectly, because N and Mg are components of chlorophyll, so mineral deficiencies have some effect on photosynthesis [29].
Crop photosynthetic rates differ under different combinations of temperature, CO₂ concentration and soil water conditions. The relationship between crop photosynthesis and microclimate factors therefore needs to be explored in more detail using multi-factor models so that light parameters can be adjusted accordingly.
The fertility and moisture factors that affect photosynthesis vary only weakly, with small dynamic changes, and this variability can generally be avoided through routine water and fertilizer management [30]. In contrast, light, CO₂ and temperature are affected by external conditions and by internal causes such as shading, ventilation and the operation of equipment such as wet curtains. These factors therefore change strongly and dynamically, leading to significant changes in the photosynthetic rate [31]. This study therefore used photon flux density, air temperature and CO₂ concentration as the environmental variables.

Experimental methods
Seventy-five robust tomato seedlings were randomly selected as test samples 7 days after transplanting. To avoid the influence of the midday depression of photosynthesis, measurements were performed at 9:00-11:30 and 14:00-17:30.
Leaf net photosynthetic rate (Pn) was measured using a LI-6400XT portable photosynthesis system (LI-COR, USA), whose modules can regulate microenvironment parameters such as CO₂ concentration, temperature and photon flux density. The environmental variables were set as follows: a CO₂ concentration gradient of 300, 600, 900, 1200 and 1500 μmol/mol; a temperature gradient of 16, 20, 24, 28 and 32 °C; and a photon flux density gradient of 0, 20, 50, 100, 200, 300, 500, 700, 1000, 1200 and 1500 μmol/(m²·s). Twenty-five sets of tests were carried out, one for each combination of temperature and CO₂ concentration. In each set, 3 seedlings were randomly selected, and the net photosynthetic rate of each was measured at the 11 photon flux density levels. Thus, a total of 825 data records (25 × 3 × 11) were obtained for model building and validation.
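The sample count follows directly from the nested design: 5 CO₂ levels × 5 temperatures give 25 test sets, 3 seedlings per set give 75 seedlings, and 11 photon flux density steps per seedling give 825 measurements. A short sketch enumerating that design is shown below; the tuple layout and the running seedling IDs are illustrative assumptions (each seedling is assumed to be measured in exactly one set, which is consistent with the 75 seedlings reported), not the authors' data format.

```python
from itertools import product

# Factor levels as listed in the text
CO2_LEVELS = [300, 600, 900, 1200, 1500]                                # umol/mol
TEMP_LEVELS = [16, 20, 24, 28, 32]                                      # deg C
PPFD_LEVELS = [0, 20, 50, 100, 200, 300, 500, 700, 1000, 1200, 1500]    # umol/(m^2 s)
SEEDLINGS_PER_SET = 3


def build_design():
    """Enumerate (seedling_id, CO2, temperature, PPFD) rows of the nested design."""
    rows = []
    for set_idx, (co2, temp) in enumerate(product(CO2_LEVELS, TEMP_LEVELS)):  # 25 sets
        for k in range(SEEDLINGS_PER_SET):
            seedling_id = set_idx * SEEDLINGS_PER_SET + k  # hypothetical ID, 0..74
            for ppfd in PPFD_LEVELS:                        # one light-response curve
                rows.append((seedling_id, co2, temp, ppfd))
    return rows


design = build_design()
print(len(design))                        # 825
print(len({row[0] for row in design}))    # 75
```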

Model building
First, the experimental sample set and the BP network structure were established during preprocessing. The network weights and thresholds were then optimized with the genetic algorithm. Finally, the photosynthetic rate model was completed by training the genetic neural network with the LM training method. The overall process is shown in Figure 1. Gross errors were analyzed and filtered using Dixon's criterion. Each experiment yielded one $P_n$ value for each sample as the output, with photon flux density, CO₂ concentration and air temperature as the inputs $X = (x_1, x_2, x_3)^T$. The input and output data of the large multi-factor sample set were normalized to create a normalized, multi-factor coupled data sample set $(X', P'_n)$ for photosynthetic rate modeling. The optimal number of hidden-layer nodes was determined by trial and error, and the network performance met the design requirements with eleven hidden-layer nodes. A single-hidden-layer network with a 3-11-1 structure was therefore designed as the network structure of the photosynthetic rate model. The hyperbolic tangent sigmoid function (tansig) was adopted as the transfer function of the hidden-layer neurons, and the linear function (purelin) was adopted as the transfer function of the output layer.
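The two preprocessing steps can be sketched in plain Python. Dixon's criterion is shown in its simplest gap/range form, which applies to small replicate groups of 3 to 7 values (such as the 3 seedlings per condition); the critical values are commonly tabulated 95% values and should be checked against a reference table before use. The paper does not state the normalization range, so min-max scaling to [0, 1] is assumed here; `minmax_inverse` shows the corresponding anti-normalization.

```python
# Commonly tabulated Dixon Q critical values at 95% confidence, n = 3..7 (assumed here)
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568}


def dixon_outliers(values):
    """Return suspected gross errors in a small replicate group using Dixon's Q (gap/range)."""
    n = len(values)
    if n not in Q_CRIT_95:
        raise ValueError("this sketch only covers groups of 3 to 7 replicates")
    xs = sorted(values)
    spread = xs[-1] - xs[0]
    if spread == 0:
        return []
    flagged = []
    if (xs[1] - xs[0]) / spread > Q_CRIT_95[n]:     # test the smallest value
        flagged.append(xs[0])
    if (xs[-1] - xs[-2]) / spread > Q_CRIT_95[n]:   # test the largest value
        flagged.append(xs[-1])
    return flagged


def minmax_fit(column):
    """Record the range of one input or output column."""
    return min(column), max(column)


def minmax_scale(x, lo, hi):
    """Normalize x from [lo, hi] to [0, 1] (assumed target range)."""
    return (x - lo) / (hi - lo)


def minmax_inverse(x_scaled, lo, hi):
    """Anti-normalization: map a [0, 1] value back to original units."""
    return x_scaled * (hi - lo) + lo
```

For example, `dixon_outliers([10.0, 10.1, 20.0])` flags `20.0`, because its gap to the nearest value (9.9) is 99% of the group range.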

Weight optimization based on genetic algorithm
In this step, the weight and threshold matrices of the network structure described above were optimized with the genetic algorithm. The algorithm first used a binary coding scheme to encode the network weights. The fitness function was then constructed from the minimum of the error function, as given in Equation (1). Starting from a randomly generated initial population, the fitness of each individual was calculated with Equation (1) to evaluate the population. If the stopping condition was not satisfied, the operations described in the following paragraphs were performed.
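The sketch below shows how a chromosome can be mapped onto the 3-11-1 network and scored. Two assumptions are stated plainly: the paper uses binary coding, but this sketch swaps in a real-valued chromosome for brevity; and because Equation (1) is not reproduced in the text, the fitness is assumed to be the reciprocal of the sum of squared prediction errors, which rises as the error falls. For a 3-11-1 network the chromosome holds 3×11 + 11 + 11×1 + 1 = 56 genes.

```python
import math

N_IN, N_HID, N_OUT = 3, 11, 1                                 # 3-11-1 structure from the text
CHROM_LEN = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT      # 56 genes


def decode(chrom):
    """Split a flat chromosome into v (input->hidden), b1, w (hidden->output), b2."""
    assert len(chrom) == CHROM_LEN
    pos = 0
    v = [chrom[pos + j * N_IN: pos + (j + 1) * N_IN] for j in range(N_HID)]  # v[j][i]
    pos += N_IN * N_HID
    b1 = chrom[pos: pos + N_HID]                              # hidden thresholds
    pos += N_HID
    w = chrom[pos: pos + N_HID]                               # hidden->output weights
    pos += N_HID
    b2 = chrom[pos]                                           # output threshold
    return v, b1, w, b2


def predict(chrom, x):
    """Forward pass: tansig (tanh) hidden layer, linear output."""
    v, b1, w, b2 = decode(chrom)
    hidden = [math.tanh(sum(vji * xi for vji, xi in zip(v[j], x)) + b1[j])
              for j in range(N_HID)]
    return sum(wj * yj for wj, yj in zip(w, hidden)) + b2


def fitness(chrom, samples):
    """Assumed fitness: reciprocal of the sum of squared errors over (x, target) pairs."""
    sse = sum((target - predict(chrom, x)) ** 2 for x, target in samples)
    return 1.0 / (sse + 1e-12)                                # small constant avoids division by zero
```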
First, in the selection step, the individual with the greatest fitness was passed directly to the next generation. Selection probabilities were calculated from the fitness values, and individuals were drawn from the old population with probability $P_s$ to form a new population. The probability $P_s(i)$ that individual $i$ is selected is given by Equation (2):

$$P_s(i) = \frac{F_i}{\sum_{j=1}^{N} F_j} \tag{2}$$

where $F_i$ is the fitness value of individual $i$ and $N$ is the number of individuals in the population.

The individuals in the new population were then paired for the crossover operation. According to the crossover probability, parts of the paired chromosomes were exchanged to form new individuals. The crossover operation for the $j$th gene of the $k$th chromosome $a_k$ and the $i$th chromosome $a_i$ is given by Equation (3):

$$\begin{cases} a_{kj} = a_{kj}(1-b) + a_{ij}\,b \\ a_{ij} = a_{ij}(1-b) + a_{kj}\,b \end{cases} \tag{3}$$

where $b$ is a random number in the interval [0, 1]. The operation in Equation (3) is performed when the generated random number satisfies $b \le P_c$, where $P_c$ is the crossover probability, set to 0.7.
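Selection with an elite copy and the blend crossover described above can be sketched as follows, assuming real-valued chromosomes stored as Python lists with the elite kept at index 0. `random.choices` with `weights` draws individuals with probability proportional to fitness, which is the roulette-wheel rule. Following the text literally, the same random number `b` is used both for the probability test and as the blend factor; the pairing of consecutive individuals and the choice of one random gene per pair are assumptions.

```python
import random


def roulette_select(population, fitnesses, rng=random):
    """Elitism plus fitness-proportional (roulette-wheel) selection, P_s(i) = F_i / sum_j F_j."""
    best = max(range(len(population)), key=lambda i: fitnesses[i])
    elite = list(population[best])                        # best individual passes unchanged
    drawn = rng.choices(population, weights=fitnesses, k=len(population) - 1)
    return [elite] + [list(ind) for ind in drawn]         # copies, so later edits do not alias


def arithmetic_crossover(a_k, a_i, j, b):
    """Blend gene j of chromosomes a_k and a_i in place with factor b."""
    akj, aij = a_k[j], a_i[j]
    a_k[j] = akj * (1.0 - b) + aij * b
    a_i[j] = aij * (1.0 - b) + akj * b


def crossover_population(pop, p_c=0.7, rng=random):
    """Pair consecutive non-elite individuals; cross one random gene when b <= P_c."""
    for idx in range(1, len(pop) - 1, 2):
        b = rng.random()                                  # test value and blend factor
        if b <= p_c:
            j = rng.randrange(len(pop[idx]))
            arithmetic_crossover(pop[idx], pop[idx + 1], j, b)
    return pop
```

Because the two offspring genes are complementary blends, each crossover preserves the sum of the pair's genes at position `j`; it moves individuals within the region already spanned by their parents rather than outside it.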
Finally, one of the new individuals was randomly selected for mutation in order to generate a better individual. The mutation operation for the $j$th gene $a_{ij}$ of the $i$th individual is given by Equation (4), in which $a_{max}$ and $a_{min}$ are the upper and lower bounds of $a_{ij}$, $g$ is the current iteration number, $G_{max}$ is the maximum number of generations, and $r$ is a random number in the interval [0, 1].
When the generated random number satisfies $r \le P_M$, the operation in Equation (4) is performed, where $P_M$ is the mutation probability, set to 0.01. After mutation, fitness evaluation and genetic iteration were repeated on weight vectors v and w until the optimization of the initial network weights was complete, yielding the optimal weight vectors v and w. Weight vector v contains the weights from the input layer to the hidden layer of each individual in the new population, and weight vector w contains the weights from the hidden layer to the output layer.
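Equation (4) itself is not recoverable from the text, so the sketch below uses the widely used non-uniform mutation, which matches the listed symbols (bounds a_max and a_min, generation g, G_max, random r): the mutation step shrinks as g approaches G_max. The step size f(g) = r(1 - g/G_max)², the 50/50 choice of direction, and the rule of mutating one random gene of one random non-elite individual are assumptions.

```python
import random


def nonuniform_mutate(gene, a_min, a_max, g, g_max, rng=random):
    """Assumed non-uniform mutation: the step shrinks as generation g approaches g_max."""
    f_g = rng.random() * (1.0 - g / g_max) ** 2          # f(g) = r * (1 - g / G_max)^2
    if rng.random() > 0.5:
        return gene + (a_max - gene) * f_g                # move toward the upper bound a_max
    return gene - (gene - a_min) * f_g                    # move toward the lower bound a_min


def mutation_step(pop, a_min, a_max, g, g_max, p_m=0.01, rng=random):
    """Pick one non-elite individual and one gene; mutate it when r <= P_M."""
    i = rng.randrange(1, len(pop))                        # index 0 holds the elite
    j = rng.randrange(len(pop[i]))
    if rng.random() <= p_m:
        pop[i][j] = nonuniform_mutate(pop[i][j], a_min, a_max, g, g_max, rng)
    return pop
```

Because f(g) lies in [0, 1), a gene that starts inside [a_min, a_max] stays inside it, and at g = G_max the step vanishes, which is consistent with the stabilizing fitness described for the later generations.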

Establishment of the neural network model
At the start of training, the initial weights of the BP neural network were assigned from the optimal weight vectors v and w. A set of preprocessed samples was then fed to the network, and the outputs of the hidden layer and the output layer were computed. The training procedure described in the following paragraph was then carried out.
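The error signals of the output layer and the hidden layer were first calculated from the measured and network-output values of the photosynthetic rate of the samples, using Equations (5) and (6) for the single-output case:

$$\delta^{o} = d - o \tag{5}$$

$$\delta^{y}_{j} = \delta^{o} w_{j} (1 - y_{j})(1 + y_{j}) \tag{6}$$

where $\delta^{o}$ is the error signal of the output layer; $d$ and $o$ are the measured value and the network output value of the photosynthetic rate; $\delta^{y}_{j}$ is the error signal of the $j$th neuron of the hidden layer; $w_j$ is the weight from the $j$th hidden neuron to the output layer; and $y_j$ is the output value of the $j$th hidden neuron. The factor $(1 - y_j)(1 + y_j)$ is the derivative of the tansig transfer function, and the linear (purelin) output layer contributes a derivative of 1.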
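A plain-Python sketch of the single-sample forward pass and the two error signals follows. It assumes the standard back-propagation forms for this architecture: a linear (purelin) output, so the output error signal is simply d - o, and a tansig hidden layer, implemented with `math.tanh` because tansig(n) = 2/(1 + e^(-2n)) - 1 is mathematically identical to tanh(n), so the hidden error signal is δ^o · w_j · (1 - y_j²). The function and argument names are illustrative.

```python
import math


def forward(x, v, b1, w, b2):
    """Forward pass of a 3-11-1 net: tansig (tanh) hidden layer, purelin output."""
    y = [math.tanh(sum(v[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(w))]
    o = sum(w[j] * y[j] for j in range(len(w))) + b2
    return y, o


def error_signals(d, o, y, w):
    """Output error signal d - o and hidden error signals scaled by the tanh derivative."""
    delta_o = d - o                                         # linear output: derivative is 1
    delta_y = [delta_o * w[j] * (1.0 - y[j]) * (1.0 + y[j]) for j in range(len(w))]
    return delta_o, delta_y
```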
Based on the error signal and input signal of each layer, the weights of each layer were then adjusted with the LM training method using Equations (7) and (8), in which $v_{ij}$ is the weight from the $i$th neuron of the input layer to the $j$th neuron of the hidden layer; $x_i$ is the $i$th input; $\eta$ is the learning rate; $e$ is the output error in the single-output case; $\mu$ is the ratio (damping) coefficient, a constant greater than 0; $I$ is the identity matrix; and $J$ is the Jacobian matrix of the output error $e$ with respect to the network weights, given in Equation (9):

$$J = \left[\frac{\partial e}{\partial w_1}, \frac{\partial e}{\partial w_2}, \ldots, \frac{\partial e}{\partial w_n}\right] \tag{9}$$

where $n$ is the number of network weights. This process was repeated to cycle through all samples and obtain the corrections to the weight matrices. After each full cycle, the network training error $E_{RME}$ was calculated using Equation (10).
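As a hedged illustration of the LM idea, the standard Levenberg-Marquardt update Δw = -(JᵀJ + μI)⁻¹Jᵀe, with J the Jacobian of the output error with respect to the weights, can be written in plain Python. This is not a reproduction of the paper's Equations (7) and (8): the finite-difference Jacobian, the ×10 damping schedule, and the RMSE used as a stand-in for E_RME are assumptions, and a toy two-parameter tanh model replaces the 3-11-1 network to keep the example short.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lm_step(residuals, params, mu, h=1e-6):
    """One Levenberg-Marquardt update: params - (J^T J + mu*I)^-1 J^T e."""
    e = residuals(params)
    n_e, n_p = len(e), len(params)
    jac = [[0.0] * n_p for _ in range(n_e)]                  # J[k][p] = d e_k / d w_p
    for p in range(n_p):
        shifted = list(params)
        shifted[p] += h
        e_h = residuals(shifted)
        for k in range(n_e):
            jac[k][p] = (e_h[k] - e[k]) / h                  # forward-difference estimate
    lhs = [[sum(jac[k][p] * jac[k][q] for k in range(n_e)) + (mu if p == q else 0.0)
            for q in range(n_p)] for p in range(n_p)]
    grad = [sum(jac[k][p] * e[k] for k in range(n_e)) for p in range(n_p)]
    delta = solve(lhs, [-g for g in grad])
    return [w + dw for w, dw in zip(params, delta)]


def rmse(e):
    return math.sqrt(sum(val * val for val in e) / len(e))


def lm_fit(residuals, params, mu=1e-3, max_iter=50, goal=1e-8):
    """Iterate LM steps; stop at the error goal or after max_iter steps."""
    err = rmse(residuals(params))
    for _ in range(max_iter):
        if err < goal:                                       # training goal reached
            break
        trial = lm_step(residuals, params, mu)
        trial_err = rmse(residuals(trial))
        if trial_err < err:
            params, err, mu = trial, trial_err, mu / 10.0    # accept: closer to Gauss-Newton
        else:
            mu *= 10.0                                       # reject: closer to gradient descent
    return params, err
```

A small μ makes the step behave like Gauss-Newton (fast near a minimum), while a large μ makes it a short gradient-descent step (safe far from one); adapting μ after each trial is what gives LM its fast yet stable convergence.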

Results and analysis
To verify the validity of the genetic neural network modeling method, this study used the same training set and validation set to establish tomato photosynthetic rate prediction models with both a neural network and a genetic neural network. The modeling results were then analyzed and compared in terms of training performance and validation results.

Results and analysis of model training performance
Because neural network training results are random, each model was trained multiple times and the training parameters with the best convergence were selected for modeling. Prediction models for the tomato photosynthetic rate were then established with both the neural network and the genetic neural network. The genetic neural network model used the system error as the basis of its fitness function, and its initial weight and threshold matrices were optimized with the genetic algorithm. The evolution curve is shown in Figure 2.
Figure 2 Evolution process of the error

Figure 2 shows that the fitness of individuals is low at the initial stage of evolution. Through selection, crossover and mutation, the fitness of individuals in the population increases steadily, while the photosynthetic rate prediction error obtained with the corresponding weight matrix gradually decreases. As the number of generations increases and the new individuals produced by the genetic algorithm approach the neighborhood of the optimum, the fitness value stabilizes, completing the optimization of the initial weights and thresholds. The evolution process shows no oscillation, which is consistent with the characteristic of genetic algorithms that the fitness of the optimal individual tends to stabilize. This indicates that the parameter settings of the genetic algorithm are reasonable and that the algorithm converges well, suggesting that it may also be used to optimize similar multi-factor nonlinear problems.
On this basis, the neural network prediction model was trained using either the initial weight matrix optimized by the genetic algorithm or a randomly generated weight matrix. The training curve of the genetically optimized neural network is shown in Figure 3, and that of the BP neural network prediction model is shown in Figure 4. Figures 3 and 4 show that genetic optimization clearly improves both the training effect and the convergence of the model. As shown in Figure 3, the genetically optimized neural network reached the training goal in only 17 steps, and over repeated runs it usually required fewer than 20 steps. Figure 4 shows that, for the BP model, the training target error was set to 0.00005 with a limit of 50 steps; from the 20th step onward the training mean square error remained at 0.0031013 and the target was not reached. From that point, the error no longer decreased over six successive iterations, indicating that the network had reached its minimum error and training stopped. Hence, when the initial weights and thresholds are determined randomly, the LM method is likely to converge to a local optimum. For functions with large locally flat regions, the model may be unable to escape those regions, so high-precision fitting cannot be achieved and prediction accuracy and generalization ability suffer. In other words, when a traditional back-propagation neural network is built for a multidimensional model with locally flat regions, the randomly generated weight matrix can trap the model in those regions. Improving the training method alone allows the training goal to be reached in some cases but not in others, whereas the genetically optimized network converges significantly faster and requires less training time than the network without genetic optimization. Building the tomato photosynthetic rate model with a genetically optimized neural network therefore effectively improves training performance and is well suited to this type of problem.

Model validation
To verify the prediction accuracy and generalization ability of the models, a validation test was conducted using different criteria. In this test, 60 groups of samples (about 7% of the 825 multi-factor experimental samples) were used as the validation set. The measured and predicted values of both the genetic neural network model and the neural network model were then subjected to correlation analysis, as shown in Figures 5a and 5b. In Figure 5a, the coefficient of determination between the measured and predicted values of the optimized neural network model of the photosynthetic rate is 0.989, and the fitted line has a slope of 0.989 and an intercept of -0.0131. In Figure 5b, the coefficient of determination for the non-optimized neural network is 0.9067, and the fitted line has a slope of 0.910 and an intercept of 0.874. Thus, although both models reach the target error of 0.0001, the optimized model shows a markedly higher degree of linearity and a better fit. Further calculation of the model errors shows that the prediction error of the non-optimized neural network is generally 0.3-1.1 μmol/(m²·s), whereas that of the neural network optimized by the genetic algorithm is generally 0.1-0.4 μmol/(m²·s). This indicates that the genetic neural network model established in this study has high prediction accuracy and can accurately predict the photosynthetic rate under different temperature, light and CO₂ conditions.
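The validation statistics reported for Figure 5 (coefficient of determination, slope and intercept of the fitted line, and the absolute error) can be computed with ordinary least squares, as sketched below. This assumes measured values on the horizontal axis and predicted values on the vertical axis, which the figure description does not state explicitly; the function name is illustrative.

```python
def linear_fit_stats(measured, predicted):
    """Least-squares line predicted = slope * measured + intercept, its R^2, and max |error|."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)                  # squared Pearson correlation
    max_abs_err = max(abs(y - x) for x, y in zip(measured, predicted))
    return slope, intercept, r2, max_abs_err
```

A model that predicts perfectly gives a slope of 1, an intercept of 0 and R² = 1, so the values reported for Figure 5a lie closer to that ideal than those for Figure 5b.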

Conclusions
In this study, a photosynthetic rate modeling approach based on a genetic neural network was proposed, following an analysis of the characteristics and limitations of neural network algorithms and genetic algorithms. A tomato photosynthetic rate model based on a genetic neural network was established using a data sample set obtained from a multi-factor nested experiment. No oscillation was observed and convergence was rapid throughout the model-building process. The fitness of the optimal individual stabilized during evolution, indicating that the genetic algorithm designed in this study can optimize the weight matrix with good convergence. Compared with a neural network initialized with a randomly generated weight matrix, the neural network built on the optimized weights handled locally flat regions clearly better and converged faster, reaching the training goal within 17 steps. The coefficient of determination between the measured and predicted values of the model is 0.989, and the fitted line has a slope of 0.989 and an intercept of -0.0131. The error is generally 0.1-0.4 μmol/(m²·s), which is significantly better than that of the non-optimized neural network model. These results indicate that the genetic neural network model has higher prediction accuracy and is a useful method for this type of photosynthetic rate modeling, enabling precise prediction of the photosynthetic rate under different temperature, light and CO₂ conditions. The model can provide theoretical support for light environment control of tomatoes.

Figure 1 Flow chart of the construction of the photosynthetic rate model based on a genetic neural network

The final step was to judge the error. If the training error was still greater than the error threshold, a new round of iteration was performed until the error fell below the threshold or the defined maximum number of training steps was reached. Training was then declared complete and the network was saved. Finally, anti-normalization was applied to the output of the photosynthetic rate model, as shown in Equation (11).

Figure 3 Genetic optimization neural network model training curve
Figure 4 Training curve of the BP neural network prediction model

Figure 5 Correlation analysis between the simulation values and the measured values of the photosynthetic rate: (a) photosynthetic rate model of the genetic neural network; (b) neural network model of the photosynthetic rate