Irrigation decision model for tomato seedlings based on optimal photosynthetic rate

Soil moisture is a major environmental factor that influences tomato growth and development. Suitable soil moisture not only increases tomato production but also saves irrigation water. In this study, an irrigation decision model, called the soil moisture regulation model, was developed to optimize the growth of tomato seedlings while saving water. The data used for modeling were collected from a multi-gradient nested experiment in which temperature, photosynthetic photon flux density (PPFD), carbon dioxide (CO2) concentration and soil moisture were the variables and the corresponding photosynthetic rate was measured. A prediction model of tomato photosynthetic rate was then constructed using the support vector regression (SVR) algorithm. With the photosynthetic rate prediction model as the fitness function, a genetic algorithm (GA) was used to find the optimal soil moisture under each combination of temperature, PPFD and CO2 concentration. Finally, the back propagation neural network (BPNN) algorithm was used to establish a tomato irrigation decision model that provides the optimal soil moisture for the current environment. For the soil moisture regulation model constructed here, the coefficient of determination was 0.9738, the mean square error of the test set was 1.51×10⁻⁵, the slope of the fitted verification line was 0.9752, and the intercept was 0.00916. The model demonstrated high precision and thereby provides a theoretical basis for accurate irrigation control in the greenhouse facility environment.


Introduction 
The reasonable use of agricultural water resources could improve water use efficiency, alleviate water shortages, and promote agricultural development in Northwest China [1,2]. Tomato is a typical greenhouse crop. The water that tomato plants absorb comes mainly from the soil, and its availability determines growth and yield [3-5]. Drought stress can reduce crop growth by damaging metabolic processes and the photosynthetic apparatus. Ors et al. [6] found that drought conditions could cause permanent damage to the plant, including disrupted stem and root development and a decrease in the number and width of leaves. When irrigation is scarce, the chemical environment of the crop roots changes, thereby affecting the photosynthetic rate of the plant [7]. However, more irrigation is not always better for plants. An excessive water supply weakens the active oxygen metabolism of the crop, which affects photosynthetic rate, plant developmental stages and crop production [8]. Liu et al. [9] investigated suitable drip irrigation scheduling for tomato grown in a solar greenhouse and found no significant difference in yield between plant-pan coefficients of 0.9 (Kcp3) and 1.1 (Kcp4), suggesting that excessive irrigation water cannot increase tomato yield significantly. The water demand of tomatoes is also closely related to environmental factors such as light intensity, carbon dioxide (CO2) concentration, and temperature [10,11]. Therefore, it is necessary to adjust the irrigation volume according to the water demand of plants under different environmental conditions.
The intelligent control of the soil water environment is a field of great research interest. Mohapatra et al. [12] predicted the hourly soil moisture content requirement and the required soil evapotranspiration using the Blaney-Criddle method based on a radial basis function neural network, and developed a fuzzy logic based, weather-dependent irrigation control mechanism. Irrigation control has also been studied in combination with crop growth models. Choi et al. [13] developed a tomato transpiration model that considers the specific environment in the greenhouse and can be used for precision irrigation and environment control in greenhouse tomato cultivation. Soundharajan et al. [14] proposed a simulation-optimization framework in which a rice crop growth simulation model identified the critical periods of growth and a genetic algorithm (GA) based optimizer derived the optimal water allocations during the crop growing period. Taking the impact of the weather into account, Gowing et al. [15] presented an approach for predicting short-term supplemental irrigation schedules for potatoes using short-term weather forecasts. Rowshon et al. [16] developed a Climate-Smart Decision-Support System for modeling the water demand of rice irrigation schemes under climate change impacts. Photosynthesis is the basis of dry matter production in plants. Photosynthetic rate is an important parameter characterizing the photosynthetic capacity of the photosynthetic apparatus [17]. In addition, yield is closely related to net photosynthetic rate [18]. However, previous studies that determined the target soil moisture or soil moisture requirement have not considered the effect of photosynthetic rate on these calculations.
This study considered the effects of temperature, photosynthetic photon flux density (PPFD), CO 2 concentration and soil moisture on photosynthesis and proposed a photosynthetic rate prediction model to obtain the photosynthetic rate values of tomatoes under different environmental conditions. On this basis, a soil moisture optimization method was proposed to calculate the optimal soil moisture value for tomato growth. Then, a tomato irrigation decision model was constructed, which dynamically obtained the optimal soil moisture value under different environmental conditions. This irrigation decision model provides a theoretical approach for optimizing the growth of tomato seedlings as well as maximizing the irrigation efficiency.

Experimental materials
The experiment was carried out in the Key Laboratory of Agricultural Internet of Things of Ministry of Agriculture and Rural Affairs, Northwest A&F University, Xianyang City, Shaanxi Province. Tomato seedlings (Zhongyan TV1) were cultured on seedling substrate (Pindstrup Substrate, Denmark). When the tomato seedlings had grown 3-4 true leaves, healthy seedlings of similar shape were transplanted into 10 cm×10 cm×10 cm square planting boxes and cultivated in a climate chamber (RGL-P500D-CO2, Darth Carter, China), as shown in Figure 1. In the climate chamber, the day and night temperatures were set to 28°C and 20°C, respectively, the relative humidity of the air was set to 50%, and the CO2 concentration was set to 400 µmol/mol. Five different soil water contents were prepared in the planting boxes by 3 d of irrigation at different rates (0, 25, 50, 75, and 100 mL) at the end of the recovery stage. During this period, the irrigation rate was adjusted dynamically to keep the average soil volumetric moisture content within ±1% of the set value (5%, 10%, 20%, 30%, 40%). No pesticides or hormones were sprayed during the cultivation period.

Experimental methods
To generate modeling data for the tomato seedling photosynthetic rate prediction model, a multi-environmental factor nested experiment was designed. The photosynthetic rate data was acquired at 9:00-11:30 and 14:30-17:30. After the seedlings had grown 5-6 true leaves, three tomato seedlings were randomly selected from each irrigation treatment group.
The third functional leaf from the top of each seedling was selected for measurement of photosynthetic rate. The photosynthetic rates of the three tomato seedlings under each treatment were averaged and recorded.
The environmental changes required for the experiment were provided by the environmental control modules of an LI-6800 Portable Photosynthesis System (LI-COR Biosciences, USA). The LI-6800 sensor head clamped a leaf area of 2 cm² to measure gas exchange. According to the optimum temperature range for tomato seedling growth [19], the temperature control module provided four temperature settings (18°C, 23°C, 28°C, and 33°C); according to the light intensities suitable for the culture of young tomato plants [20], the light emitting diode light source module provided six PPFD gradients (0, 50, 100, 500, 600, and 800 µmol/(m²·s)); considering the light energy utilization and light saturation point of tomato under different CO2 concentrations [21], the CO2 injection module provided three CO2 concentrations (400, 700, and 1000 µmol/mol); and the irrigation control provided five soil volumetric moisture contents (5%, 10%, 20%, 30%, 40%).
To keep the soil moisture content at the set value, soil moisture must be monitored in real time and irrigation applied promptly. For this purpose, an automatic irrigation device was designed. The data from the soil moisture sensor (EC-5, METER Environment, USA) were returned to the single-chip microcomputer every second through the wireless sensor network and compared with the soil moisture set in the experiment. When the difference between the actual soil moisture and the set soil moisture was more than 1%, the single-chip microcomputer started the water pump by controlling the relay. When the actual soil moisture was equal to the set soil moisture, the pump was turned off to stop irrigation. The soil moisture contents were taken at 2.5 cm around the rhizomes of the seedlings, and the average values of the three measurements were recorded. The environmental equipment is shown in Figure 2.
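The switching rule described above can be sketched as a small on/off controller with a dead band. This is only an illustration, not the firmware used in the experiment: the function name pump_command, the deadband argument, and the assumption that irrigation starts only when the measured moisture falls below the set value are all hypothetical, and the sensor readings are invented.

```python
def pump_command(actual, setpoint, pump_on, deadband=1.0):
    """Return True if the pump should run, False otherwise.

    actual, setpoint: volumetric soil moisture in percent.
    pump_on: current pump state (True = running).
    deadband: assumed tolerance in percentage points (1% in the text).
    """
    if not pump_on:
        # Start irrigating once moisture is more than `deadband` below the set value.
        return (setpoint - actual) > deadband
    # Keep irrigating until the set value is reached, then stop.
    return actual < setpoint


# One-second sensor readings for a 20% set value (illustrative numbers only).
readings = [20.0, 19.4, 18.9, 19.5, 20.0, 20.2]
state = False
states = []
for value in readings:
    state = pump_command(value, 20.0, state)
    states.append(state)
print(states)
```

Starting and stopping at different thresholds avoids switching the relay on every small fluctuation of the sensor signal.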
With temperature, PPFD, CO 2 concentration, and soil moisture as independent variables, net photosynthetic rate as dependent variable, 360 sets of tomato seedling data were obtained in the above experiment.
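The figure of 360 data sets is consistent with a full crossing of the factor levels listed above. The short check below assumes that every combination of the four factors was measured once (after averaging the three seedlings); the variable names are illustrative.

```python
from itertools import product

temperatures = [18, 23, 28, 33]            # °C
ppfd_levels = [0, 50, 100, 500, 600, 800]  # µmol/(m²·s)
co2_levels = [400, 700, 1000]              # µmol/mol
moisture_levels = [5, 10, 20, 30, 40]      # % volumetric moisture content

# Every combination of the four factors gives one data set.
design = list(product(temperatures, ppfd_levels, co2_levels, moisture_levels))
print(len(design))  # 4 * 6 * 3 * 5 = 360
```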
Figure 2 Environmental equipment of photosynthetic rate and soil moisture: (a) LI-6800 portable photosynthesis system; (b) automatic irrigation device

Pearson correlation
To avoid the impact of magnitude differences on the training results of the models, the mapminmax function was used to normalize the different dimension data to the [-1, 1] interval, including temperature, CO2 concentration, PPFD, soil moisture, and photosynthetic rate. The normalization formula is:
$$z' = \frac{2(z - z_{\min})}{z_{\max} - z_{\min}} - 1 \qquad (1)$$
where $z$ is the data to be normalized; $z_{\min}$ and $z_{\max}$ are the minimum and maximum values in the data set to be normalized.
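As a minimal sketch of Equation (1), the function below rescales one variable to [-1, 1]. It is a plain Python stand-in, not the mapminmax function itself; the name minmax_scale and the handling of constant columns are assumptions made for illustration.

```python
def minmax_scale(values, lo=-1.0, hi=1.0):
    """Linearly rescale `values` so that min -> lo and max -> hi (Equation (1))."""
    z_min, z_max = min(values), max(values)
    if z_max == z_min:
        # Assumed behaviour: a constant column cannot be mapped to an interval.
        raise ValueError("cannot scale a constant column")
    return [(hi - lo) * (z - z_min) / (z_max - z_min) + lo for z in values]


# The four temperature settings of the experiment map to evenly spaced values.
temps = [18, 23, 28, 33]
scaled = minmax_scale(temps)
print(scaled)
```

The same minimum and maximum must be reused when new measurements are normalized before prediction, otherwise the model inputs are on a different scale than the training data.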
Then, in order to analyze the correlation between the environmental factors and photosynthetic rate, Pearson correlation analysis was applied. The Pearson correlation is a statistical method used to reflect the degree of linear correlation between two variables [22], and the formulas are given by Equations (2) to (5).
where, X j (j=1,2,3,4) represent temperature, CO 2 , PPFD and soil moisture respectively; n is the number of samples; S Xj is the standard deviation of the j th environmental variable; Y is photosynthetic rate; S Y is the standard deviation of Y; S XjY is the covariance of X j and Y; r is Pearson correlation coefficient.
The pearsonr function of the scipy.stats package in Python 3.6 was used to calculate the Pearson correlation coefficient between the photosynthetic rate and each factor.
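For readers without SciPy at hand, the coefficient can be computed directly from the quantities defined above (standard deviations, covariance, and their ratio). The sketch below assumes the sample (n-1) normalization, which cancels in r; the function name pearson_r and the example values are invented for illustration and are not measured data. The coefficient returned by scipy.stats.pearsonr should agree with this calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient r = S_XY / (S_X * S_Y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations with the (n - 1) normalization (assumed).
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


# Made-up light response values, for illustration only.
ppfd = [0, 50, 100, 500, 600, 800]
rate = [-1.0, 1.5, 3.0, 9.0, 10.0, 11.0]
r = pearson_r(ppfd, rate)
print(round(r, 4))
```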

Photosynthetic rate prediction model
First, a tomato photosynthetic rate prediction model was built based on the support vector regression (SVR) algorithm and used to predict the photosynthetic rate of tomatoes grown in different environments. Support vector machine (SVM) is a typical kernel-based machine learning method that follows the structural risk minimization principle: it seeks a trade-off between the empirical risk (the accuracy of the data approximation) and the Vapnik-Chervonenkis (VC) dimension (the complexity of the approximating function) to obtain good classification and generalization ability [23]. The SVM for regression (SVR) has been widely used in modeling studies because of its good performance on small-sample, non-linear and high-dimensional regression problems.
The normalized data sets were randomly divided into the training set and the testing set, with the ratio of 7:3. In the SVR algorithm, the penalty factor c and kernel function parameter g are important parameters that affect the model performance [24] . To obtain high prediction accuracy, the optimal algorithm parameters were determined using the grid search method. The peak model accuracy was obtained when the SVR parameter c was 2 and g was 0.5.
For a given data set $(x_i, y_i)$, $x_i \in \mathbb{R}^N$, $y_i \in \mathbb{R}$, $i = 1, \dots, n$, the radial basis function (RBF) kernel was used. The nonlinear problem in the low-dimensional space was mapped to a high-dimensional space, in which an optimal hyperplane was constructed for linear regression. Finally, a nonlinear regression function was obtained:
$$f(x) = \sum_{i=1}^{n} (\alpha_i^{*} - \alpha_i)\, k(x_i, x) + b$$
where $f(x)$ is the output of the decision function, $k(x_i, x)$ is the kernel function, $\alpha_i^{*}$ and $\alpha_i$ are the Lagrange multipliers, and $b$ is the offset value.
In order to verify the reliability of the SVR model, photosynthetic rate prediction models were also constructed on the same data set with the partial least squares regression (PLSR) algorithm and the back propagation neural network (BPNN) algorithm. Their coefficients of determination (R²) and root mean square errors (RMSE) were compared and analyzed.
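To make the role of the kernel concrete, the sketch below evaluates an RBF-kernel decision function of the usual form, a weighted sum of kernel values plus an offset. It only illustrates prediction, not training: the support vectors, dual coefficients (the differences of the Lagrange multipliers) and offset are hypothetical numbers, the kernel form exp(-g‖x_i - x‖²) is the common LIBSVM-style definition and is assumed here, and g = 0.5 is the value reported above.

```python
from math import exp


def rbf_kernel(x_i, x, g=0.5):
    """k(x_i, x) = exp(-g * ||x_i - x||^2), with g = 0.5 as reported in the text."""
    return exp(-g * sum((a - b) ** 2 for a, b in zip(x_i, x)))


def svr_predict(x, support_vectors, dual_coefs, b, g=0.5):
    """Evaluate f(x) = sum_i coef_i * k(x_i, x) + b for one input vector."""
    return sum(c * rbf_kernel(sv, x, g)
               for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical fitted values, for illustration only. Inputs are normalized
# (temperature, PPFD, CO2, soil moisture) vectors in [-1, 1].
support_vectors = [(-1.0, -1.0, -1.0, -1.0),
                   (0.0, 0.5, 0.0, 0.2),
                   (1.0, 1.0, 1.0, 1.0)]
dual_coefs = [-0.8, 1.5, -0.3]
offset = 0.1
prediction = svr_predict((0.0, 0.5, 0.0, 0.2), support_vectors, dual_coefs, offset)
print(prediction)
```

Because the kernel decays with distance, each prediction is dominated by the support vectors closest to the query point, which is one reason the method copes well with small, nonlinear data sets.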

Soil moisture optimization method
Based on the tomato photosynthetic rate prediction model, a soil moisture optimization method based on the genetic algorithm (GA) was proposed, with the predicted photosynthetic rate used as the fitness function. In this way, the optimal soil moisture value under each combination of environmental conditions was obtained.
The core idea of the GA is to combine the survival-of-the-fittest rule of biological evolution with the random exchange of chromosome information within the population [25]. Because of its advantages, such as robustness and suitability for parallel processing, the GA has attracted attention in many areas such as function optimization, machine learning, and data processing [26].
The main parameters that affect GA performance are the population size popsize, the crossover probability pc, and the mutation probability pm. The grid search was adopted to choose the parameter combination with the lowest mean square error (MSE). pc, pm, and popsize were determined to be 0.9, 0.1, and 60, respectively.
The algorithm flow was as follows:
1) Initialization. The tomato photosynthetic rate model was introduced as the fitness function; the GA parameters popsize, pc and pm and the termination criterion were set; the initial population was created within the soil moisture range; and the generation counter was reset. Then, one combination of temperature, CO2 concentration, and PPFD was extracted.
2) Individual evaluation. For the current environmental combination, the fitness of each individual in the population, that is, the corresponding predicted photosynthetic rate, was calculated.
3) Population evolution. Selection, crossover, and mutation were applied to the population to form the next generation X(t+1). This operation was repeated, and the fittest individual was recorded and updated, until the termination criterion was met.
4) Results output. The individual (soil moisture) with the highest fitness under the current environmental combination was output as the optimal solution. A new environmental combination was then extracted and the above operations were repeated until all optimizations were completed.
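The four steps above can be sketched as a small real-coded GA for a single environmental combination. This is an illustration under stated assumptions rather than the implementation used in the study: the binary tournament selection, arithmetic crossover, Gaussian mutation, fixed number of generations and the function toy_rate (a made-up stand-in for the SVR prediction) are all hypothetical; only popsize = 60, pc = 0.9 and pm = 0.1 come from the text.

```python
import random


def genetic_search(fitness, low, high, popsize=60, pc=0.9, pm=0.1,
                   generations=50, seed=0):
    """Maximize `fitness` over one real variable in [low, high]."""
    rng = random.Random(seed)

    def tournament(pop):
        # Binary tournament: the fitter of two random individuals survives.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # 1) Initialization: random soil moisture values within the allowed range.
    population = [rng.uniform(low, high) for _ in range(popsize)]
    # 2) Individual evaluation happens through the calls to fitness().
    best = max(population, key=fitness)
    for _ in range(generations):
        # 3) Population evolution: selection, crossover, mutation.
        offspring = []
        while len(offspring) < popsize:
            p1, p2 = tournament(population), tournament(population)
            if rng.random() < pc:
                w = rng.random()  # arithmetic crossover keeps children in range
                c1, c2 = w * p1 + (1 - w) * p2, w * p2 + (1 - w) * p1
            else:
                c1, c2 = p1, p2
            for child in (c1, c2):
                if rng.random() < pm:
                    step = rng.gauss(0.0, 0.05 * (high - low))
                    child = min(high, max(low, child + step))
                offspring.append(child)
        population = offspring[:popsize]
        # Record and update the fittest individual found so far.
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    # 4) Results output: the soil moisture with the highest fitness.
    return best


def toy_rate(moisture):
    # Hypothetical photosynthetic rate with a maximum at 27% soil moisture.
    return 12.0 - 0.02 * (moisture - 27.0) ** 2


best_moisture = genetic_search(toy_rate, 5.0, 40.0)
print(best_moisture)
```

In the study, the fitness would be the prediction of the SVR model for the current temperature, CO2 concentration and PPFD, and the search would be repeated for every environmental combination.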
Because the photosynthetic rate in the low-light interval of [0, 100] µmol/(m²·s) is generally low, the difference in regulation effect there is small. Therefore, based on the above process, the temperature was varied within the interval [18, 33] with a step of 1°C, the CO2 concentration within [400, 1000] with a step of 50 µmol/mol, and the PPFD within [100, 800] with a step of 50 µmol/(m²·s). For each of the resulting 3120 combinations of temperature, CO2 concentration and PPFD, the soil moisture content corresponding to the maximum photosynthetic rate was searched for and used for irrigation regulation.
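The number of combinations follows directly from the three step sizes, as the short enumeration below shows. The variable names are illustrative; both interval end points are assumed to be included.

```python
from itertools import product

temperatures = range(18, 34)        # 18..33 °C, step 1   -> 16 values
co2_levels = range(400, 1001, 50)   # 400..1000 µmol/mol  -> 13 values
ppfd_levels = range(100, 801, 50)   # 100..800 µmol/(m²·s) -> 15 values

# One GA run is needed for each (temperature, CO2, PPFD) combination.
grid = list(product(temperatures, co2_levels, ppfd_levels))
print(len(grid))  # 16 * 13 * 15 = 3120
```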

Soil moisture regulation model
In order to dynamically predict the optimal soil moisture and reduce calculation time, a soil moisture regulation model was established based on the proposed methods. The relationship between these methods is shown in Figure 3.

Figure 3 Relationship between the models
The temperature, light intensity and CO2 concentration of the greenhouse were used as inputs to construct the soil moisture regulation model using the BPNN algorithm. The model outputs the optimal soil moisture for the current greenhouse environment and can be easily ported to an embedded terminal. The application of the soil moisture regulation model is shown in Figure 4. Among the neural network algorithms, the BPNN is a feed-forward neural network with three or more layers that is trained with the error back propagation algorithm. In forward propagation, the input signal is passed from layer to layer through the activation function of each neuron. Based on the error between the network output and the target, the BPNN uses the gradient descent method to modify the connection weights and thresholds layer by layer from back to front until the termination condition is reached. Its main applications are in information analysis, image processing, and data optimization [27,28].
Then, the optimized data sets were randomly divided into a training set and a testing set at a ratio of 8:2. The parameters of the BPNN were set as follows: three neurons in the input layer, two hidden layers with eight and six neurons, respectively, and one neuron in the output layer; the learning rate was 0.02, and the maximum number of training steps was 2000.
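To give a sense of the size of this network, the sketch below builds a 3-8-6-1 layer structure with random weights, runs one forward pass, and counts the trainable parameters. It shows the forward propagation step only and does not train anything; the tanh hidden activation, the linear output, the weight range and all function names are assumptions made for illustration.

```python
import math
import random

LAYER_SIZES = [3, 8, 6, 1]  # inputs, two hidden layers, output (from the text)


def init_network(sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Forward propagation: tanh in hidden layers, linear output (assumed)."""
    activation = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = z if is_output else [math.tanh(v) for v in z]
    return activation[0]


def count_parameters(layers):
    """Number of connection weights plus thresholds (biases)."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(LAYER_SIZES)
# Normalized (temperature, PPFD, CO2) input; the output is untrained.
prediction = forward(network, (0.2, -0.4, 0.0))
print(count_parameters(network))  # (3*8+8) + (8*6+6) + (6*1+1) = 93

# 8:2 split of the 3120 optimized samples.
n_total = 3120
n_test = n_total * 2 // 10
n_train = n_total - n_test
print(n_train, n_test)
```

With fewer than one hundred parameters, evaluating the trained network requires only a few dozen multiplications, which fits the stated aim of running the model on an embedded terminal.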
The BackpropTrainer function of the pybrain.supervised.trainers package was used to train the BPNN.
In order to verify the reliability of BPNN model, using the same dataset, soil moisture regulation models were constructed by the PLSR algorithm, SVR algorithm and random forest (RF) algorithm. Their R 2 and MSE were compared and analyzed.

Experiment results
The correlation coefficients between photosynthetic rate and the environmental factors are shown in Table 1. In this experiment, photosynthetic rate was positively associated with PPFD, CO2 concentration, temperature and soil moisture, and was significantly correlated with PPFD, CO2 concentration and soil moisture.
Based on the data obtained from the above experiment, the effects of light intensity and soil moisture on the photosynthetic rate of tomatoes were analyzed at a temperature of 28°C and a CO2 concentration of 400 µmol/mol; the light response curves under different soil moisture conditions are shown in Figure 5. When the PPFD was low (0-200 µmol/(m²·s)), light intensity was the main factor limiting the photosynthetic rate and soil moisture had little effect on plant photosynthesis. As the light intensity increased, the effect of soil moisture on the photosynthetic rate gradually became apparent. Therefore, it can be concluded that the water demand of plants changed greatly under different light intensities, resulting in different values of optimal soil moisture.
Meanwhile, under different treatments of soil water content, the light saturation points of the plant also changed, indicating that the influence of multiple environmental factors on the growth of tomatoes was coupled. Therefore, it was of practical significance to explore the soil moisture range suitable for tomato growth under multiple environmental factors.

Tomato photosynthetic rate prediction model validation results
To optimize the model results, the SVR algorithm and the BPNN algorithm were used to construct the tomato photosynthetic rate prediction model. The newff() function was used to construct a BPNN with a single hidden layer. After multiple attempts, the number of neurons in the hidden layer was set to 10, the number of training iterations was set to 1000, the training goal was set to 10⁻³, and the learning rate was set to 0.1. The PLSR algorithm was also used for fitting. The R² and RMSE of the three models were compared (Table 2). For the SVR model, the training set R² was 0.9556 and the RMSE was 1.1277; the testing set R² was 0.9447 and the RMSE was 1.2911. All indicators were better than those of the BPNN model. Therefore, the photosynthetic rate prediction model based on the SVR algorithm was selected, and the test set was used for model verification. The slope of the fitted verification line was 0.9301 and the intercept was 0.2617. The verification result is shown in Figure 6.
Figure 6 Fitting results of test set of photosynthetic rate prediction model
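The indicators reported here (R², RMSE, and the slope and intercept of the line fitted to predicted against measured values) can be computed as in the sketch below. The function names and the four example values are invented for illustration; they are not data from the experiment.

```python
from math import sqrt


def r2_score(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error; its square is the MSE."""
    n = len(measured)
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Made-up measured and predicted values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
slope, intercept = fit_line(measured, predicted)
print(r2_score(measured, predicted), rmse(measured, predicted), slope, intercept)
```

A slope close to 1 and an intercept close to 0 indicate that the predictions are unbiased across the measured range, which complements the single-number summary given by R².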

Soil moisture optimization method validation results
In order to verify the reliability of the GA-based soil moisture optimization method, the particle swarm optimization (PSO) algorithm was applied to the same optimization problem. According to studies on PSO parameter settings [29,30], the acceleration coefficients c1 and c2 are important for algorithm performance. Therefore, c1 and c2 were each tested in the range [0, 4], and their combinations were compared. The peak accuracy was achieved when c1 and c2 were both 2.
To compare the performance of the two algorithms, both optimization methods were applied to find the optimal photosynthetic rate under different environmental conditions. The temperature was fixed at 18°C, the CO2 concentration was set to 400 and 700 µmol/mol, and the PPFD was varied within the range [400, 800] in steps of 100 µmol/(m²·s). The optimization results are shown in Table 3. Under the same optimization conditions, the optimal photosynthetic rate found by the GA was always higher than that found by the PSO algorithm, by an average of 0.07298 µmol/(m²·s) over the ten conditions. Compared with PSO, the GA showed better convergence and was more suitable for this optimization problem.
For a CO2 concentration of 400 µmol/mol and temperatures within the range of 18°C to 33°C, part of the GA optimization process is shown in Figure 7. In total, 3120 sets of optimal soil moisture data under the combinations of temperature, CO2 concentration, and PPFD were obtained as the basis for building the soil moisture regulation model. The scatter plots of the optimization results are shown in Figure 8, which visualizes the relationship between light intensity, CO2 concentration and optimal soil moisture.
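For comparison with the GA, a basic PSO for one variable can be sketched as follows. Only c1 = c2 = 2 comes from the text; the inertia weight w, the velocity limit, the swarm size, the number of iterations and the function toy_rate (a made-up stand-in for the SVR prediction) are assumptions chosen for illustration.

```python
import random


def particle_swarm_search(fitness, low, high, n_particles=30, iterations=100,
                          c1=2.0, c2=2.0, w=0.4, seed=0):
    """Maximize `fitness` over one real variable in [low, high]."""
    rng = random.Random(seed)
    v_max = 0.2 * (high - low)  # assumed velocity limit
    positions = [rng.uniform(low, high) for _ in range(n_particles)]
    velocities = [0.0] * n_particles
    personal_best = positions[:]
    global_best = max(positions, key=fitness)
    for _ in range(iterations):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia + pull towards own best (c1) + pull towards swarm best (c2).
            v = (w * velocities[i]
                 + c1 * r1 * (personal_best[i] - positions[i])
                 + c2 * r2 * (global_best - positions[i]))
            velocities[i] = max(-v_max, min(v_max, v))
            positions[i] = max(low, min(high, positions[i] + velocities[i]))
            if fitness(positions[i]) > fitness(personal_best[i]):
                personal_best[i] = positions[i]
                if fitness(positions[i]) > fitness(global_best):
                    global_best = positions[i]
    return global_best


def toy_rate(moisture):
    # Hypothetical photosynthetic rate with a maximum at 27% soil moisture.
    return 12.0 - 0.02 * (moisture - 27.0) ** 2


pso_moisture = particle_swarm_search(toy_rate, 5.0, 40.0)
print(pso_moisture)
```

On a smooth single-peak function such as this one both PSO and a GA should locate the maximum; differences of the kind reported in this section would be expected to arise on the less regular response surface of the fitted SVR model.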
The CO 2 concentration range was [400, 1000] μmol/mol, the PPFD range was [100, 800] µmol/m 2 · s and the temperatures were 18°C, 23°C, 28°C, and 33°C. At 18°C, the optimal soil moisture first increased and then decreased with the increase in CO 2 concentration and light intensity, and the overall soil moisture was low. The optimal soil moisture at 23°C and 28°C showed similar trends, indicating that as the temperature increased, the utilization of light and CO 2 increased. However, the optimal soil moisture at 23°C was still mainly affected by two factors, light intensity and CO 2 concentration, which was reflected by the increase in the control point with increasing light intensity and the decrease with increasing CO 2 . When the temperature reached 28°C, CO 2 concentration was no longer the main influencing factor, and the control point mainly reflected the difference in ambient light intensity.
When the temperature was further increased to 33°C, plant growth was restricted, which led to a reduction in water consumption for various physiological reactions. This resulted in the soil moisture control point at 33°C being lower than that at 28°C.

Soil moisture regulation model validation results
The BPNN, SVR, RF, and PLSR algorithms were used to construct soil moisture regulation models. Grid search and cross-validation were used in all of these models for parameter determination, and the R² and the MSE were used to judge the model accuracy. The accuracy of each model is shown in Table 4. The modeling results of multivariate statistical analysis methods (such as PLSR) were poor, with R² of the training and test sets less than 0.6. Similarly, the accuracy of the model trained by the SVR algorithm was also low. The R² and MSE of the RF algorithm were better than those of SVR, but the predicted values of the training samples were discontinuous, indicating that the RF algorithm was not suitable for the construction of this model. Compared with the other algorithms, the model trained by the BPNN algorithm had a high determination coefficient and a small MSE.
Model verification was performed using the 624 test samples (Figure 9); the fitted verification line had a slope of 0.9752 and an intercept of 0.009162. The soil moisture regulation model trained with the BPNN algorithm had the highest accuracy; therefore, the BPNN algorithm was the best choice for soil moisture regulation model construction.
Figure 9 Fitting results of test set from soil moisture regulation model

Discussion
The Pearson correlation analysis between photosynthetic rate and the environmental factors showed that PPFD, CO2 concentration, and soil moisture all had a great influence on the photosynthetic rate (Table 1). This is likely because light is the energy source of photosynthesis [31], and both CO2 and water are reactants of photosynthesis [32]. Temperature is also an important factor influencing the photosynthesis of plants, but the temperatures set in this study were suitable for tomato growth [33,34], so the correlation between temperature and photosynthetic rate was not significant.
Temperature, light and CO2 concentration could affect stomatal movements and a series of enzyme-catalyzed reactions in plants [35,36], so the effects of the environmental factors on plant growth are coupled [37,38]. The optimal soil moisture changes with light intensity, CO2 concentration, and temperature (Figure 8).
When the temperature was low, the activity of photosynthetic enzymes was inhibited [39], which would decrease the demand for light and CO2. Photoinhibition might then occur, and both the photosynthetic rate [40] and the water required for the reactions might decrease. Meanwhile, excessive CO2 might reduce stomatal conductance [41], so that the water consumed by transpiration decreased [42]. In contrast, when the temperature was around 23°C, the activity of photosynthetic enzymes was high [43]. The performance of photosynthesis would be improved, and the demand for light and CO2 increased. Consequently, the water required for photosynthesis and transpiration would increase. At high temperature, as at low temperature, the activity of photosynthetic enzymes would decrease [44], and the water requirement of tomato would decrease as well.
The optimal soil moisture should be determined according to the water requirement of tomato. Intelligent regulation of soil moisture therefore needs to account for these coupled environmental factors, which would improve tomato growth.
The aim of this research was to optimize the photosynthetic rate of tomato in different environments by investigating the impact of soil moisture on photosynthesis. With the soil moisture regulation model, the optimal soil moisture can be calculated dynamically as the environmental factors change, which also saves water. In terms of edge computing used in embedded terminals, machine learning has advantages over deep learning due to its small data requirements and short computing time. In this paper, machine learning was used to fit experimental data to build models. Both the SVR algorithm and the BPNN algorithm were used to build the photosynthetic rate prediction model and the soil moisture regulation model, but their performance differed significantly between the two tasks. This might be because the SVR algorithm is more suitable for fitting small samples, whereas the multi-layer BPNN algorithm performs well on larger data sets. A model built on photosynthetic data is more in line with the growth characteristics of tomato. At the same time, machine learning makes it possible to combine photosynthetic data with irrigation models, and because these models can be embedded in intelligent irrigation equipment, they could support efficient and precise irrigation. Data-driven models might be the main direction of future irrigation models.
Since tomatoes of different varieties have similar growth characteristics [45] , their photosynthetic data change with the environment similarly. Machine learning might be an appropriate method for the regression of these data. The modeling method proposed could be further applied to provide optimal soil moisture for other varieties of tomatoes by using the same data acquisition method.

Conclusions
In this study, a multi-gradient nested experiment of environmental factors was conducted to measure the photosynthetic rate of tomato seedlings under different PPFD, temperature, CO2 concentration, and soil moisture conditions. A soil moisture regulation model was constructed to dynamically predict the optimal soil moisture for tomato growth under different environmental conditions. This model was able to provide a suitable soil moisture environment for greenhouse tomatoes, reduce ineffective transpiration of plants, and improve water utilization, thereby providing a theoretical basis for precision irrigation in greenhouse facilities.
The main conclusions were as follows: 1) The PLSR, BPNN, and SVR algorithms were used to construct the photosynthetic rate prediction model, and the test set R 2 of the tomato photosynthetic rate prediction model based on the SVR algorithm was 0.9447, exhibiting high accuracy. This model provided a method for the dynamic acquisition of tomato photosynthetic rate values in different environments.
2) Based on the tomato photosynthetic rate prediction model constructed in this study, the PSO and GA algorithms were used to find the maximum photosynthetic rate and then obtain the optimal soil moisture point. The comparison showed that the GA found higher photosynthetic rate values than the PSO algorithm and was therefore more suitable for soil moisture optimization.
3) Compared with the other algorithms tested, the soil moisture regulation model based on the BPNN algorithm had the highest accuracy, with a testing set R² of 0.9738 and an MSE of 1.51×10⁻⁵. The results showed that the soil moisture regulation model was highly accurate and could output the optimal soil moisture point under different environmental conditions, providing a theoretical basis for greenhouse soil water environment regulation.