Modeling and simulation of temperature control system in plant factory using energy balance

Closed production systems, such as plant factories and vertical farms, have emerged to ensure a sustainable supply of fresh food and to cope with the increasing consumption of natural resources by a growing population. In a plant factory, the microclimate model is one of the components that directly serves the control of the whole system. To better realize dynamic regulation of the microclimate, save energy, and reduce consumption, it is necessary to optimize the environmental parameters in the plant factory and thereby determine the factors that influence the atmosphere control systems. Therefore, this study aims to identify an accurate microclimate model and to predict temperature change from experimental data using the classification and regression trees (CART) algorithm. Random forest theory was used to represent the temperature control system, and a mechanism model of the temperature control system was proposed to improve the performance of plant factories. In terms of energy efficiency, the main factors influencing temperature change in the plant factory were identified as the temperature and air volume flow of the temperature control device, as well as the internal relative humidity. The generalization error of the prediction model reached 0.0907. The results demonstrated that the proposed model captures the quantitative relationship and provides a prediction function. This study can provide a reference for the design of high-precision environmental control systems in plant factories.

1 Introduction
Plant factories, one type of closed system, are designed to maximize production density, productivity, and the use efficiency of natural resources, in order to alleviate the local scarcity of urban food supply. This high-level agricultural production approach can efficiently adjust the internal microclimate to a state more suitable for plant growth, thereby improving the quality and yield of crops [1]. Plant factories combine features such as artificial lighting and active cooling with an excellent hermetic seal, tunable light intensity, forced circulation channels for the internal air, and a high CO2 absorption rate [2]. Currently, switching control is commonly used to manipulate the artificial lighting, temperature, humidity, CO2 concentration, and nutrient solution in most plant factory facilities [3][4][5][6]. However, under switching control the input and output signals of the various parameters vary as constant impulses [7][8][9], which easily results in power loss and increases the annual maintenance cost of the equipment [10]. Therefore, it is necessary to select a continuous variable control system for the environmental devices to obtain the desired output values. The main component of such a continuous control system is a microclimate environment model, generally related to temperature, humidity, and CO2 in plant factories. Establishing a highly accurate microclimate model is a great challenge for the subsequent procedures in a plant factory, particularly for the precise control of crop growth, quantitative management, optimization of the environmental atmosphere, energy saving, and reduction of resource consumption [11][12][13].
Temperature is one of the important climate variables affecting crop growth. The physical laws governing this continuous variable can be represented in a continuous dynamic mode [1,14]. In most devices, such as artificial light sources and temperature control equipment, the control system is usually designed around a logic rule algorithm. Thus, the specific procedure is driven by discrete events that occur concurrently with the continuous dynamic changes [15,16].
The interaction between discrete events and continuous dynamic changes can make the temperature a complex hybrid dynamic system [17,18] .
Many efforts have been devoted to theories and methodologies for predicting the temperature control system in a plant factory over the last decades [19][20][21][22]. Conventional prediction methods were mostly based on the principle of sequential control systems, represented by the autoregressive moving average (ARMA) model and, for non-stationary series, the autoregressive integrated moving average (ARIMA) model, with emphasis on the ARIMA model with a linear approximation system and disturbance characteristics [13,19,20,23]. The wide use of these methods can be attributed to their dominant features, such as the small amount of input data required and the simplicity of the models [13]. However, they demand a highly stable (stationary) original time series, and the probability distribution of the white noise must be known ahead of time. A method based on a priori probability can introduce deviation when the order of the time series is determined during data processing. Specifically, large prediction errors can occur with non-normally distributed data and multicollinearity.
This study aims to construct a model of internal temperature variation in a plant factory based on the energy balance theory [24,25], and further to explore the input factors that influence the internal temperature, such as volume flow, supply air temperature, and internal relative humidity. Using the classification and regression trees (CART) algorithm in random forest (RF) theory [26][27][28], a model system with input-output measurements was established to predict the behavior of the temperature control system, with emphasis on economically feasible application.

Theoretical background for mechanism model of temperature control system
This section presents the theoretical models of energy balance in closed plant production and then deduces the mechanism model of a temperature control system, particularly with regard to energy load balance, crop transpiration, and artificial lighting in plant factories. In a building structure, the temperature is non-uniformly distributed in the horizontal and vertical directions [29,30], partly due to the combined influence of many factors, such as the internal environment, control mode, planting structure, and types of crops [31,32]. Nevertheless, the internal temperature in the plant factory can be assumed to be spatially uniform over a specific research period, so that a relationship between temperature and time can be established.

Energy balance in a conventional greenhouse
A typical light-transmitting greenhouse with natural ventilation consists of structural materials (glass, plastic), surface soil, architectural components, heating, and vents [33]. Taking the air and crops in the greenhouse as the research object, the energy can be divided into two forms: sensible heat and latent heat. The sensible heat is the exchange of energy between the system and its surroundings, including the solar radiation energy (Q_R), conduction and convection through the greenhouse materials (Q_F), the energy transferred by different architectural components and production systems (Q_C), the soil heat transfer energy (Q_Soil), the energy transferred by natural ventilation (Q_V1), the energy transferred by mechanical ventilation (Q_V2), the energy supplied by the heating system (Q_S), the heat energy from the artificial lighting system, and the energy consumed by photosynthesis. The latent heat is the energy transferred to a system during a phase transition of matter, covering the energy transferred by crop transpiration (Q_tran) and the heat absorbed and released during water evaporation and condensation (Q_H). To simplify the model, the heat energy from artificial lighting and the energy consumed by photosynthesis can be neglected, because their values are relatively small compared with the sensible heat.
Therefore, a simplified model can be established to illustrate the energy balance in a typical greenhouse, as shown in Figure 1. The energy balance equation can be represented by Equation (1).

Energy balance in a plant factory
An artificial lighting system is generally installed to provide the related energetic fluxes in a plant factory. Opaque thermal insulation materials can be used to build a highly insulated closed architecture, shaped like a warehouse, thereby minimizing the heat transfer from the structural wall to the outside. Inside a plant factory, forced air circulation is usually utilized to transfer all the energy passing through the internal system [34,35]. Therefore, in a plant factory the artificial lighting radiation energy Q_rad takes the place of the solar energy Q_R of a typical greenhouse, and Q_diss is the thermal energy of the artificial lighting. In the case of completely artificial production and opaque facades, the architecture can be considered adiabatic, so Q_F, Q_Soil, and Q_C can also be neglected. When a temperature control device handles the heating and cooling, the energies Q_V1 and Q_V2 are replaced by Q_0 (the energy flux flowing out during forced ventilation) and Q_S (the energy flux transferred indoors via the treated air). The soil moisture evaporation can be omitted, owing to the excellent sealing provided by the thin film covering the soil surface or the nutrient cultivation pond.
The water vapor condensation can also be omitted because its value is relatively small compared with that of the total crop transpiration. Thus, a simplified model can be established to illustrate the energy balance of the plant factory, as shown in Figure 2. The energy balance equation of the plant factory can be represented by Equation (2). Generally, the required temperature and relative humidity can be maintained for the internal air in the closed production system of a plant factory. However, some inevitable disturbing factors can remarkably change the indoor temperature and relative humidity. These influencing parameters are called loads. According to the first law of thermodynamics, the loads of heat and humidity in a plant factory can be expressed as [36]: where m is the internal air mass, kg, with m = ρ_a · V; V is the indoor volume, m^3; ρ_a is the indoor air density, kg/m^3, with ρ_a = 1.199 kg/m^3 for the air in a plant factory; c_ap is the specific heat of air at constant pressure, c_ap = 1.009 kJ/(kg·°C); ∆t is the rate of change of the internal temperature, °C/s; t_a is the indoor temperature, °C. The energy entering a plant factory per unit time generally includes the heat transferred indoors via the treated air and the load energy from the indoor heat sources. The exterior climate has a great influence on the heating load in greenhouses, but in a plant factory this external heating load can be omitted, because its effect is much smaller than that of the internal loads caused by lighting and plants, particularly when the system is running smoothly and there is no disturbance from human activity. Thus, Equation (5) can be obtained in this case, where Q_in denotes the heating load caused by the indoor heat sources.
In a typical plant factory, the energy flux transferred indoors via the treated air, Q_S, can be written as where k_s is the capacity coefficient of the supply air, k_s = 1.21 kJ/(m^3·°C); m_s is the mass flow rate of the supply air, kg/s; q_s is the volume flow rate, m^3/h; t_s is the temperature of the supply air, °C.
In a plant factory, the energy flux flowing out during forced ventilation can be written as
$$Q_0 = m \cdot c_{ap} \cdot (t_a - t_r) \quad (7)$$
where t_r is the temperature of the returning air, °C. The rest of the parameters are the same as mentioned above. Because t_r = t_a in a plant factory, Q_0 = 0 kJ, and the governing equation can finally be obtained,
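As a rough illustration of how such an energy balance drives the indoor temperature, the following sketch integrates a lumped air-temperature model over time. The constants ρ_a, c_ap, and k_s are the values quoted in the text. The form Q_S = k_s · q_s · (t_s − t_a), the balance m · c_ap · dt_a/dτ = Q_S + Q_in, and every scenario value (supply flow, supply temperature, internal load) are assumptions made only for this example, not equations or data confirmed by the paper.

```python
# Constants quoted in the text.
RHO_A = 1.199   # indoor air density, kg/m^3
C_AP = 1.009    # specific heat of air at constant pressure, kJ/(kg·°C)
K_S = 1.21      # capacity coefficient of the supply air, kJ/(m^3·°C)


def supply_heat_kw(q_s_m3h, t_s, t_a):
    """ASSUMED form of Q_S: k_s * q_s * (t_s - t_a), converted from kJ/h to kW."""
    return K_S * q_s_m3h * (t_s - t_a) / 3600.0


def simulate_indoor_temperature(t_a0, t_s, q_s_m3h, q_in_kw, volume_m3,
                                dt_s=1.0, steps=600):
    """Explicit Euler integration of the ASSUMED balance
    m * c_ap * dt_a/dtau = Q_S + Q_in, with Q_0 = 0."""
    heat_capacity = RHO_A * volume_m3 * C_AP   # m * c_ap, kJ/°C
    t_a = t_a0
    history = [t_a]
    for _ in range(steps):
        rate = (supply_heat_kw(q_s_m3h, t_s, t_a) + q_in_kw) / heat_capacity
        t_a += rate * dt_s
        history.append(t_a)
    return history


# Hypothetical scenario: a 3.6 m x 2.4 m x 3.0 m chamber with a 1.6 kW
# internal load, 18 °C supply air at 1000 m^3/h, starting from 26 °C.
volume = 3.6 * 2.4 * 3.0
history = simulate_indoor_temperature(26.0, 18.0, 1000.0, 1.6, volume)

# Steady state of the assumed balance: Q_S + Q_in = 0.
steady_state = 18.0 + 3600.0 * 1.6 / (K_S * 1000.0)
print(round(history[-1], 2), round(steady_state, 2))
```

Note that ρ_a · c_ap = 1.199 × 1.009 ≈ 1.21 kJ/(m^3·°C), which matches the quoted k_s, so the capacity coefficient can be read as the volumetric heat capacity of the supply air. Under the assumed balance, the indoor temperature relaxes toward the steady state where the supply air removes exactly the internal load.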

Energy consumption of crop transpiration
The crop transpiration directly determines the internal energy balance in a closed system. In practice, the crop transpiration coefficient can be defined as the fraction of the radiation load that is dissipated by the crop as latent heat. An accurate estimate of the crop transpiration coefficient is therefore necessary. The Penman-Monteith crop transpiration method [37] can be used to evaluate the energy produced by crop transpiration in fully controlled plant factories.
1) Crop transpiration model
In a plant factory, the internal evapotranspiration can be taken as equal to the crop transpiration, because the evaporation from the surface soil is relatively small. The Bowen ratio energy balance (BREB) method is usually selected to investigate crop transpiration [18], and the transfer can be represented by the following equations: where λE is the latent heat consumed by crop canopy transpiration, W/m^2; LAI is the leaf area index, which plays an important role in determining the energy balance and is defined as the ratio of leaf area to cultivation panel area, LAI = A_p/A; A_p is the area of leaves, m^2; A is the area of surface soil, m^2; R_n is the total net radiation intensity of the crop canopy, W/m^2; k is the extinction coefficient, dimensionless; β is the Bowen ratio, dimensionless; γ is the thermometer (psychrometric) constant, γ = 0.646 kPa/°C; t_1 and t_2 are the air temperatures at heights 1 and 2, respectively, K; e_1 and e_2 are the water vapor pressures at heights 1 and 2, respectively, kPa. According to Gaudriaan's study [6], the equation can be written as where e*_in is the saturated vapor pressure of the internal air, kPa; e_in is the actual vapor pressure of the internal air, kPa; RH_in is the internal relative humidity, %; e_0 is the saturated water vapor pressure of the air at a temperature of 0°C, e_0 = 0.6107 kPa.
The actual water vapor pressure of the air at the crop canopy, e_inp (kPa), can be written as Equation (12), where e*_inp is the saturated water vapor pressure of the air at the crop canopy, kPa; t_p is the temperature of the leaves at the crop canopy, °C. Then, the Bowen ratio β can be written as
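The link between relative humidity and vapor pressure described above can be sketched as follows. The relation e_in = RH_in/100 · e*_in and the value e_0 = 0.6107 kPa follow the definitions in the text. The exponential (Magnus-type) form of the saturation curve and its coefficients `a` and `b` are assumptions for illustration, not constants confirmed by the text.

```python
import math

E0_KPA = 0.6107  # saturated vapor pressure at 0 °C, kPa (value quoted in the text)


def saturation_vapor_pressure(t_c, a=17.4, b=239.0):
    """Saturated vapor pressure in kPa at air temperature t_c (°C).

    ASSUMPTION: Magnus-type form e0 * exp(a * t / (b + t)); the coefficients
    a and b are illustrative defaults, not values taken from the paper.
    """
    return E0_KPA * math.exp(a * t_c / (b + t_c))


def actual_vapor_pressure(t_c, rh_percent):
    """Actual vapor pressure in kPa from relative humidity given in percent."""
    return rh_percent / 100.0 * saturation_vapor_pressure(t_c)


def vapor_pressure_deficit(t_c, rh_percent):
    """Difference between saturated and actual vapor pressure, kPa."""
    return saturation_vapor_pressure(t_c) - actual_vapor_pressure(t_c, rh_percent)


# Example at 22 °C and 70% relative humidity (inside the ranges used in the test).
print(actual_vapor_pressure(22.0, 70.0), vapor_pressure_deficit(22.0, 70.0))
```

At 0°C the function returns e_0 exactly, and at 100% relative humidity the deficit vanishes, which are the two limiting cases implied by the definitions.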

2) Energy consumption of crop transpiration
The crop absorbs and emits radiation, exchanges heat with the air, and transpires. Hence, the energy consumed by crop transpiration in a plant factory can be obtained as

Transfer of sensible heat between crop canopy and air
According to Fick's first law of diffusion [24], the sensible heat transfer is described by Equation (15), where H is the sensible heat exchange between the crop canopy and the air, W/m^2; r_a is the aerodynamic resistance of the boundary layer of a single crop leaf, s/m. The value of the aerodynamic resistance is taken as 100 s/m when the forced air circulation is working and 200 s/m when it is off [24]. Thus, the expression of sensible heat transfer between the crop canopy and air can be written as

Energy balance equation of artificial lighting source
Artificial lighting sources in plant factories mainly include agricultural high-pressure sodium lamps (HPS) and light-emitting diodes (LED). The HPS was used in this study.
The energy generated by artificial lighting sources can be divided into two parts. One part is dissipated as heat. The other is emitted as radiation and absorbed by the plant canopy, converted into heat on contact with the various surfaces, and then transferred to the internal air by convection in the plant factory.
The energy balance equation of artificial lighting source can be written as where, Q L is the energy output when the artificial light source works, W; τ is the power factor of the artificial lighting source, dimensionless; p is the power when a single artificial lighting source is used, W; n is the number of artificial lighting sources; t W is the operating time of artificial lighting source, h.
The mechanism model of the internal temperature control system in the plant factory can be established as Equation (18). The following points about Equation (18) should be noted.
1) The temperature control system in the plant factory presents some typical features, including strong nonlinearity, strong coupling, and large time lag [19,20] .
In particular, strong interference and time-varying control are also dominant characteristics of a plant factory, owing to variation in the air volume and temperature of the temperature control device, the type and working time of the artificial lighting source, the types of crops grown, and the planting pattern of the crops.
2) For the internal temperature, the main influencing factors are the volume flow of supply air (q_s), the temperature of supply air (t_s), the power of the artificial lighting source (p), the running time of the artificial lighting sources (t_W), and the internal relative humidity (RH_in). Conversely, the internal relative humidity can also be affected by the parameters q_s, t_s, p, and t_p.
3) There are obvious delays in the influence of the internal relative humidity on the internal temperature. The reason may be that the volume flow of supply air (q_s) and the temperature of supply air (t_s) first act on the internal temperature (t_a), and the changed internal temperature then acts on the internal relative humidity.

Determination of model type
For modeling a temperature control system, it is necessary to comprehensively evaluate the influence of various environmental factors on temperature [19,38] , and thereby to select the dominant influence factors for the optimization of the model. There is a complicated mechanism in the greenhouse system, with emphasis on some environmental factors that change relatively slowly in a short-term evaluation.
Based on the above-mentioned mechanism model, the relationships between the variables q_s, t_s, p, RH_in and t_a, and the level of influence of these variables on t_a in the temperature model, were identified. The changes in indoor temperature were predicted by taking into consideration environmental factors including indoor humidity, supply air temperature, air volume flow, and light. Normally, the temperature control system can be treated as a multiple-input single-output (MISO) system. The output of the temperature control system is the measured value of the internal temperature, whereas the inputs are the volume flow of supply air (q_s), the temperature of supply air (t_s), the power of the artificial lighting source (p), and the internal relative humidity (RH_in). Owing to the controllability of artificial lighting sources and the forced circulation of the internal air within the closed system of the plant factory, the control equipment can be continually controlled and adjusted, and the input and output signals of the various parameters always vary in constant impulse.
In this study, a system identification method based on input-output observation was selected to accurately predict the temperature behavior in the plant factory. Specifically, the operating data of the system were first used to identify the structure and parameters of the model, then to establish a dynamic analysis model, and finally to predict the temperature variation in the plant factory from its past trajectory [39]. Based on the environmental data collected from the test, correlation analysis is usually utilized to verify the feasibility of the variables selected for the temperature control system model.
Consequently, the prediction of temperature control systems at a certain time in the future can be gained, mostly starting from the data of influencing factors using various mathematical methods.
Random Forest Regression (RFR) is an ensemble machine learning method that applies bootstrap random resampling and random node splitting to construct multiple decision trees, and obtains the final result by voting over the trees for classification or by averaging their outputs for regression. RFR offers advantages such as high accuracy, a controlled generalization error, fast convergence, and only a few adjustable parameters, which helps to avoid overfitting effectively. It is suitable for many kinds of data sets, especially those with an ultra-high-dimensional feature space. Meanwhile, RF can analyze features with complex interactions, and it is robust to noisy data and data with missing values. Hence, this paper applies the random forest algorithm for model identification and regression prediction.
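The two ingredients named above, bootstrap resampling and aggregation over trees, can be sketched in a few lines. This is a minimal standard-library illustration, not the sklearn implementation used in the study; the function names are hypothetical. For regression the tree outputs are averaged, while for classification the majority vote is taken.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement (one tree's training set)."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def aggregate_regression(tree_predictions):
    """Regression forests average the outputs of the individual trees."""
    return mean(tree_predictions)


def aggregate_classification(tree_votes):
    """Classification forests return the label with the most votes."""
    return Counter(tree_votes).most_common(1)[0][0]


rng = random.Random(0)              # fixed seed so the example is repeatable
rows = list(range(1000))            # stand-in for 1000 training samples
sample = bootstrap_sample(rows, rng)

# Share of distinct original samples that landed in the bootstrap sample.
unique_fraction = len(set(sample)) / len(rows)
print(unique_fraction)
print(aggregate_regression([20.0, 21.0, 22.0]))
print(aggregate_classification(["a", "b", "a"]))
```

Because sampling is done with replacement, each bootstrap sample leaves out a share of the original data, which is what gives the trees different training sets and decorrelates their errors.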

CART model
The random forest can be defined as [40][41][42][43]
$$\{h(X, \theta_i),\ i = 1, 2, \ldots, k\}$$
where θ_i is the random vector of the i-th decision tree, h(X, θ_i) is its output value, and k is the number of decision trees.
Only two parameters can be adjusted in a random forest approach: the number of trees in the forest and the number of key split features selected for each tree. Under the law of large numbers (LLN), the random forest algorithm can achieve high classification accuracy while preventing overfitting during the training process.
In random forest, the classification and regression trees (CART) algorithm is a powerful and popular predictive machine learning technique. This learning method can be used to output the conditional probability distribution of the random variable Y, under the condition of the given input random variable X.
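Suppose that X is the input vector and Y is the output vector, where Y is a continuous variable. A given training set can then be written as
$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$
When the input space is partitioned into elements R_1, R_2, ..., R_M with a fixed output value c_m on each element, the optimal value of c_m on the element R_m, denoted by ĉ_m, is the mean value of the outputs y_i corresponding to all input instances x_i on R_m, namely,
$$\hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m)$$
The optimal segmentation variable j and the optimal segmentation point s are found by solving
$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)}(y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)}(y_i - c_2)^2\right]$$
where R_1(j,s) and R_2(j,s) are the two regions obtained by splitting on variable j at point s.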
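The split search described above can be illustrated with an exhaustive scan over every feature and every observed threshold. This is a minimal sketch of the least-squares criterion for a single node, assuming the regions are defined as x_j ≤ s and x_j > s; the data are made-up values, and a production implementation such as sklearn would be far more efficient.

```python
def sum_squared_error(values):
    """Squared error of a region around its mean (the optimal constant c_m)."""
    if not values:
        return 0.0
    c = sum(values) / len(values)
    return sum((v - c) ** 2 for v in values)


def best_split(X, y):
    """Return (loss, j, s) minimising the two-region squared error.

    Region 1 holds samples with X[i][j] <= s, region 2 the remainder.
    """
    best = None
    for j in range(len(X[0])):
        thresholds = sorted({row[j] for row in X})
        for s in thresholds[:-1]:          # the largest value would empty region 2
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            loss = sum_squared_error(left) + sum_squared_error(right)
            if best is None or loss < best[0]:
                best = (loss, j, s)
    return best


# Hypothetical samples: columns are supply air temperature (°C) and relative
# humidity (%), the target is the indoor temperature (°C).
X = [[18.0, 60.0], [19.0, 80.0], [24.0, 62.0], [25.0, 78.0]]
y = [20.0, 20.2, 24.0, 24.2]

loss, feature, threshold = best_split(X, y)
print(loss, feature, threshold)
```

In this toy data the target follows the first column closely, so the search selects that feature; growing a full tree repeats the same search recursively on each resulting region.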

Analysis of random forest model
1) Convergence of random forest
Given a set of classifiers h_1(x), h_2(x), ···, h_k(x), with the training set drawn at random from the distribution of the random vector (X, Y), where X is the input vector and Y is the output vector [44,45], the margin function for a sample point (x, y) can be written as
$$mg(X, Y) = av_k\, I(h_k(X) = Y) - \max_{j \neq Y} av_k\, I(h_k(X) = j) \quad (24)$$
where I(·) is the indicator function and av_k is the averaging operator over the classifiers.
The margin of a data point is defined as the proportion of votes for the correct class minus the maximum proportion of votes for the other classes. Thus, under majority votes, the positive margin means correct classification and vice versa.
2) Generalization error of random forest
In random forest, the generalization error can be defined as
$$PE^* = P_{X,Y}(mg(X,Y) < 0) \quad (25)$$
In random forest, h_k(X) = h(X, θ_k). When the number of decision trees is large enough, the law of large numbers applies: for almost all sequences θ_1, θ_2, ···, PE* converges to
$$P_{X,Y}\left(P_\theta(h(X,\theta) = Y) - \max_{j \neq Y} P_\theta(h(X,\theta) = j) < 0\right) \quad (26)$$
This implies that a random forest does not overfit as the number of decision trees increases, but its generalization error converges to a limiting value.
This can be expressed formally as an upper bound on the generalization error as follows:
$$PE^* \le \frac{\bar{\rho}\,(1 - s^2)}{s^2} \quad (27)$$
where s is the strength (intensity of generalization ability) of the individual trees and ρ̄ is the average correlation coefficient between trees.
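The margin, the empirical error rate, and the bound discussed above can be computed directly from their definitions. The sketch below uses made-up votes from four hypothetical trees. The bound is written in the standard form ρ̄(1 − s²)/s², with `strength` standing for s and `mean_correlation` for the average correlation; both inputs are illustrative numbers, not values from this study.

```python
from collections import Counter


def margin(votes, true_label):
    """Share of votes for the true class minus the largest share for any other class."""
    k = len(votes)
    counts = Counter(votes)
    correct = counts.get(true_label, 0) / k
    best_other = max(
        (c / k for label, c in counts.items() if label != true_label),
        default=0.0,
    )
    return correct - best_other


def empirical_error(vote_table, true_labels):
    """Fraction of samples with a negative margin (misclassified by majority vote)."""
    margins = [margin(v, t) for v, t in zip(vote_table, true_labels)]
    return sum(m < 0 for m in margins) / len(margins)


def error_bound(mean_correlation, strength):
    """Upper bound rho_bar * (1 - s^2) / s^2 on the generalization error."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return mean_correlation * (1 - strength ** 2) / strength ** 2


# Votes of four hypothetical trees on three samples.
vote_table = [["a", "a", "b", "c"], ["b", "b", "a", "b"], ["c", "c", "c", "a"]]
true_labels = ["a", "a", "c"]

print([margin(v, t) for v, t in zip(vote_table, true_labels)])
print(empirical_error(vote_table, true_labels))
print(error_bound(0.2, 0.8), error_bound(0.1, 0.8))
```

The last line shows the design lever behind random forests: at fixed strength, lowering the correlation between trees lowers the bound, which is why random feature selection at each split is used.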

Test conditions
A field experiment was carried out continuously in a fully enclosed plant greenhouse with artificial lighting from August 20, 2018 to September 10, 2018 at Beidahuang Kenfeng Seed Industry Co., Ltd., Harbin, Heilongjiang Province, China. The structure size of the greenhouse was 3.6 m×2.4 m×3.0 m. Four high-pressure sodium lamps (MASTER son-T plus 400W E40 SLU) were used for the artificial lighting. A compressor condensing unit served as the temperature control device: the power of the condensing unit was 5.5 kW, the design pressure was 3.2 MPa (high-pressure side) and 1.9 MPa (low-pressure side), and the total nominal cooling capacity was 11.7 kW. A CO2 gas tank was installed to ensure the normal photosynthesis of the crops.
During the test period, the environmental control facilities worked continuously.
A cold field maize variety, Demeiya No.1, produced by Beidahuang Kenfeng Seed Industry Co., Ltd., Harbin, Heilongjiang Province, China, was used as the research material. Before sowing, the maize seeds were screened, disinfected, and coated. The seedling period of the maize was set as BBCH-scale 00 to 13-15, i.e., from sowing to the 3-6 leaf stage in the growing phase of maize [46,47]. A small-scale cultivation platform 0.2 m above the ground was designed, on which pots filled with black calcium soil were placed in a single planting layer. Each time, 16 pots were used for sowing, and two maize seeds were sown in each pot to ensure the seedling emergence rate. After the emergence of the maize, the weak seedling was manually removed and the strong one was retained.
The environment settings in the greenhouse were as follows: an indoor temperature of 18°C-26°C, a constant CO2 concentration of 440 µmol/mol, an internal relative humidity (RH_in) of 60%-80%, and an artificial lighting power of 400 W (single lighting source). The lighting worked 12 h/d: it was switched on from 20:00 to 8:00 the next day and switched off from 8:00 to 20:00. A wireless sensor network (WSN) monitoring system was used to collect data automatically and to regulate the microclimate in the greenhouse in real time. The collected environmental data were sent to the monitoring computer, with sampling and recording once per minute.

Data acquisition
4.2.1 Measurement of canopy temperature and radiation intensity
The vertical distance between the artificial lighting source and the crop canopy was 1.5 m and remained the same during the whole measurement. A handheld spectrometer (UPRTEK MK-350N) was used to measure the illuminance and photosynthetic photon flux density (PPFD) at the crop canopy. PPFD is defined as the number of micromoles of photons that reach each square meter of the crop each second [48]. When the artificial lighting source had run continuously and reached a steady state, the illuminance at the canopy was 34542 lx and the PPFD (400-700 nm) was 520 μmol/(m^2·s), as shown in Figure 3. A temperature and humidity sensor transmitter (RSTONG GT20) was installed at the height of the crop canopy to measure the canopy temperature.

Measurement of indoor temperature and humidity
At the center of the indoor floor, a temperature and humidity sensor transmitter (RSTONG GT20) was installed 1.5 m above the ground level to measure the indoor temperature and humidity in the greenhouse.

Measurement of volume and temperature of air supply
There were two square vents in the size of 400 mm×400 mm, symmetrically arranged in the wall 2 m above the ground level. An air flow detector (FLUKE 922) was placed in the center of each vent, to measure the volume flow and temperature of the air at the outlet in the greenhouse.

Determination of leaf area
A high-throughput plant imaging system (Scanalyzer HTS) was used to measure the leaf area of the canopy crops when the maize seedlings grew to the stage of three leaves and one heart. This imaging system was developed by Beidahuang Kenfeng Seed Industry Co., Ltd., Heilongjiang Province, China [49,50]. The measuring procedure was divided into two parts: image recognition of the leaf area of each selected seedling, and summation of the leaf area of all measured seedlings. Thus, the leaf area of the crop canopy was obtained, as shown in Figure 4.
Figure 4 Scanalyzer HTS a) near-infrared imaging and b) software interface of data acquisition
5 Results and discussion
Table 1 lists a small part of the acquired system data, extracted from a total of 30 000 sets of measured sampling data. One set was captured every 60 s from 9:21 on August 20, 2018 to 5:25 on September 10, 2018. Data processing was completed in Python. The Random Forest Regression prediction was implemented with the sklearn module. 75% of all data was randomly selected as the training set to train a prediction model using the random forest algorithm (CART). The rest of the measured data was used as the test set.
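The sampling scheme and the random 75%/25% split can be checked with a short standard-library sketch. The start and end times and the 60 s interval are taken from the text; the helper `train_test_split_indices` and the seed are hypothetical and only stand in for whatever splitting routine was actually used.

```python
import random
from datetime import datetime

# Recording period stated in the text, sampled once per minute.
start = datetime(2018, 8, 20, 9, 21)
end = datetime(2018, 9, 10, 5, 25)
minutes_elapsed = int((end - start).total_seconds() // 60)


def train_test_split_indices(n_samples, train_fraction=0.75, seed=0):
    """Shuffle sample indices and cut them into training and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, test_idx = train_test_split_indices(30000)
print(minutes_elapsed, len(train_idx), len(test_idx))
```

The elapsed time corresponds to roughly 30 000 one-minute records, which agrees with the stated size of the data set. Shuffling before the cut matters here because the records are time ordered and the lighting state alternates in 12 h blocks.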

Standard error analysis
The mean square error (MSE) was selected to evaluate the proposed prediction model. It is the mean of the squared differences between the predicted values and the raw data at the corresponding points. The equation can be written as
$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \quad (28)$$
where MSE is the mean square error; y_i is the raw data; ŷ_i is the predicted data; n is the number of prediction points. Generally, the smaller the mean square error, the more accurate the prediction of the model.
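The mean square error defined above translates directly into code. The sketch below is a plain-Python version with made-up temperature values; it is meant only to show the arithmetic, not to reproduce the figures reported in this study.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between raw and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical indoor temperatures (°C): measured versus predicted.
actual = [20.0, 22.0, 24.0]
predicted = [20.5, 21.5, 24.0]
print(mean_squared_error(actual, predicted))
```

Because the differences are squared, the result is in °C squared, and a single large miss weighs more than several small ones.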

Selection of parameters
Based on the acquired data sets, four parameters (M=4) were selected to predict the indoor temperature in the greenhouse: the volume flow of supply air q_s, the temperature of supply air t_s, the internal relative humidity RH_in, and the state of artificial lighting (on, p=1; off, p=0). When the number of decision trees in the random forest is N, the mean square error of the random forest reaches its minimum, as shown in Table 2.

Evaluation of prediction results
(1) Generalization error of model
To verify the prediction accuracy of the proposed model, 20 samples were randomly selected from all test samples to compare the actual values with the predicted values. This verification approach was chosen because time is not an input of the proposed prediction system: the predicted temperature depends only on the processing parameters, namely the internal relative humidity, the temperature of supply air, the volume flow of supply air, and the presence or absence of artificial lighting.
From the test data set covering the whole day, the sampling data at 20 moments were randomly selected and numbered, giving the actual and predicted values of the indoor temperature shown in Table 3. The comparison of the actual and predicted internal temperature values is shown in Figure 5. For the randomly selected samples, the smaller the distance between the actual and predicted values, the higher the accuracy of the model prediction, as can be seen from Table 3 and Figure 5. Artificial lighting sources generally follow a periodic pattern in plant factories. In this study, the sample data were investigated separately with lighting (p=1), without lighting (p=0), and under continuous lighting over a whole day. In each of the three cases, 75% of the data under the given condition was used to establish the model, whereas the rest was used to generate predictions for evaluating accuracy. The resulting generalization errors of the three models are shown in Table 4. The generalization errors were 0.1135, 0.0761, and 0.0907 for the cases with artificial lighting (p=1), without artificial lighting (p=0), and continuous lighting over a whole day, respectively, indicating that the predictions of the three models were accurate. Specifically, the prediction accuracy of the case without artificial lighting (p=0) was the highest, whereas that of the case with artificial lighting (p=1) was the lowest. It can be inferred that the use of the artificial lighting source increases the power consumption of the corresponding control equipment in the greenhouse and thereby changes the indoor temperature. The reason is that the lighting source inevitably radiates heat to the surroundings, although it has only a small effect on the model prediction.
(2) Weight of parameters
Based on the accurate model prediction, it is necessary to determine the weight of the various parameters influencing the indoor temperature, in order to avoid the effects of multicollinearity between different attributes on the prediction. Thus, the weights of the different parameters were obtained using the above-mentioned prediction model and the parameters of the training samples in the three cases: with artificial lighting (p=1), without artificial lighting (p=0), and continuous lighting over a whole day. Figures 6-8 show the weight values of the parameters (the volume flow of supply air q_s, the temperature of supply air t_s, the internal relative humidity RH_in, and, for the whole-day case, the lighting) under the three conditions.
Figure 6 Weight value of the volume flow of supply air q_s, the temperature of supply air t_s, the internal relative humidity RH_in, and lighting, under continuous lighting for a whole day
Figure 7 Weight value of the volume flow of supply air q_s, the temperature of supply air t_s, the internal relative humidity RH_in, with artificial lighting (p=1)
Figure 8 Weight value of the volume flow of supply air q_s, the temperature of supply air t_s, the internal relative humidity RH_in, without artificial lighting (p=0)
In Figure 6, the weight values for t_s, q_s, RH_in, and the lighting source were 0.4889, 0.4778, 0.0283, and 0.005, respectively, under continuous lighting for a whole day in the greenhouse.
In Figure 7, the weight values for t s , q s , and RH in were 0.4306, 0.5214, and 0.0480, respectively, with the artificial lighting (p=1) in the greenhouse.
In Figure 8, the weight values for t s , q s , and RH in were 0.4669, 0.5142, and 0.0189, respectively, without the artificial lighting (p=0) in the greenhouse.
It can be inferred that the temperature of supply air t s and the volume flow of supply air q s had large effects on the prediction of the internal temperature, with or without artificial lighting in the greenhouse, whereas the internal relative humidity RH in also affected the predicted value, although its weight value was relatively small.
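The reported weight values can be checked and ranked with a few lines of code. The sketch below only restates the numbers quoted for Figures 6-8; the dictionary layout and the helper names `rank_parameters` and `supply_air_share` are illustrative assumptions, and the identifiers `t_s`, `q_s`, and `RH_in` stand for t s , q s , and RH in .

```python
# Weight values as reported for Figures 6-8 (no new data).
weights = {
    "continuous lighting": {"t_s": 0.4889, "q_s": 0.4778, "RH_in": 0.0283, "lighting": 0.005},
    "with lighting (p=1)": {"t_s": 0.4306, "q_s": 0.5214, "RH_in": 0.0480},
    "without lighting (p=0)": {"t_s": 0.4669, "q_s": 0.5142, "RH_in": 0.0189},
}


def rank_parameters(case_weights):
    """Return parameter names sorted from largest to smallest weight."""
    return sorted(case_weights, key=case_weights.get, reverse=True)


def supply_air_share(case_weights):
    """Combined weight of the two supply-air parameters (t_s and q_s)."""
    return case_weights["t_s"] + case_weights["q_s"]


for case, w in weights.items():
    total = sum(w.values())
    print(f"{case}: total={total:.4f}, ranking={rank_parameters(w)}, "
          f"supply-air share={supply_air_share(w):.4f}")
```

In each case the weights sum to 1, and the two supply-air parameters together account for more than 95% of the total weight, which is the quantitative basis for the inference above.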
In agreement with the experimental observations, the main parameters influencing the internal temperature were the temperature of supply air t s , the volume flow of supply air q s , and the internal relative humidity RH in , which is basically consistent with the analysis of the mechanism model. Although the artificial lighting sources had only a small effect on the model predictions, the radiated heat can significantly increase the power consumption of the control equipment in the greenhouse, thereby affecting the indoor temperature. The leaf area index LAI was relatively small, indicating that the latent heat of transpiration was relatively low compared with the overall power consumption in the greenhouse. This is possibly because only a small number of maize seedlings were planted in this experiment. Furthermore, the internal relative humidity RH in had a relatively small effect on the predicted temperature, and an obvious lag also occurred.
In summary, the temperature mechanism model of plant factories (PFLAs) was constructed from the model of a typical greenhouse system using the energy balance theory, in order to obtain the disturbance inputs affecting the internal temperature in the system. The feasibility of the proposed model was verified by the test method.
Similar results can be found in the research of Meng et al. [14] , who constructed a simulation model of the thermal environment in solar greenhouses based on the law of mass-energy transfer, in order to comprehensively consider the effects of several parameters on the internal temperature, including the meteorological conditions outside the greenhouse, the covering materials, the surface soil, the crops and ventilation inside the greenhouse, crop evaporation, and the phase change of water vapor. The study of Luuk et al. [24] also showed that the energy distribution of sensible and latent heat and the crop transpiration model can be described by a mechanism model of plant factories. The research of Qin et al. [19] also showed that the correlation between the perturbation inputs and the temperature/humidity in the greenhouse directly determined the identification and simulation of parameters in the system model. Jiang et al. [20] investigated various factors influencing the dynamic system of humidity in a greenhouse and obtained the microclimate model via experimental simulation, indicating that the testing method can be used to ensure the reliability of the numerical model. Compared with the linear static approach in the local domain used by Li et al. [10] to approximate the non-linear dynamic model, the random forest algorithm in this study is well suited to identifying and predicting the regression of the dynamic model, with higher prediction accuracy, better tracking of model changes, and strong applicability. The methods in two previous studies are consistent with this study: Wu et al. [43] tuned the generalization error of the random forest algorithm to improve the prediction accuracy, and Wang et al. [27] employed the random forest algorithm to rank the weight values of four basic characteristic variables.
Previous experiments also found that high-pressure sodium lamps dissipate more heat, thereby increasing the power consumption of the environmental control equipment in plant factories. Therefore, a rational choice of the type of artificial lighting source helps to reduce the power consumption of plant factories during the growth period of crops.

Conclusions
Currently, most greenhouse models are based on a typical greenhouse system model, and a high-performance system model for the optimal control of plant factories is lacking. This makes existing models difficult to apply in actual plant factories. In this paper, mechanism modeling of the temperature factors affecting the microclimate of a plant factory was carried out to obtain a detailed model with a complete mechanism and high accuracy, with emphasis on including the dynamics of the actuator. The necessary simplification and simulation prediction of the complex model were then performed to provide a sound reference for the system analysis and controller design of a plant factory. The following conclusions were drawn.
1) Based on the energy balance theory, a temperature mechanism model was established including the dynamic operation of the actuator for a plant factory, and further, the simplified model can ensure the real-time control process.
2) The random forest algorithm was used for the model identification, simulation, and regression prediction. The mechanism model analysis and the prediction model gave basically consistent results, indicating that the reliability of the model was verified by the test method. The main factors affecting the internal temperature in a plant factory were the temperature of supply air, the volume flow of supply air, and the internal relative humidity. Specifically, under continuous lighting for a whole day, the weight values of these three parameters were 0.4889, 0.4778, and 0.0283, respectively, and the generalization error of the prediction model was 0.0907.
3) The practical significance of this study is to clarify the method for control-oriented modeling of the temperature mechanism in a plant factory. The approach can be used to establish a model suitable for the system control of plant factories on the basis of previous models, and can guide subsequent research, particularly on the accurate prediction of the internal temperature in plant factories in modern agriculture.