Research status and applications of nature-inspired algorithms for agri-food production

Nature-inspired algorithms have been developed through biological mimicking. Machine learning algorithms built from artificial neurons and artificial neural networks have been developed to mimic the human brain with synthetic neurons. This research can be traced back to the 1940s and has been extended to agri-food problem solving in the last three decades. Research and applications have now entered the stage of deep learning, with more layers and neurons that have complex connections to extract deep features of the target. In this paper, the developments of artificial neural networks and deep learning algorithms are presented and discussed in conjunction with their biological connections for agri-food applications. Related independent studies previously conducted by the author are summarized, and newly conducted work is presented. At the same time, the algorithms motivated by recent bionics studies are compared and discussed for their potential for agri-food production.


Introduction 
Agriculture is the source of food. The supply chain from agriculture to food is becoming increasingly important to a world with diminishing resources and an ever-increasing population. To ensure global food security, it is necessary to develop and apply advanced technologies such as artificial intelligence (AI) and nature-inspired computing in agricultural and food engineering and sciences.
Nature-inspired computing associated algorithms have great potential to renovate the agriculture and food industry.
In the past, especially in the last decade, various new nature-inspired systems have been developed through studies of nature-inspired algorithms with biological mimicking, which have further advanced bionics engineering [1][2][3][4]. In AI, machine learning (ML) algorithms from artificial neural networks (ANNs) have been developed to mimic the human brain with synthetic neurons [5]. The development of ANNs can be traced back to the 1940s, and ANNs have been widely studied and applied to solve problems in various areas [6][7][8][9][10][11]. In the last thirty years, ANNs and other associated soft computing methods from ML have been expanded to solve problems in agriculture [12][13][14]. In recent years, research and applications of ML have entered the stage of deep learning (DL) with complex connections through multiple layers from and to various neurons [15,16]. DL algorithms are used to extract deep features of the target with high accuracy and robust system performance. Applications of DL in agriculture have appeared recently [17][18][19][20][21][22]. In the meantime, motivated by recent advanced bionics studies [23], a number of

Machine learning and artificial intelligence
ML used not to be strictly categorized as a branch of "classic" AI. With the appearance of DL [15], especially the success of AlphaGo with deep neural networks [16], ML, including DL, has become a dominant branch of AI. In contrast to the natural intelligence of humans and other animals, AI is intelligence shown by machines or computerized systems with the functions of language and vision. AI, as a scientific discipline, began to develop in the 1950s when Alan Turing proposed a test called "The Imitation Game" [29]. Twenty years later, Terry Winograd first operationalized Turing's intelligent machine by creating a blocks world with a natural language understanding computer program, SHRDLU [30].
Having experienced several waves of optimism and disappointment, with "springs" and "winters", AI finally started a new age with ML in the 1980s. Before ML, all AI systems worked with hand-designed, i.e. man-made, rules, which made it hard to anticipate all possibilities and to adapt to new situations beyond an assumed closed world. ML develops and uses statistical techniques with the methods of pattern recognition and computational learning theory to allow computerized systems to "learn", i.e. to progressively improve system performance in solving a specific problem with data without being explicitly modeled or programmed [31,32].

Artificial neural networks and soft computing
Obviously, compared with hand-designed rule AI systems, ML offers an "open-world" scheme to design and develop new-generation AI systems. The ideas of the open world with learning ability were motivated biologically by human brains. With this motivation, ML was developed into the nature-inspired paradigms or algorithms represented by ANNs.
The development of ANNs started with designing the artificial neuron to mimic the characteristics of the biological neuron. The human nervous system is built of cells called neurons. Figure 1 shows the structure of a pair of typical biological neurons. In the structure, dendrites extend from the cell body of one neuron toward other neurons, where they receive signals through synapses at the connection points. On the receiving side of the synapse, these inputs are conducted to the cell body, where they are summed; some inputs tend to excite the cell while others tend to inhibit its firing. When the cumulative excitation in the cell body exceeds a threshold, the cell fires and sends a signal down the axon to other neurons.

Figure 1 Biological neuron [33]

Of course, the description above covers just the basic function of biological neurons.
Real neurons work with many complexities and exceptions; ANNs only simplify and model the basic functions. Figure 2 shows the structure and calculation of the artificial neuron that mimics the first-order characteristics of the biological neuron. In the artificial neuron a set of inputs, $x_1, x_2, \cdots, x_n$, is applied, each of which represents the output of another neuron. Each input, $x_i$, is multiplied by its corresponding weight, $w_i$, which is analogous to a synaptic strength, and all the weighted inputs, $w_i x_i$, are summed to determine the activation level of the neuron. From the weighted sum the output of the neuron, $z$, is produced by an activation function. The function can be a simple linear threshold function. In order to simulate the nonlinear transfer characteristics of the biological neuron more accurately, a number of nonlinear activation functions have been used for artificial neurons. Among them the sigmoidal function has been the most used, which is expressed as:

$$z = \frac{1}{1 + e^{-s}}, \qquad s = \sum_{i=1}^{n} w_i x_i$$

Figure 2 Artificial neuron

A group of artificial neurons is connected to mimic the behavior of biological neurons in the human brain with its web of connectivity and interactivity, which forms the so-called artificial neural network (ANN). There are different ANNs with different connectivity of neurons. Figure 3 shows feedforward, recurrent and feedback, fully connected, auto-associative and hetero-associative ANNs. Among them, the feedforward ANN is the most widely used in pattern recognition. In 1986 a series of results on the backpropagation (BP) training algorithm for the multilayer feedforward ANN was published [34,35]. In 1989 the multilayer feedforward ANN with one hidden layer was proven to be a universal approximator for any continuous function [36,37]. These works led to a wave of machine learning based on statistical models.
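The artificial neuron calculation described above (a weighted sum of the inputs passed through a sigmoidal activation) can be sketched in a few lines of Python. This is a minimal illustration only; the function names and the input and weight values are hypothetical and are not taken from the cited works.

```python
import math


def sigmoid(s):
    """Sigmoidal activation: maps any activation level s into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(inputs, weights, activation=sigmoid):
    """Sum the weighted inputs w_i * x_i and apply the activation function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    s = sum(w * x for w, x in zip(weights, inputs))  # activation level
    return activation(s)


# Hypothetical inputs and synaptic weights, chosen only for illustration.
x = [1.0, 0.5, -1.0]
w = [0.2, -0.4, 0.1]
z = neuron_output(x, w)
print(f"neuron output z = {z:.4f}")
```

Passing a different `activation` (for example a threshold function) changes the transfer characteristic without touching the weighted-sum step.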
Studies indicated that the BP algorithm could train a feedforward ANN model to learn from a considerable amount of data samples to extrapolate or predict unknown events. People found that this statistics-based machine learning method has many advantages over man-made rule-based systems. The BP algorithm is a supervised learning algorithm; that is, the network optimization is based on the known desired output. However, in reality, the desired output is often unknown and the system optimization is based on self-organization. The training algorithms in this category, which work without a known desired output, are unsupervised learning algorithms. They are increasingly developed and used, although Kohonen had already established the self-organizing feature mapping network in 1982 [38].
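To make the idea of supervised BP training concrete, the following is a minimal sketch of batch gradient descent with backpropagation for a feedforward network with one sigmoidal hidden layer. The toy XOR data set, the network size, the learning rate and all function names are assumptions made for illustration; this is not the implementation used in the cited studies.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, W1, b1, W2, b2):
    """Forward pass: returns hidden activations h and network output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def loss(data, W1, b1, W2, b2):
    """Mean of 0.5 * (output - desired output)^2 over the data set."""
    return sum(0.5 * (forward(x, W1, b1, W2, b2)[1] - t) ** 2
               for x, t in data) / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b2 = 0.0
    initial_loss = loss(data, W1, b1, W2, b2)
    n = len(data)
    for _ in range(epochs):
        # Accumulate the gradients over the whole data set (batch training).
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            # Output error signal: supervised, uses the known desired output t.
            delta_out = (y - t) * y * (1.0 - y)
            gb2 += delta_out
            for j in range(n_hidden):
                # Error signal propagated back to hidden neuron j.
                delta_h = delta_out * W2[j] * h[j] * (1.0 - h[j])
                gW2[j] += delta_out * h[j]
                gb1[j] += delta_h
                for i in range(n_in):
                    gW1[j][i] += delta_h * x[i]
        # Gradient descent update with the averaged gradients.
        b2 -= lr * gb2 / n
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n
    return (W1, b1, W2, b2), initial_loss, loss(data, W1, b1, W2, b2)


# Toy data set (XOR), used only to exercise the training loop.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
params, initial_loss, final_loss = train(XOR)
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The sketch only demonstrates that the error on the training samples is driven down by the weight updates; how well such a network generalizes depends on the data, the architecture and the training settings.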
With the success of BP neural networks, radial basis function networks and the support vector machine (SVM) were developed and widely used with great success as well [39,40]. Soft computing is a computational approach to learning and machine intelligence [41]. It differs from conventional (hard) computing in that it is tolerant of imprecision, uncertainty, partial truth, and approximation.
In the 1980s various rule-based expert systems were developed, bringing a wave of AI research and development. Examples are the expert systems for crop fertilization and economic forecasting method selection [42]. In the early 1990s, a Kohonen self-organizing feature mapping network was developed and applied for unsupervised ultrasonic signal classification for beef grading [43], and accordingly a counter propagation network was applied to model a particleboard manufacturing process [44,45]. A Ph.D. research project developed multilayer feedforward neural networks with the backpropagation training algorithm to identify the multiple-input, multiple-output relationship of a snack food frying process unit operation, and a recurrent neural network with the backpropagation-through-time training algorithm to characterize the dynamics between the inputs and outputs of the unit operation. Based on this modeling, the neural network process models were inverted through numerical optimization to design and implement model predictive controllers that handle the nonlinearity and input-output time lags of the process [46][47][48][49][50]. Figure 4 shows the closed loop of neural network modeling and control for the snack food frying process unit operation. In the late 1990s, wavelet textural features were developed for quantitative ultrasonic elastographic image analysis for meat quality evaluation [49,51]. With the wavelet textural features, multilayer feedforward neural networks were developed by investigating the efficiency of the training processes and the generalization of the networks using the gradient descent and Levenberg-Marquardt optimization algorithms in backpropagation, and weight decay was added to the Levenberg-Marquardt backpropagation to improve the generalization of the neural network models [49,52].
In the late 2000s, all fundamental and associated ANN architectures and training algorithms were reviewed and the further ANN development related to support vector machine (SVM) was discussed in conjunction with applications in food science and engineering, soil and water relationship for crop management, and decision support for precision agriculture [12] . Then, ANNs were put into consideration as the major force of soft computing to emulate the human mind along with fuzzy logic, genetic algorithms, Bayesian inference, and decision tree [13] . In 2013 a group of scientists discussed challenges and issues in conducting agroecological studies from a statistical point of view, including neural networks [53] .
Figure 4 legend: y, process output vector; ŷ, one-step-ahead or multi-step-ahead process output prediction vector; y_s, process output reference vector; û, inverted process input vector from the controller.

In recent years, a study was conceived and initiated using the machine learning algorithms of naive Bayes, random forest and SVM to assess soybean injury from dicamba, an herbicide used to control broadleaf weeds in crop fields, through hyperspectral imaging [54]. Studies were also conducted using the machine learning algorithms of K-nearest neighbor, random forest, and a genetic algorithm coupled with an SVM to create a spectral library to enhance crop classification and growth status monitoring [55], using K-nearest neighbor and SVM for classification of broilers to analyze their behaviors [56], and using SVM classification of unmanned aerial vehicle (UAV) color images to monitor cotton budding [57].

Deep learning
ANNs before the 2000s can be tentatively categorized as shallow learning in machine learning. At that time, the most successful ANNs were those with the multilayer feedforward architecture and the supervised BP training algorithm. In 2006 Hinton and Salakhutdinov introduced the concepts of DL by illustrating that ANNs with many hidden layers have a strong ability of feature learning and that the difficulty of training deep ANNs can be overcome by layer-wise pre-training [15]. Although the previous ANNs mostly succeeded with supervised training, the layer-wise pre-training in Hinton and Salakhutdinov's work was conducted through unsupervised learning. However, DL did not gain its popularity until AlphaGo was announced [16] and beat a number of top Go players in the world. From shallow ANNs to deep ANNs the network structure becomes much more complicated, with many more layers and neurons (Figure 5). Also, deep ANNs can learn data representations, mostly in an unsupervised manner, and generalize to unseen data samples using hierarchical representations.
Deep ANNs are leading another wave of machine learning to advance AI technology. In the past few years, DL has been rapidly studied, developed and applied [58] . There are a number of DL models such as Deep Belief Network [59] , Convolutional Neural Network (CNN) [60] , and Stacked Autoencoder [61] . CNNs have transformation invariance in translation, angle of view, size, or illuminance so that they are widely used in pattern recognition and image analysis. CNNs are biologically inspired variants of multilayer perceptrons. They are designed to emulate the behavior of the visual cortex. The CNN models mitigate the challenges posed by the multilayer feedforward architecture by exploiting the strong spatially local correlation present in natural targets and images.
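The way a convolution layer exploits spatially local correlation can be illustrated with a small sketch: one shared kernel is slid over the image, so the same local pattern produces the same response wherever it occurs, and a pooling step then discards the exact position. The image, kernel and function names are hypothetical, and the "convolution" is implemented as cross-correlation, as is common in DL frameworks.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def global_max_pool(feature_map):
    """Keep only the strongest response, discarding its position."""
    return max(max(row) for row in feature_map)


# A 2x2 kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]

# The same edge at two different horizontal positions.
image_a = [[0, 0, 1, 1]] * 4
image_b = [[0, 1, 1, 1]] * 4

map_a = conv2d_valid(image_a, edge_kernel)
map_b = conv2d_valid(image_b, edge_kernel)
print(map_a[0], map_b[0])          # the response moves with the edge
print(global_max_pool(map_a), global_max_pool(map_b))  # pooled value is unchanged
```

Because the four kernel weights are reused at every position, the layer needs far fewer parameters than a fully connected layer over the same image.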
With these characteristics, CNNs are the most widely used deep ANNs so far. LeCun et al. (1989) first proposed a CNN [62] . This CNN was improved for hand-writing character recognition [63,64] .
Not until the appearance of AlexNet [55] did deep CNNs begin rapid development in theoretical studies and practical applications. It is noted that this CNN uses a new activation function to reduce the computation, speed up training convergence and mitigate overfitting [65]:

$$\mathrm{ReLU}(x) = \max(0, x)$$

where ReLU stands for rectified linear unit, and its derivative is the unit step function:

$$\frac{d\,\mathrm{ReLU}(x)}{dx} = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}$$

AlexNet was modified and improved with ZFNet [66], and GoogLeNet [67], VGG [68], the residual network [69] and their variants greatly advanced deep CNN techniques. Currently, studies are being conducted to improve deep CNNs and optimize their training processes in terms of the convolution layer, pooling layer, activation function, loss function, network architecture and data regularization, with the structure of a typical CNN shown in Figure 6.

Figure 6 Structure of a typical CNN [...] and fully-connected layers [70]

In recent years more and more journal articles on DL have been published. Figure 7 shows the journal publications in the world on DL up to 2015.
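A small sketch can show why ReLU is cheap to compute and why its gradient does not shrink for large positive activations, in contrast to the sigmoidal function. The sample activation values and function names are hypothetical; the derivative at exactly zero is set to 0 here by convention.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, zeroes the rest."""
    return x if x > 0 else 0.0


def relu_derivative(x):
    """Step function; the value at x == 0 is a convention, not a true derivative."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)


# Hypothetical activation levels, chosen only for illustration.
activations = [-2.0, 0.0, 3.0, 10.0]
relu_values = [relu(a) for a in activations]
relu_grads = [relu_derivative(a) for a in activations]
sigmoid_grads = [sigmoid_derivative(a) for a in activations]

for a, rg, sg in zip(activations, relu_grads, sigmoid_grads):
    print(f"x = {a:5.1f}  ReLU' = {rg:.1f}  sigmoid' = {sg:.6f}")
```

The sigmoid gradient never exceeds 0.25 and becomes very small for large inputs, whereas the ReLU gradient stays at 1 for every positive input and needs only a comparison to evaluate.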
In 2017 AlphaGo Zero was announced [71]. It advanced AlphaGo by learning from scratch and decisively beat AlphaGo, which depends strongly on prior human knowledge; this preluded a new wave of deep learning research, development and applications. The web-like network structures of deep ANNs need a graphics processing unit (GPU) specialized for high-performance computing. With the rapid development of DL and AI, a number of GPU-based DL computing frameworks have been created for high-performance DL system computing and operation. Examples include TensorFlow (Google), Theano (Université de Montréal), PyTorch (Facebook), Torch (New York University/Facebook) and Caffe (University of California, Berkeley). The tensor processing unit (TPU) is a chip designed by Google for machine learning workloads. Compared to the GPU, the TPU has faster speed and much more memory. Actually, TPUs, GPUs and even CPUs can all be used for deep learning. However, which DL models best fit the TPU, GPU and CPU, respectively, is an issue to consider. Wang et al. (2019) [72] designed a software tool named ParaDnn for parameterized benchmark testing of DL models to help choose among TPU, GPU and CPU for different DL models.
The success of DL is built on a large amount of data, and state-of-the-art supercomputing power allows the training of scalable, large neural networks whose performance keeps improving with more data, whereas the performance of shallow ANNs in ML levels off as more data are added. However, the amount of data available in reality is often insufficient for the algorithm to succeed. In the case of limited data, more data have to be created by augmenting the limited data, for example by flipping, shifting, scaling, rotating and cropping an image. Data augmentation can be done offline for relatively small data sets and online, dynamically in the computing program, for relatively large data sets.
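The image augmentation operations mentioned above can be sketched on a tiny image stored as a nested list. This is an offline-style illustration under simple assumptions (integer pixel values, zero fill for shifted-in pixels); the function names and the sample image are hypothetical.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def shift_right(image, pixels, fill=0):
    """Shift the image to the right, filling the vacated columns."""
    width = len(image[0])
    return [[fill] * pixels + row[:width - pixels] for row in image]


def crop(image, top, left, height, width):
    """Cut a height-by-width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def augment(image):
    """Return the original image plus three augmented variants."""
    return [image,
            flip_horizontal(image),
            rotate_90_clockwise(image),
            shift_right(image, 1)]


# Hypothetical 2x3 image, used only to show the effect of each operation.
sample = [[1, 2, 3],
          [4, 5, 6]]
for variant in augment(sample):
    print(variant)
```

For online augmentation the same functions would be applied with randomly drawn parameters each time a training batch is assembled, rather than once before training.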
With the development of ML, new models have been developed to improve and enhance general ML and DL training and analysis. Examples are the generative adversarial network (GAN) [73], deep transfer learning [74], and AutoML [75].
DL has been widely developed and applied in agriculture with the expectation of improved performance in monitoring, estimation and analysis. Various DL architectures and models have been used in agriculture, and CNN dominates most of the research and applications [19,20,76].
Deep CNN was tested for classifying images of corn, cotton and soybean leaves collected in the fields. Nine leaves of each crop were collected from the fields; half of them were kept in coolers and half were left in the lab room within 1 d after the field collection. Images were taken with a portable digital camera at 15 min, 1 h, 2 h, 3 h, 4 h, 5 h, 6 h and 24 h, which generated 144 images of each crop. Figure 8 shows two representative leaf images of each crop over the 24 h period. The purpose of the test was to classify the leaves regardless of the time during the 24 h. In the project, a CNN was created (Figure 9 and Table 1) using TensorFlow in a Python NumPy program (NumPy_CNN.py calls TensorFlow_leaves.py) to conduct leaf classification. With data augmentation, the number of images of each crop was doubled, with half used for training and half for testing of the CNN. Figure 10 illustrates the implementation of the CNN for leaf classification. Table 2 is the confusion matrix of the CNN testing results for leaf classification. It indicates that the CNN can classify the leaves of the three crops regardless of the time during the 24 h when the vigor of the leaves generally decayed. However, the model can still be adjusted to further improve the classification, which may indicate that the straight use of CNN might not be suitable for this problem and that a more suitable DL scheme may be needed. At present CNN has almost become a synonym of DL for newcomers to this field. A lot of applications that claim the use of DL simply apply CNN. However, the straight use of CNN often cannot achieve what is expected from DL for many problems. When solving a problem, the specific characteristics of the problem should be carefully identified and analyzed. On this basis, a "use-inspired" DL approach can be developed to seek the most suitable method and understanding in DL by situating the research in a domain of application to simultaneously inform progress in DL and solve problems in particular use cases. This is what this research is doing to advance DL research: to solve this specific problem and other similar problems at the same time, not only classifying the leaves regardless of the time but also detecting the change of the leaves with time through deep feature extraction to formulate the time series of the leaf images.
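A confusion matrix of the kind reported for the leaf classification test can be tabulated from true and predicted class labels as sketched here. The labels in the example are entirely hypothetical and are not the results of the study; the function names are likewise illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


classes = ["corn", "cotton", "soybean"]
# Hypothetical labels for six test images (not data from the study).
true_labels = ["corn", "corn", "cotton", "cotton", "soybean", "soybean"]
predicted = ["corn", "corn", "cotton", "soybean", "soybean", "soybean"]

matrix = confusion_matrix(true_labels, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"{name:8s} {row}")
print(f"overall accuracy: {overall_accuracy(matrix):.3f}")
```

Off-diagonal entries show which crops are confused with each other, which is more informative for adjusting a model than the overall accuracy alone.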
Recently CNNs were evaluated for cage-free floor egg detection [77] .
This study developed vision-based floor-egg detectors using three CNN variants, the single shot detector, the faster region-based CNN (faster R-CNN), and the region-based fully convolutional network (R-FCN), which have been widely used for object detection and recognition [78,79], and evaluated the performance of the three detectors on floor egg detection under simulated cage-free environments.

Bionics motivated algorithms
Bionics is a scientific discipline that investigates the application of nature-inspired biological methods and systems to the study and design of advanced technology and engineering systems. The word bionics, meaning "like life", was coined by Dr. Jack E. Steele in 1958 when he was working at the Aeronautics Division House at Wright-Patterson Air Force Base in Dayton, Ohio. Bionics engineering develops and implements advanced technology and engineering systems from bionics studies. Classic examples of bionics in engineering include sonar, radar, and ultrasound imaging imitating animal echolocation.
With the development of bionics, more and more algorithms have been developed for complex computational applications by drawing ideas from observing how nature solves complex problems. Although, compared to ML, research on designing and developing nature-inspired algorithms is still very young, there are already some successful nature-inspired computing and complex systems that help in understanding and designing more such novel systems in AI. Nature-inspired algorithms are principal among metaheuristic algorithms, which have been found to be more powerful than conventional methods based on formal logic or mathematical programming [80,81].
Gandomi and Alavi (2012) [82] illustrated that a group of biologically inspired algorithms has been developed and can be divided into three main categories [83]: (1) evolutionary algorithms, (2) swarm intelligence, and (3) bacterial foraging algorithms. The evolutionary algorithms are inspired by the genetic evolution process. Among them, the genetic algorithm (GA) [84], mentioned above as a soft computing technique, is the most used. Others include genetic programming (GP) [85], evolutionary strategy (ES) [86] and differential evolution (DE) [87].
These population-based stochastic search algorithms optimize with a survival-of-the-fittest criterion. The evolutionary algorithms have been remarkably improved over the last decades. Examples are the stud genetic algorithm (SGA) [88] and multi-stage genetic programming [82] for improved GP nonlinear system modeling. In 2008, Simon [89] proposed a new evolutionary algorithm, namely biogeography-based optimization (BBO). The BBO algorithm employs global recombination and uniform crossover, which are inspired by the GA literature.
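The population-based, survival-of-the-fittest search of a genetic algorithm can be sketched as follows. This is a generic textbook-style GA (tournament selection, one-point crossover, bit-flip mutation, elitism) on a toy bit-counting objective; it is not the SGA, BBO or any other specific algorithm cited above, and all names and parameter values are assumptions.

```python
import random


def fitness(bits):
    """Toy objective: the number of 1-bits in the chromosome."""
    return sum(bits)


def genetic_algorithm(n_bits=20, pop_size=30, generations=50,
                      mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness observed in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_generation = [population[0][:]]  # elitism: keep the best as is
        while len(next_generation) < pop_size:
            # Tournament selection: the fitter of three random individuals.
            parent1 = max(rng.sample(population, 3), key=fitness)
            parent2 = max(rng.sample(population, 3), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, n_bits - 1)
            child = parent1[:cut] + parent2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_generation.append(child)
        population = next_generation
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = genetic_algorithm()
print(f"best fitness: first generation {history[0]}, last {history[-1]}")
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` can never decrease from one generation to the next.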
In swarm intelligence particle swarm optimization (PSO) [90] and ant colony optimization (ACO) [91] are well known. These algorithms are based on the simulation of the collective behavior of animals.
PSO is a population-based method inspired by the social behavior of bird flocking or fish schooling. The ACO algorithm is inspired by the collective foraging behavior of ants.
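The flocking idea behind PSO can be made concrete with a minimal sketch in which each particle is pulled toward its own best position and toward the best position found by the swarm. The sphere test function, the inertia and acceleration coefficients, and the function names are assumptions chosen for illustration, not settings from the cited works.

```python
import random


def sphere(position):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(x * x for x in position)


def particle_swarm(objective, dim=2, n_particles=20, iterations=100,
                   inertia=0.7, c_personal=1.5, c_social=1.5,
                   bound=5.0, seed=0):
    rng = random.Random(seed)
    positions = [[rng.uniform(-bound, bound) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_value = [objective(p) for p in positions]
    leader = min(range(n_particles), key=lambda i: personal_best_value[i])
    global_best = personal_best[leader][:]
    global_best_value = personal_best_value[leader]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to own best + pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_social * r2 * (global_best[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])
            if value < personal_best_value[i]:
                personal_best[i] = positions[i][:]
                personal_best_value[i] = value
                if value < global_best_value:
                    global_best = positions[i][:]
                    global_best_value = value
    return global_best, global_best_value


best_position, best_value = particle_swarm(sphere)
print(f"best value found: {best_value:.3e}")
```

The three coefficients are the kind of parameters whose adaptation is a known difficulty for these metaheuristics; different objectives generally call for different settings.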
The bacterial foraging algorithms emulate the bacterial foraging behavior for new bio-inspired optimization approaches [83,92] . Examples are computing systems of microbial interactions and communications (COSMIC) [93] and rule-based bacterial modeling (RUBAM) [94] .
However, these nature-inspired metaheuristic methods have the problem of the parameter adaptation. Valdez [3] surveyed mainly the PSO, Gravitational Search (GS), and ACO algorithms for modifications using fuzzy logic to solve this problem and obtain better results than the original methods. A study was conducted to use a multi-layer perceptron and PSO in modeling and predicting the germination rate of two common bean cultivars as a function of distinct temperatures [95] .

Agricultural perspectives of the discussed algorithms
ANNs and associated soft computing techniques have been widely used in crop production management, irrigation management, soil analysis, precision agricultural system integration and pesticide application control [13]. Deep learning has been developed for agricultural uses, especially for identification of weeds, land cover classification, plant recognition, fruit counting and crop type classification [20]. Metaheuristic algorithms have been developed for agricultural land use optimization in economic crop planning, water resources management, nature conservation in the landscape, and multifunctional agricultural landscapes using genetic algorithms, PSO, ACO, etc. [98]. It can be expected that, with the need to advance agricultural operations into intelligent and automated smart stages, nature-inspired algorithms will play increasingly important roles in sustainable agri-food production.
However, quite often complicated methods do not necessarily work better. Each application has different characteristics and each algorithm has its own limitations, and nature-inspired algorithms are no exception. For example, DL is limited by high data and computing requirements and poor interpretability. In practice, the selection of data analysis methods should be based on the characteristics of the data and the application. Confirmation of the validity of the results is essential to all methods. Any research that proposes to use nature-inspired algorithms requires sufficient justification, with compelling reasons to use these algorithms over other methods, and should conduct a benchmark comparison to show that the nature-inspired algorithms are better; otherwise, the research is one-sided and biased and can hardly make sense in practical application.

Conclusions
In this study, the research status and applications of nature-inspired algorithms, including ML algorithms from ANNs to DL and the algorithms motivated by bionics studies, are presented, summarized and discussed in conjunction with their biological connections for agri-food systems applications. Within this context, along with recent progress reports, classic literature is also included and analyzed to provide insight into the roots of the technologies and to capture their intricate nature.
With the coming spring of AI, it can be expected that ML, especially DL algorithms, will be extensively studied, developed and applied for solving problems in various areas of interest, including agri-food system analysis for decision support. The agri-food system environment and management are dominated by uncertainty, with complex interactions of various uncontrollable factors. This paper is expected to offer a head start for scientists and engineers not only to use current science-based nature-inspired algorithms but also to develop use-inspired algorithms for problems in agri-food areas, to deal with those issues that conventional approaches cannot solve well.

[References]