Prediction of protein content in rice using a near-infrared imaging system as diagnostic technique

The aim of this research was to determine rice protein content using a near-infrared (NIR) imaging system. The developed imaging system used a NIR camera fitted with automatically exchanged filters covering the wavelength range from 870 nm to 1014 nm. Multiple linear regression (MLR), partial least squares regression (PLSR), and artificial neural network (ANN) models were employed as data analysis methods for detecting rice protein contents of 6.18%-9.43% with both the NIR imaging system and a commercial near-infrared spectrometer (NIRS). A total of 180 rice samples were used in this study, of which 120 randomly selected samples served as the calibration set for the MLR and PLSR models. For the back-propagation ANN model, the same 120 samples were divided into two parts: 80 samples were used for network training and the other 40 formed the monitoring set. To compare the results of the MLR, PLSR, and ANN models, the remaining 60 of the 180 samples formed the validation set. Using an MLR model composed of five wavelengths, the NIR imaging system successfully detected rice protein content, with $r_{val}^2$ and SEP of 0.769 and 0.294%, respectively. With the PLSR model, the imaging system obtained $r_{val}^2$ = 0.782 and SEP = 0.274% within the wavelength range from 870 nm to 1014 nm. The five significant wavelengths selected by the MLR model were used as the input data of the ANN model, and the prediction results were $r_{val}^2$ = 0.806 and SEP = 0.266%. The prediction results indicate that the developed NIR imaging system offers simple, convenient operation and high detection accuracy, and that it has commercial potential for non-destructive detection of rice protein content.


Introduction 
Rice supplies abundant nutrition and is a key food in Asia. The flavor of cooked rice strongly influences rice price; it is therefore important to identify the factors that determine rice quality. Reports [1,2] indicated that the main factors affecting the flavor of cooked rice are the moisture, protein, starch, and fatty acid contents of rice. Protein is the major element that determines the nutritional value of rice [3]. Moreover, rice protein can help prevent hyperlipidemia, in part by modifying glutathione metabolism [4]. Rice protein content ranges from 6% to 14% depending on the variety and the growing environment. Rice with a higher protein content is generally yellowish-brown, with more transparent and harder grains, and it takes longer and needs more water to cook [5]. Rice protein content is negatively correlated with the flavor of cooked rice but positively correlated with its viscosity. Protein content affects the amount of water the rice absorbs during the early cooking stage. During cooking, proteins in milled-rice grains retard starch swelling. Therefore, a lower protein content leads to more elastic cooked rice. In contrast, a higher protein content increases the viscosity of cooked rice, which reduces flavor [6].
For high-quality rice production, the development of new rice varieties, the improvement of cultivation techniques, and the storage of harvested rice are all important to the rice industry. Destructive chemical analysis such as the Kjeldahl method [7] can measure rice protein content accurately, but it is time-consuming and causes environmental pollution. Therefore, developing a non-destructive detection technique and device that can be used in rice processing and classification is an important issue for the rice industry. The near-infrared spectrometer (NIRS) has been widely applied to qualitative and quantitative analyses of crops. Rice amylose content has been predicted accurately in milled whole grain [8,9] and rice grain [10]. Moreover, rice protein content [11][12][13][14] and amino acid content [15] can also be estimated by NIRS. Parametric and nonparametric regressions have been used to evaluate the predictive performance of different models [16]. Yang et al. [17] established a multiple linear regression (MLR) model to predict rice protein content with a result of $R^2$ = 0.9480. Li and Shaw [18] indicated that partial least squares regression (PLSR) was the best model for measuring the fatty acid content of rough rice. Delwiche [8] used a PLSR calibration model to analyze the amylose and protein contents of ground milled rice. Some reports showed that the nonlinear artificial neural network (ANN) model has a higher prediction capacity than linear models. Sitakalin and Meullenet [19] reported that the accuracy of the ANN model was superior to that of the PLSR model in predicting the texture of cooked rice. For assessing the soluble solids content of apples, an ANN model with two hidden neurons showed better predictive capacity than principal component regression (PCR) [20].
Imaging is a morphological method and has become a useful non-destructive measurement approach in the rice industry. Studies evaluating the visual features of rice have been reported, for example, on rice classification [21][22][23], the degree of rice milling [24], cracked rice detection [25], and the cooking properties of rice kernels [26,27]. A cooled NIR charge-coupled device (CCD) camera can capture spectral images in the NIR band, which makes quantitative measurements possible, e.g., detection of insects inside wheat [28], wheat protein and color classification [29], and rice seed cultivar identification [30]. Cogdill et al. [31] developed a NIR imaging device that detected the protein content of single maize kernels. In a previous study, the developed NIR imaging system showed high detection accuracy and capability for rice moisture [32]. With the MLR, PLSR, and ANN models, the $r_{val}^2$ values ranged from 0.942 to 0.952, showing satisfactory prediction capacity for rice moisture. Protein content plays a major role in rice quality, and it is important and valuable to develop an effective, low-cost NIR imaging system for measuring rice protein content. To improve the efficiency of rice selection, this study aimed to develop an NIR imaging system to detect the protein content of 10 different rice varieties. Moreover, the prediction efficiency of the system was evaluated and analyzed with MLR, PLSR, and ANN calibration models.

Rice samples
The rice samples were harvested from four villages in central Taiwan, and 10 of the most popular varieties were chosen. A total of 180 rice samples were used in this study. Of these, 120 randomly selected samples formed the calibration set for the MLR and PLSR models. The same 120 samples were divided into two parts for the back-propagation ANN: 80 samples were used for network training, and the other 40 formed the monitoring set, which was used to avoid overtraining of the network. To compare the results of the MLR, PLSR, and ANN models, the remaining 60 of the 180 samples formed the validation set. Figure 1 shows the statistical information about the rice samples: the protein contents ranged from 6.18% to 9.43% (dry basis, d.b.), and the standard deviations of the sample sets ranged from 0.58% to 0.61%.
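The partition described above can be sketched as follows. This is a minimal illustration only: the function name `split_samples`, the use of NumPy, and the fixed random seed are assumptions, since the text states only that the samples were selected at random.

```python
import numpy as np


def split_samples(n_total=180, n_calibration=120, n_training=80, seed=0):
    """Randomly partition sample indices into the sets named in the text.

    Returns (calibration, validation, training, monitoring) index arrays.
    The seed is a hypothetical choice made only for reproducibility.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)

    calibration = order[:n_calibration]   # 120 samples for MLR and PLSR
    validation = order[n_calibration:]    # remaining 60 samples

    # The ANN reuses the calibration samples, split into two parts.
    training = calibration[:n_training]   # 80 samples for network training
    monitoring = calibration[n_training:]  # 40 samples to detect overtraining
    return calibration, validation, training, monitoring


calibration, validation, training, monitoring = split_samples()
```

Keeping the validation indices disjoint from the calibration indices is what allows the three models to be compared on the same 60 unseen samples.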

Chemical analysis
The rice protein content was determined by the Kjeldahl method [33] , and the value of 5.95 was used as a protein conversion factor.
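As a small worked example of the conversion, the sketch below multiplies a Kjeldahl nitrogen percentage by the factor 5.95 given above. The function name and the sample nitrogen value are hypothetical; the digestion, distillation, and titration steps that yield the nitrogen value are not described in the text and are not modeled here.

```python
RICE_NITROGEN_TO_PROTEIN = 5.95  # conversion factor stated in the text


def protein_percent(nitrogen_percent, factor=RICE_NITROGEN_TO_PROTEIN):
    """Convert Kjeldahl nitrogen (% of sample mass) to crude protein (%)."""
    return nitrogen_percent * factor


# Hypothetical nitrogen reading, chosen only to illustrate the arithmetic.
example_protein = protein_percent(1.2)
```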

Spectral acquisition
Rice samples were scanned in a standard cup by an NIR spectrometer (model 6500, FOSS NIRSystems, Silver Spring, MD) from 400 nm to 2500 nm at a wavelength interval of 2 nm. To ensure consistency, each sample was poured out and refilled into the cup after scanning, and three technical replicates were performed for each sample. The mean spectrum of the replicates was used for subsequent analysis.
Figure 2 shows the schematic diagram of the NIR imaging system. The main components of the system were the imaging camera apparatus, the automatic filter-exchange device, and the stepper motor and its controller. The NIR CCD camera (silicon type, C3077-78, Hamamatsu, Tokyo, Japan) had a resolution of 780 (H) × 488 (V) and was coupled to a camera controller (C2741, Hamamatsu). The camera sensitivity ranged from the visible wavelength region to 1050 nm. The camera lens (FV5025, Mutron) had a focal length of 50 mm and a diaphragm of 2.5. The image captured by the camera was digitized by a frame grabber board (Meteor, Matrox Inc., Canada) into a 640×480 pixel image and then sent to the computer. The light source consisted of four halogen bulbs (50 W, 12 V, OSRAM), giving a total lamp power of 200 W. Moreover, the light source used a voltage controller to ensure a constant light supply.
Figure 2 Configuration of a NIR imaging apparatus

Imaging data processing and model establishing
One of the significant features of the developed NIR imaging system was the automatic filter exchange. The adaptor held the filter tray and the camera lens. Each selected filter was fixed into the filter tray independently, and the tray was driven by a gear wheel installed at its center on a fixed axle. Filters were exchanged by rotating the gear wheel with a stepper motor.
In this study, the prediction efficiency of the developed NIR imaging system was compared with that of the commercial NIRS. Fifteen band-pass filters (Andover Corporation, USA), with central wavelengths of 870-1000 nm (in steps of 10 nm) and 1014 nm, were used to acquire spectroscopic images of the rice samples. The rice protein inspection software, which handled filter tray positioning, image capture, and data processing, was developed in the Windows environment using the Borland C compiler. Figure 3 shows the image processing flowchart of the NIR imaging system. First, rice samples were spread level in a petri dish (5 cm in diameter and 1.5 cm in depth). The petri dish and a reference (WS-1, Ocean Optics) were then placed together below the camera. The CCD camera captured a total of 15 images at different wavelengths, each with 640×480 pixels, for every rice sample. To improve the prediction accuracy, only the 120×120 pixels in the middle of each original image were used in the calculation.
The image processing calculates the absorbance ($A_\lambda$) using the following equation:

$$A_\lambda = \log_{10}\left(\frac{I_R - I_D}{I - I_D}\right) \qquad (1)$$

where $I$ is the sample intensity at wavelength λ; $I_R$ is the reference intensity at wavelength λ; and $I_D$ is the dark spectrum recorded by the CCD camera when its diaphragm was closed and the light source was turned off.
Two linear calibration models, MLR and PLSR, and the non-linear ANN model were established to evaluate the prediction efficiency of the developed system. The MLR model used the spectral absorption at one to several wavelengths to predict the rice protein content. The PLSR model calibrated the spectral absorption at all wavelengths and then extracted the principal components of the feature spectra. The ANN model is a non-linear method that can describe the relationship between the spectra and the rice protein contents. The significant wavelengths selected by the MLR analysis were used as inputs to the ANN, and the rice protein contents were used as the output. The ANN had one hidden layer, and the number of hidden-layer neurons was determined according to the methods presented by Liu et al. [34] and Dou et al. [35]. The ANN can be optimized according to equations (2) and (3):

$$e_a = \frac{n_t}{n} e_t + \frac{n_m}{n} e_m + \left| e_t - e_m \right| \qquad (2)$$

where $e_a$ is the error of the approximation; $e_t$ and $e_m$ are the mean square errors of the training set and the monitoring set; $n_t$ and $n_m$ are the sample numbers of the training set and the monitoring set; and $n$ is the total number of samples, and

$$D_a = \frac{c}{e_a} \qquad (3)$$

where $D_a$ is the degree of approximation and $c$ is a constant used to scale $D_a$ so that it plots clearly.
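The absorbance calculation for one filter image can be sketched as below. This is a minimal illustration under stated assumptions: it uses the usual dark-corrected form A = log10((I_R − I_D) / (I − I_D)), takes the mean grey level of the central 120×120 pixels as the sample intensity, and receives the reference and dark intensities as precomputed mean values, because the text does not describe how the reference tile was located within the frame. The names `center_crop` and `absorbance` are hypothetical.

```python
import numpy as np


def center_crop(image, size=120):
    """Return the central size x size block of a 2-D image array."""
    rows, cols = image.shape
    top = (rows - size) // 2
    left = (cols - size) // 2
    return image[top:top + size, left:left + size]


def absorbance(sample_image, reference_intensity, dark_intensity, size=120):
    """Dark-corrected absorbance for one wavelength (assumed log10 form).

    sample_image        : 2-D array of grey levels (e.g. 480 rows x 640 cols)
    reference_intensity : mean grey level of the white reference (assumed input)
    dark_intensity      : mean grey level with the diaphragm closed, lamps off
    """
    sample_intensity = center_crop(sample_image, size).mean()
    ratio = (reference_intensity - dark_intensity) / (sample_intensity - dark_intensity)
    return float(np.log10(ratio))


# Synthetic frame with a uniform grey level, used only to show the call.
frame = np.full((480, 640), 60.0)
a_example = absorbance(frame, reference_intensity=210.0, dark_intensity=10.0)
```

Repeating the call for each of the 15 filter images would give one 15-element absorbance vector per rice sample, which is the input the regression models operate on.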

MLR model for rice protein estimations
The prediction results of the MLR model for the NIRS and the NIR imaging system within the wavelength range from 870 to 1014 nm are shown in Figure 4. For both methods, the values of $r_{cal}^2$ increase as the number of selected wavelengths increases, and five wavelengths were selected, at which the SEP of the validation set reaches its minimum. Calibration equations with smaller SEP values indicate better prediction results [36]; the MLR model using five selected wavelengths to estimate the rice protein content therefore displayed the best prediction accuracy. The selected wavelengths of 880 nm, 910 nm, 920 nm, 1014 nm, and 990 nm of the NIR imaging system lie close to the strong absorption peaks of pure protein reported by Williams and Norris [37]. The developed NIR imaging system displayed a high predictive capacity for rice protein, with $r_{val}^2$ = 0.769, SEP = 0.294%, and Bias = 0.073, which are close to the prediction efficiency of the NIRS.

PLSR model for rice protein estimations
Figure 5 displays the results of the NIRS and the NIR imaging system within the wavelength range from 870 to 1014 nm using the PLSR model. The explained variances of the two devices both increased with the number of factors, but the slopes flattened once the number of factors exceeded a threshold. Moreover, the root mean square errors of cross validation (RMSECV) showed similar trends for both detection systems. According to the suggestion of the Unscrambler software, the numbers of factors for the PLSR model were 7 for the NIRS and 8 for the NIR imaging system.
Figure 5 Effects of the factor number on the (a) explained variance (%), and (b) root mean square error of cross validation (RMSECV) in the calibration using PLSR model
Because the PLSR model uses every spectral point within the selected wavelength range, the effect of the input wavelength range is more apparent. Therefore, effectively capturing the wavelength range that contains the important information would clearly improve system efficiency while maintaining prediction accuracy. The detection results of the NIRS using the PLSR model within different wavelength ranges are shown in Table 2. The prediction accuracy of the MLR model using a smaller wavelength range with a maximum wavelength of 1014 nm is clearly inferior to that of the PLSR model. The validation results of the NIR imaging system using the PLSR model within the wavelength range from 870 to 1014 nm are $r_{val}^2$ = 0.782, SEP = 0.274%, and Bias = 0.030. These results show that the NIRS predicts rice protein content with high accuracy. Although its predictive capability is lower than that of the NIRS, the NIR imaging system shows promise for non-destructive measurement of rice protein.
Note: $r_{cal}^2$, SEC, $r_{val}^2$, SEP, and RPD indicate the coefficient of determination of calibration, standard error of calibration, coefficient of determination of validation, standard error of prediction, and relative performance determination (RPD = SD/SEP, where SD is the standard deviation), respectively.
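The statistics named in the note can be illustrated with a short sketch that fits an ordinary least-squares MLR on calibration data and scores predictions on a validation set. The exact formulas used in the study are not given in the text, so the definitions here are assumptions based on common chemometrics practice: bias is the mean residual, SEP is the bias-corrected standard deviation of the residuals, the coefficient of determination is the squared Pearson correlation, and RPD = SD/SEP. All function names are hypothetical.

```python
import numpy as np


def fit_mlr(absorbance, protein):
    """Least-squares MLR; returns [intercept, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(absorbance)), absorbance])
    coefficients, *_ = np.linalg.lstsq(design, protein, rcond=None)
    return coefficients


def predict_mlr(coefficients, absorbance):
    """Predict protein content from absorbances at the selected wavelengths."""
    return coefficients[0] + np.asarray(absorbance) @ coefficients[1:]


def validation_stats(reference, predicted):
    """Return r2, SEP, bias, and RPD under the assumed definitions above."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = predicted - reference

    bias = residual.mean()
    # Bias-corrected standard error of prediction, n - 1 degrees of freedom.
    sep = np.sqrt(np.sum((residual - bias) ** 2) / (len(residual) - 1))
    r2 = np.corrcoef(reference, predicted)[0, 1] ** 2
    rpd = reference.std(ddof=1) / sep
    return {"r2": r2, "SEP": sep, "bias": bias, "RPD": rpd}
```

In use, `fit_mlr` would receive the calibration absorbances at the selected wavelengths (one column per wavelength), and `validation_stats` would be applied to the predictions for the validation samples.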

ANN model for rice protein estimations
In this study, the evaluated networks consisted of a single input layer, a hidden layer, and an output layer. In back-propagation networks, the number of hidden neurons affects how well the network learns a dataset. For example, too many hidden neurons may cause the network to memorize the training data, so that it fails to generalize the input/output relationship. Therefore, it was necessary to optimize the number of neurons in the hidden layer. In this work, the five significant wavelengths selected by the MLR model and the rice protein contents were supplied as the input and output of the ANN, respectively. Figure 6 illustrates the effect of the number of hidden neurons on the error indices. Because $e_t$ and $e_m$ vary irregularly, neither can be used on its own to optimize the number of hidden nodes. In contrast, $D_a$ reaches its maximum when three hidden neurons are used, so the optimal number of hidden neurons is three by the maximum-$D_a$ criterion. The optimal ANN model for the NIRS was established by the same approach.
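The selection rule can be sketched as follows. The form of the approximation error used here, e_a = (n_t/n)·e_t + (n_m/n)·e_m + |e_t − e_m| with D_a = c/e_a, is an assumption based on how this criterion is usually written and should be checked against refs. [34,35]. The function names and the error values in the example are hypothetical and are not the study's data.

```python
def approximation_error(e_t, e_m, n_t=80, n_m=40):
    """Weighted training/monitoring error plus their gap (assumed form)."""
    n = n_t + n_m
    return (n_t / n) * e_t + (n_m / n) * e_m + abs(e_t - e_m)


def degree_of_approximation(e_t, e_m, n_t=80, n_m=40, c=1.0):
    """D_a = c / e_a; c only rescales the curve for plotting."""
    return c / approximation_error(e_t, e_m, n_t, n_m)


def best_hidden_neurons(errors, n_t=80, n_m=40, c=1.0):
    """Pick the hidden-neuron count with the largest D_a.

    errors maps a hidden-neuron count to its (e_t, e_m) pair.
    """
    return max(
        errors,
        key=lambda k: degree_of_approximation(*errors[k], n_t=n_t, n_m=n_m, c=c),
    )


# Illustrative mean square errors only, not measured values.
example_errors = {2: (0.10, 0.14), 3: (0.08, 0.09), 4: (0.06, 0.15)}
chosen = best_hidden_neurons(example_errors)
```

The gap term penalizes networks whose training error is much lower than their monitoring error, which is why the criterion can prefer a network that does not have the smallest training error.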
The ANN model established with the NIRS shows coefficients of determination of 0.857 and 0.841 for the training and monitoring sets, respectively; the prediction result was $r_{val}^2$ = 0.824.
Figure 7 The predicted protein contents with MLR, PLSR, and ANN models vs. their reference values (n = 60)

Conclusions
The developed imaging system consisted of a NIR camera, filters, an automatic filter-exchange device, and image processing techniques. This study used the imaging method to measure spectral absorption and employed the MLR, PLSR, and ANN analysis models to detect rice protein content. The measurements of the NIRS were used to establish the calibration model of the system, and the predictive results of the NIRS and the system were compared. With the MLR model, the NIR imaging system used a calibration equation consisting of 5 wavelengths (880 nm, 910 nm, 920 nm, 1000 nm, and 1014 nm) to predict the rice protein content, with $r_{val}^2$ and SEP of 0.769 and 0.294%, respectively. With the PLSR model, the NIR imaging system used 15 filters ranging from 870 to 1014 nm, and the predictive results (r_val^2 = 0.782 and SEP = 0.274%) were better than those of the MLR model. However, the PLSR model required many more spectral bands than the MLR model. The ANN model, which took as input the 5 wavelengths selected by the MLR, was simpler, and its prediction results ($r_{val}^2$ = 0.806 and SEP = 0.266%) were similar to those of the PLSR. Therefore, the ANN model is recommended for predicting rice protein content with the NIR imaging system. These results also show that the NIRS predicts rice protein content with high accuracy. Moreover, with its simple, convenient operation and high detection accuracy, the developed NIR imaging system shows commercial potential for non-destructive detection of rice protein content.
[References]