Classification of different walnut varieties using low-field nuclear magnetic resonance technology and cluster analysis

To classify walnut varieties by their water and oil contents and to determine suitable storage conditions, low-field nuclear magnetic resonance (LFNMR) technology was used to obtain the NMR transverse relaxation time (T2) of the samples, based on the physical and chemical indicators of walnut quality. The relationship between the relaxation time and the phase state of the internal material of the sample was investigated, and the characteristic parameters of the NMR spectrum signals were analyzed statistically by cluster analysis to distinguish the walnut varieties. Three components, as well as their contents, were detected by the LFNMR spectrometer: firmly bound water, weakly bound water, and weakly bound oil. The test results indicated that the oil peak dominated the overall signal intensity compared with the water peaks, and that, between the two water peaks, the firmly bound water phase contributed more to the overall water signal. Using hierarchical cluster analysis, the 21 walnut samples were classified into three classes based on the characteristic parameters of the water-content and oil-content spectrum signals. The first class contains four walnut varieties, characterized by the lowest water content and the highest oil content; the third class contains two walnut varieties, with the highest water content and the lowest oil content; and the second class contains 15 walnut varieties, with both water and oil contents at medium levels. The results showed that LFNMR allowed rapid detection of the moisture and oil contents of walnuts, while cluster analysis classified the walnut varieties based on these parameters. This study also provides a basis for optimizing the storage methods and storage conditions of walnuts.

1 Introduction  Juglans rejia L., also known as walnut or Qiang peach, is nutritious and has high economic value [1] .Zhang Qian of Han Dynasty introduced walnut from the western regions into China.The plant has a long history of cultivation and breeding.Considering the Liaoning variety as an example, the Liaoning Economic Forestry Research Institute started breeding walnuts in the 1970s, and since, has selectively bred a wide range of other superior walnut varieties including the Liaoning Series (Liaoning No.1, Liaoning No.4, Liaoning No.5, etc.), which is an early-fruiting walnut variety, and the Lipin Series (Lipin No.1, Lipin No.2, etc.), which is a late-fruiting walnut variety.Currently, walnut classification is based on multiple factors, such as its phenotypic traits, and the sensory and nutritional qualities, which are then used as the reference standard for germplasm evaluation and selective breeding [2][3][4][5][6][7] .
The storage characteristics of walnuts are closely related to their moisture content. Excessive or very low moisture content accelerates oxidation or rancidity reactions and affects the taste of the product. The activity of walnut lipase and the acid value of the oil are affected by storage conditions and temperature. Hence, different storage conditions should be adopted for walnuts with different moisture contents; however, a rapid method for detecting the moisture and oil contents of walnuts (and thereby facilitating their classification based on these characteristics) is lacking.
Low-field nuclear magnetic resonance (LFNMR) is a technology with a constant magnetic field strength less than 0.5 T. As it is fast, accurate, non-toxic, and harmless, it has been applied for the determination of in-vivo moisture changes, oil composition, quality, and shelf life of fruits [8][9][10][11][12][13]. In recent years, LFNMR technology has been extensively used, particularly for the classification, identification, and appraisal of samples. Utilizing LFNMR technology to measure the relaxation times of hydrogen and sodium nuclei in the samples, Greer et al. [14] reliably classified cooking oil, milk, and soy sauce. Zhang et al. [15] adopted the principal component analysis (PCA) method for the rapid and effective classification of different brands of spiced beef by measuring the relaxation characteristics of three types of spiced beef by LFNMR. Using LFNMR technology together with pattern recognition methods, such as PCA and partial least squares discriminant analysis, Xia et al. [16] identified five batches of snack bean curd brands and obtained a prediction accuracy of 100%. However, the classification of different walnut varieties based on their water and oil contents using this technology has not yet been reported.
To address this gap in knowledge, 21 different varieties of walnuts were selected as test samples in this study, and their transverse relaxation time, T2, was measured using LFNMR technology. Further, the relationship between the relaxation time and the internal phase state of the sample was analyzed, and the classification of the different varieties was studied quantitatively through the cluster analysis of NMR spectrum signals. This study focuses on the development of a new method that can rapidly and accurately determine the moisture and oil contents of walnuts. It also aims to explore the applicability of LFNMR technology to the classification and determination of different walnut varieties and to provide a reference for the storage methods and storage conditions of walnuts.

Test equipment
The test equipment included the following: an NMR spectrometer, NMI20-015V-I (Shanghai Niumai Electronic Technology Co. Ltd.), with a magnetic field strength of (0.5±0.08) T, an RF pulse frequency of 21 MHz, a magnet temperature of 32°C, and a probe coil diameter of 15 mm; a Sartorius BAS124S-CW electronic balance (Beijing Sartorius Instrument System Co., Ltd.) with maximum and minimum weighing values of 120 g and 0.01 g, respectively, and a measurement accuracy of 0.0001 g, internally calibrated; and a glass test tube of diameter 12 mm.

Test materials
The test samples were harvested in the Regional Experimental Park of the Liaoning Economic Forestry Research Institute located in Songmudao Village, Paotai Town, Junpu New District, Dalian, Liaoning (121°45ʹE, 39°24ʹN), in October 2017. A total of 21 walnut varieties were harvested (specific varieties are shown in Table 1). The harvesting criterion was that half the exocarps of the walnuts were split. After the samples were harvested and their exocarps were removed, they were cleaned and dried at 40°C. The samples were then sorted by variety, placed separately in Ziploc bags, and stored at a temperature of 5°C until the test. In this study, all the test samples were grown in the same environment, sampled at the same time, and stored under the same conditions, which is necessary to successfully complete the test.

Test sample preparation
Thirty walnuts were randomly selected from each walnut variety. After extraction, the seed was rapidly cut such that its diameter was approximately 12 mm, to be used as the test sample. The mass of the sample was then measured using an electronic balance and recorded; the masses of all the samples were within 0.4113-0.5945 g. The masses of some of the samples are listed in Table 1.
Table 1 Test sample varieties and masses (g)

The standard oil sample was placed vertically at the center of the LFNMR spectrometer in a glass test tube. The free induction decay (FID) pulse sequence in the NMR spectrum analysis software was then used to determine the center frequency of the magnetic field and the pulse width of the hard pulse. This step yielded the parameters SF, O1, P1, and P2. The standard oil sample was then removed, and the prepared test sample was placed at the bottom of the glass test tube, which was positioned vertically at the center of the LFNMR spectrometer. The Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence in the NMR spectrum analysis software was used to determine the transverse relaxation time, T2, of the sample; this was repeated thrice for each sample. The main parameter settings of the CPMG pulse sequence are as follows: SF = 21 MHz, O1 = 651269.85 Hz, TD = 449990, P1 = 18 µs, P2 = 35.52 µs, TW = 1200 ms, TE = 0.25 ms, NECH = 18000, and NS = 16.

Transverse relaxation time T2 inversion
The iterative optimization method was used to substitute the T2 attenuation curve, acquired by three repeated measurements, into the relaxation model for fitting and inversion to obtain the transverse relaxation time, T2, of the sample and its corresponding relaxation signal component. The average values of the above parameters obtained from the inversion were considered as the relaxation time and signal amplitude of the sample. For this study, the signal amplitude was normalized for a mass of 0.5 g to facilitate quantitative analysis.
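The averaging and mass normalization described above can be sketched as follows. This is a minimal illustration, assuming that the signal amplitude scales linearly with sample mass; the text states only that amplitudes were normalized to 0.5 g, so the linear form, the function names, and the example numbers are assumptions rather than the authors' procedure.

```python
from statistics import mean

REFERENCE_MASS_G = 0.5  # reference mass stated in the text


def normalize_amplitude(amplitude, sample_mass_g,
                        reference_mass_g=REFERENCE_MASS_G):
    """Scale a signal amplitude linearly to the reference sample mass.

    Linear scaling with mass is an assumption made for this sketch.
    """
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    return amplitude * reference_mass_g / sample_mass_g


def averaged_normalized_amplitude(repeat_amplitudes, sample_mass_g):
    """Average the amplitudes of repeated inversions, then normalize by mass."""
    return normalize_amplitude(mean(repeat_amplitudes), sample_mass_g)


# Illustrative values only (not measured data): three repeats of one peak
# for a hypothetical 0.4 g sample.
example = averaged_normalized_amplitude([790.0, 800.0, 810.0], 0.4)
print(example)
```

Normalizing after averaging keeps the three repeats of one sample on the same mass basis, so amplitudes from kernels of different mass can be compared directly.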

Data processing
All the test data were processed using SPSS 23.0 to obtain the maximum, minimum, average, standard deviation, etc. of each characteristic parameter of the nuclear magnetic signal. The data were standardized prior to cluster analysis. The squared Euclidean distance was adopted as the distance measure, and hierarchical clustering was used for the cluster analysis.
Specifically, the cluster analysis was carried out with the "system clustering" (hierarchical cluster) menu of SPSS 23.0: "Peak area 1," "Peak area 2," "Peak area 3," and "Total area" were used as the variables, and the "species name" was used as the case label, to perform the clustering and draw the dendrogram.
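The clustering steps described above (standardization, squared Euclidean distance, hierarchical merging) can be sketched in plain Python. This is an illustrative re-implementation, not the SPSS routine: between-groups average linkage is assumed because the dendrogram is later described as an average (inter-class) tree diagram, the z-score uses the sample standard deviation as an assumption, and the six rows of data are invented purely for demonstration.

```python
from itertools import combinations
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) or 1.0 for c in columns]  # avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage(rows, n_clusters):
    """Agglomerative clustering with between-groups average linkage.

    Returns the clusters (lists of row indices) and the merge schedule as
    (merge distance, number of clusters after the merge) pairs.
    """
    data = zscore_columns(rows)
    clusters = [[i] for i in range(len(data))]
    schedule = []
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            # Mean pairwise distance between the members of the two clusters.
            d = mean(sq_euclidean(data[p], data[q]) for p in ci for q in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
        schedule.append((d, len(clusters)))
    return clusters, schedule


# Hypothetical samples: columns are peak area 1, peak area 2, peak area 3,
# and total area. The numbers are made up for illustration.
ROWS = [
    [10, 5, 100, 115],
    [11, 5, 101, 117],
    [20, 9, 80, 109],
    [21, 10, 79, 110],
    [30, 15, 60, 105],
    [31, 14, 61, 106],
]
clusters, schedule = average_linkage(ROWS, 3)
print(clusters)
```

Standardizing first keeps the large oil-peak and total-area values from dominating the distance, which is the usual reason for standardizing before clustering.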

Results and discussion
3.1 Relationship between the transverse relaxation time, T2, of the NMR and the phase state of the substances in the sample
After the 21 walnut varieties were measured with the NMR spectrum analysis software, the original data files with the suffix .pea were obtained.
Information on the transverse relaxation attenuation curves of the samples was stored in these files, i.e., the transverse relaxation time of the hydrogen protons and the corresponding signal amplitude of the hydrogen protons in samples stimulated by a 180° reversed-phase pulse sequence of CPMG pulses. To visually demonstrate that different walnuts have different signal amplitudes at different times, the above .pea files are represented graphically in Figure 1.
The transverse relaxation decay curves of the 21 walnut varieties followed a consistent trend. If all 21 curves were drawn together, they would overlap and become indistinguishable. Therefore, using hierarchical cluster analysis, the 21 samples were classified into three categories according to their water and oil content LFNMR spectra (discussed in further detail later), and one sample from each class was selected for plotting in Figure 1. Figure 2 follows the same principle as Figure 1.
The transverse relaxation attenuation curves vary among different walnuts, i.e., the maximum signal and the time required for signal attenuation to cease are different. Here, the maximum signal represents the total number of hydrogen protons in the sample, and the signal cut-off time represents the time required for the signal to completely decay, i.e., the time required for the transverse magnetization vector to decay to zero (spin-spin relaxation time T2). All the original data files with the suffix .pea were input into the NMR spectral analysis software. The multi-component inversion was performed with an iteration number of 100 000, and the transverse relaxation time T2 inversion spectrum of the sample was obtained, as shown in Figure 2. The horizontal axis of Figure 2 includes the 200 transverse relaxation time components, T2, that are logarithmically distributed between 10^-2 ms and 10^4 ms, and the vertical axis indicates the signal amplitude, A2i, corresponding to each relaxation time. It has been previously established that the signal amplitude is proportional to the content of the corresponding component, and that the integral area, A, is the total signal amplitude of the sample. Walnut kernels contain a variety of constituents, most of which are macromolecular substances such as lipids, proteins, and polysaccharides. A study has revealed that each 100 g of walnut kernel contains 66.90±0.25 g of lipid, 16.66±0.51 g of protein, and 19.60±0.50 g of carbohydrate, with the remainder being ash and water [17,18].
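The logarithmic T2 axis and the relation between the inversion spectrum and the decay curve can be illustrated with a short sketch. The grid limits and the number of components come from the text; the sum-of-exponentials forward model is the form commonly assumed for CPMG decays and is an assumption here, since the relaxation model used by the analysis software is not stated. The function names are hypothetical, and the inversion itself is not implemented.

```python
import math

# Values taken from the text: 200 components between 10^-2 ms and 10^4 ms.
T2_MIN_MS, T2_MAX_MS, N_COMPONENTS = 1e-2, 1e4, 200


def log_spaced_t2_grid(t_min=T2_MIN_MS, t_max=T2_MAX_MS, n=N_COMPONENTS):
    """Return n relaxation times spaced evenly on a log10 scale."""
    lo, hi = math.log10(t_min), math.log10(t_max)
    step = (hi - lo) / (n - 1)
    return [10 ** (lo + k * step) for k in range(n)]


def cpmg_decay(times_ms, t2_grid_ms, amplitudes):
    """Assumed forward model: M(t) = sum_i A2i * exp(-t / T2i)."""
    return [
        sum(a * math.exp(-t / t2) for a, t2 in zip(amplitudes, t2_grid_ms))
        for t in times_ms
    ]


grid = log_spaced_t2_grid()

# Hypothetical spectrum with a single non-zero component, for illustration.
amps = [0.0] * N_COMPONENTS
amps[100] = 50.0
echo_times = [0.25 * (k + 1) for k in range(4)]  # TE = 0.25 ms, first 4 echoes
signal = cpmg_decay([0.0] + echo_times, grid, amps)
print(len(grid), grid[0], grid[-1])
```

At t = 0 the modeled signal equals the sum of all amplitudes, which matches the statement that the integral area A is the total signal amplitude of the sample.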
Based on the NMR principle, the transverse relaxation time, T2, reflects the chemical environment in which the protons in the sample are located; this is correlated with the binding force and degree of freedom of the hydrogen protons. Therefore, the degree of binding of the hydrogen protons is closely related to the internal structure of the sample [19,20]. The greater the binding force of the hydrogen protons, i.e., the smaller the degree of freedom, the shorter the transverse relaxation time, T2, and the further to the left the peak is located in the T2 spectrum. In contrast, the smaller the binding force, the longer the transverse relaxation time, T2, and the further to the right the peak is located in the T2 spectrum. Therefore, the NMR transverse relaxation time, T2, can be used to identify different substances or the different components of a substance [21,22].
The total of the corresponding areas of all the peaks in the T2 inversion spectrum constitutes the total amplitude A of the NMR T2 inversion spectrum, which is proportional to the number of hydrogen atoms in the sample [23,24]. By observing the T2 inversion spectra of all the samples, it is found that all the spectra have three peaks. Based on the NMR principle and the characteristic analysis of the samples, it is determined that the first two peaks, T21 and T22, whose relaxation times are less than 10 ms, are water peaks, and according to the length of the relaxation time, they correspond to firmly bound water and weakly bound water, respectively. The third peak, T23, whose relaxation time is longer than 30 ms, is an oil peak. Here, we defined the signal amplitude, A21, of the transverse relaxation time, T2, in the range of T21 (0.011-1.5 ms) as firmly bound water, with an average peak apex time of 0.3719 ms; the signal amplitude, A22, of the transverse relaxation time, T2, in the range of T22 (1.5-17 ms) as weakly bound water, with an average peak apex time of 5.0037 ms; and the signal amplitude, A23, of the transverse relaxation time, T2, in the range of T23 (28-1080 ms) as oil, with an average peak apex time of 132.2674 ms.
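The assignment of spectrum components to the three peaks can be sketched as a windowed sum over the T2 inversion spectrum. The window limits are those given in the text; treating each window as half-open (so that the shared 1.5 ms boundary is counted once), the dictionary keys, and the toy spectrum are assumptions made for this illustration.

```python
# T2 windows in ms, as defined in the text.
PEAK_WINDOWS_MS = {
    "A21_firmly_bound_water": (0.011, 1.5),
    "A22_weakly_bound_water": (1.5, 17.0),
    "A23_oil": (28.0, 1080.0),
}


def peak_areas(t2_ms, amplitudes, windows=PEAK_WINDOWS_MS):
    """Sum the spectrum amplitudes that fall inside each T2 window.

    Windows are treated as half-open intervals [lo, hi); this boundary
    convention is an assumption of the sketch.
    """
    areas = {name: 0.0 for name in windows}
    for t2, amp in zip(t2_ms, amplitudes):
        for name, (lo, hi) in windows.items():
            if lo <= t2 < hi:
                areas[name] += amp
                break
    return areas


# Toy spectrum (not measured data): relaxation times in ms and amplitudes.
toy_t2 = [0.3, 1.0, 5.0, 20.0, 130.0, 500.0]
toy_amp = [10.0, 5.0, 8.0, 1.0, 100.0, 50.0]
areas = peak_areas(toy_t2, toy_amp)
print(areas)
```

In this toy case the component at 20 ms lies between the weakly bound water and oil windows, so it is not assigned to any peak.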
The observations of the T 2 inversion spectra of all the samples reveal that among the water peaks, the firmly bound water phase contributes more to the overall water signal; however, compared to the water peaks, the oil peak dominates the overall signal intensity.
By investigating the transverse relaxation time inversion spectra of several randomly selected walnut varieties, we determined that there are significant differences between the activity of macromolecules such as lipids and that of water, i.e., the macromolecular-substance content is higher, and the macromolecular activity is more stable.
The polarity of macromolecular substances also has a significant effect on the relaxation time.

Cluster analysis of the NMR-spectrum-signal characteristic parameters
Chemometrics, a powerful tool for data analysis, has been successfully applied in multiple fields, with cluster analysis being the most commonly used analytical method [25][26][27][28][29][30]. Chemometrics selects the best measurement method to effectively obtain the most useful characterization data in a system, and it maximally extracts the qualitative and quantitative information on relevant substances from the samples by analyzing the measurement data. In this work, cluster analysis of the different walnut varieties was performed according to the NMR signal amplitudes collected for the firmly bound water, weakly bound water, and oil peaks.
Figure 3 shows the merging of each cluster during the analysis in the form of a tree diagram (dendrogram). The distance between classes is automatically rescaled to the range of 0-25. The average-linkage (between-groups) tree diagram then shows the rescaled distance at which clusters are merged, as displayed in Figure 3, from which the relationships among the 21 walnut varieties in the test can be observed. Reading the number of clusters from the distance axis of Figure 3, the following can be concluded. When the squared Euclidean distance is six, the 21 walnut varieties can be divided into seven classes; when it is nine, the walnut varieties can be divided into four classes; and when it is 20, the walnut varieties can be divided into three classes. The classification of the 21 walnut samples through cluster analysis indicates the compositions of these samples to a certain extent. The clustering results show that the water and oil contents of different walnut varieties differ, and that the water and oil contents of related varieties are similar. In cluster analysis, the determination of the class number is key to clustering because hierarchical clustering outputs all possible clustering solutions.
Considering that the distance between the centers of each class pair should be relatively large and the number of samples included in each class should not be too large, a cluster analysis scree plot was drawn using the coefficients of the agglomeration schedule, according to the purpose of the analysis, as shown in Figure 5. The scree plot determines the final number of clusters. The horizontal axis of the scree plot indicates the distance between classes (obtained from the agglomeration schedule), and the vertical axis indicates the number of clusters. From the scree plot in Figure 5, it can be observed that as the classes merge, the number of classes continues to decrease, and the distances between classes gradually increase. Before the classification of the samples into seven classes, the increase in distances among the classes is relatively small, forming the "steep hill" in Figure 5. However, after the samples are classified into three classes, the distances among the classes rapidly increase, forming a "flat gravel road". As small distances between the formed classes indicate more similarity and vice versa, the "turning point" at the "foot of the hill" can be used as a reference to determine the number of classes. Therefore, it was decided that it is appropriate to cluster the samples into three classes in this study. The first class contains four walnut varieties: BZCDK, 1-1-4, 4-4-13, and Li2; the second class contains 15 walnut varieties: BZZGQ, Liao1, Liao4, Liao5, Liao6, Liao7, Liao10, Liaoruifeng, Li1, Z1-44, 2-3-22, 3-3-21, 4-2-26, 4-2-37, and 19-29; the third class contains two walnut varieties: Hanfeng and 10901.
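The "turning point" reasoning above can be approximated numerically with a largest-jump heuristic on the merge distances. This is a common rule of thumb and not necessarily the criterion applied in this study, where the choice was made by inspecting Figure 5; the function name and the merge distances below are invented for illustration.

```python
def suggest_cluster_count(schedule):
    """Suggest a cluster count from a merge schedule.

    schedule: (merge distance, number of clusters after the merge) pairs in
    merge order. The heuristic stops just before the merge whose distance
    increases the most relative to the previous merge.
    """
    best_jump, best_k = float("-inf"), None
    for (d_prev, k_prev), (d_next, _k_next) in zip(schedule, schedule[1:]):
        jump = d_next - d_prev
        if jump > best_jump:
            best_jump, best_k = jump, k_prev
    return best_k


# Hypothetical merge distances for seven samples (not the study's data).
toy_schedule = [(0.4, 6), (0.6, 5), (0.9, 4), (1.3, 3), (4.8, 2), (6.0, 1)]
suggested = suggest_cluster_count(toy_schedule)
print(suggested)
```

In the toy schedule the distance rises sharply when three clusters are merged into two, so the heuristic keeps three clusters; a visual check of the scree plot remains advisable because the largest jump can be misleading for noisy data.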
Therefore, the 21 walnut varieties were divided into three categories, containing 4, 15, and 2 walnut germplasms, respectively. The data were divided into three groups by using the SPSS software, and the three groups of data were described statistically. Descriptive statistics were computed for four parameters, namely the areas of the three peaks and the total area, which serve as the indexes of the NMR spectrum signals of the 21 walnut varieties. The statistical data of the various indexes are summarized in Table 2.
The analysis of the data in Table 2 shows that the first class contains four walnut varieties with the lowest water content and highest oil content, the third class contains two walnut varieties with the highest water content and lowest oil content, and the second class contains 15 walnut varieties with water and oil contents both at medium levels. In this study, the number of samples in the second class is also the largest. Cluster analysis is a commonly used data analysis method, which has been adopted by several previous studies to analyze, classify, and mine data from fruit-component, fruit-tree-breeding, and genetic research studies. In this study, using hierarchical cluster analysis, the characteristic parameters of the NMR spectrum signals of 21 walnut varieties were systematically classified for investigating the genetic relationships among these varieties. The results suggest that the 21 samples can be classified into three different categories according to their water- and oil-content spectrum signals: the first class has the lowest water content and highest oil content and includes the BZCDK, 1-1-4, 4-4-13, and Li2 varieties; the third class has the highest water content and lowest oil content and includes the Hanfeng and 10901 varieties; and the second class, with medium water and oil contents, includes the BZZGQ, Liao1, Liao4, Liao5, Liao6, Liao7, Liao10, Liaoruifeng, Li1, Z1-44, 2-3-22, 3-3-21, 4-2-26, 4-2-37, and 19-29 varieties. Thus, the 21 walnut samples were successfully classified by cluster analysis into three classes based on moisture and oil contents.

Conclusions
Based on the NMR principle and the characteristic analysis of the samples, we have defined the signal amplitude, A21, of the transverse relaxation time T2 in the range of T21 (0.011-1.5 ms) as firmly bound water, with an average peak apex time of 0.3719 ms; the signal amplitude, A22, of the transverse relaxation time T2 in the range of T22 (1.5-17 ms) as weakly bound water, with an average peak apex time of 5.0037 ms; and the signal amplitude, A23, of the transverse relaxation time T2 in the range of T23 (28-1080 ms) as oil, with an average peak apex time of 132.2674 ms.
On examining the peak signal amplitudes of the transverse relaxation time inversion spectra of three random samples, Hanfeng, BZCDK, and Li1, it was found that the oil peaks dominated the overall signal intensity compared with the water peaks, and that, among the water peaks, the firmly bound water phase contributed more to the overall water signal.
Thus, the 21 walnut samples were successfully classified by cluster analysis into three classes based on their moisture and oil contents.

Figure 1 Transverse relaxation decay curves of three types of walnuts

Figure 2

Figure 3 Cluster analysis tree diagram of the NMR-spectrum-signal characteristic parameters of 21 walnut varieties

Figure 4 displays a bar chart of the NMR spectrum signal characteristic parameters. Starting from the last row, the number on the vertical axis gives the number of categories into which the varieties are grouped. When the 21 walnut varieties are grouped into 20 categories, Liao5 and Liao rui fall into one group, which indicates that they have similar water and oil contents and can be classified as one type. When the walnuts are grouped into 19 categories, Liao4 and Liao7 are grouped into one category, which indicates that they are the second most similar pair. The remaining varieties are classified in the same way.

Figure 4 Cluster analysis bar chart of the NMR spectrum signal characteristic parameters of 21 walnut varieties

Figure 5 Cluster analysis scree plot of the NMR-spectrum-signal characteristic parameters of 21 walnut varieties

Table 2 Cluster analysis descriptive statistics of the different characteristic parameters of the NMR spectrum signals of 21 walnut varieties