Cost-effective method for degradability identification of MSW using convolutional neural network for on-site composting

Automatically identifying the degradability of municipal solid waste (MSW) is one of the key prerequisites for on-site composting, because it prevents contamination by undegradable wastes. In this study, a cost-effective method was proposed for the degradability identification of MSW. Firstly, the number of trainable images in the datasets was increased by performing cropping operations of four different sizes on the original images captured on-site. Secondly, a lite convolutional neural network (CNN) model with only 3.37 million parameters was built, and a total of eight models were trained on these datasets with and without image augmentation operations. Finally, a degradability identification system was built for on-site composting, in which the images were cut into small squares of different sizes for prediction, and experiments were conducted to find the best combination of trained model and cutting size. The results showed that the validation accuracies of the models trained with the augmentation operations were 0.91-2.07 percentage points higher than those of the models trained without them. In the evaluation of the degradability identification system, the best result was achieved by the combination of the W8A dataset and a cutting size of 1/14, which reached an accuracy of 91.58%, indicating the capability of this cost-effective method to identify the degradability of MSW.


Introduction 
The generation of municipal solid waste (MSW) is growing fast with the increasing human population and urbanization around the world [1]. Composting of the organic fraction of municipal solid waste (OFMSW), which accounts for more than half of the total amount of MSW [2], is a cost-effective and environmentally sustainable way to reduce pressure on the environment by converting degradable matter into value-added products such as nutrient-rich fertilizer and biopesticide, or into material for bioremediation [3][4][5][6]. However, the application of conventional composting is limited in China because it is severely affected by contamination from mixed-in undegradable wastes that may contain toxic substances, leading to excessive heavy metals in the products [7,8]. On-site waste separation and composting is a reliable way to cope with this situation [9,10]. To improve the reutilization of resources within MSW, China launched a nationwide classification system in 2019 [11].
Yet changing people's habits is not easy; it will take a long time and will inevitably lead to high management costs. Hence, it is of great significance and practical value to separate degradable and undegradable MSW by automated techniques, for which identifying the degradability of MSW is an important prerequisite.
Since the composition of MSW is complicated, consisting of degradable parts such as vegetable scraps, discarded fruits, and food residues and undegradable parts such as plastic films, plastic bottles, and metal containers, it is a challenge to recognize the substances within it, especially when the wastes are stacked. In recent years, there have been several studies in the field of waste classification. Xiao et al. [12] developed an image classification system based on hyperspectral image analysis that identified five kinds of common construction waste, namely foam, plastic, brick, cement, and wood. Spectral angle mapping (SAM) and Fisher discriminant analysis were implemented by Zhao et al. [13] to classify three kinds of common waste, namely paper, plastic, and wood wastes. Vrancken et al. [14] studied combinations of illumination angles and numbers of cameras used to obtain images, and used different image augmentation strategies to train convolutional neural networks for the classification of paper and cardboard, reaching an accuracy of 77.5%.
White et al. [15] established an identification model based on a convolutional neural network for the recognition of common solid wastes such as paper, cardboard, glass, metal, and plastic. Rabano et al. [16] developed a garbage classification model that can be deployed on the Android system to sort glass, paper, cardboard, plastic, metal, and other garbage. Kang et al. [17] developed a garbage classification model based on the ResNet-34 algorithm, whose classification accuracy reached 99%.
Although hyperspectral systems obtained high accuracies owing to their feature extraction capabilities, their high cost is an unavoidable obstacle to practical application.
In contrast, methods based on image classification technologies, especially those based on convolutional neural networks, are capable of learning a fine feature extractor automatically and have reached rather good results at relatively low cost. In practical applications, however, many types of waste often appear in one image at the same time, which limits the performance of image classification systems. For this reason, object detection models were introduced to this field: the Faster regional convolutional neural network (Faster R-CNN) was implemented to recognize recyclable and hazardous waste [18], and You Only Look Once (YOLO) was applied to recognize plastic waste [19]. However, owing to occlusions, the large variety of shapes, and the enormous number of categories in MSW, it is hard to label such a huge amount of data and not easy to train models with it. Therefore, there is a great need for research on cost-effective methods to identify the degradability of MSW. This study developed a cost-effective degradability identification system for on-site composting of OFMSW, in which an image cropping process was applied to increase the amount of trainable data, and the images captured by a USB camera were divided into small local areas for prediction with a trained lite CNN model. The drawbacks of the method and directions for future work are also discussed.

Dataset and augmentation
Samples of MSW were collected from garbage bins in residential areas of Changsha, China, and were grouped into two main classes, degradable and undegradable. Degradable wastes are materials that contain the rich nutrients needed for the biological activities of microorganisms and are suitable for composting, such as vegetable leaves, fruits, weeds, and food residues. Undegradable wastes are those not suitable for composting, such as plastic films, printed packing bags, beverage bottles, discarded papers, cardboard, and wood.
Raw image data were captured with an HTC U Ultra smartphone. The smartphone was held at a vertical angle, as shown in Figure 1, and at a height at which the imaging area covered all the waste in the bin. The images were collected twice, in the morning and in the afternoon of a sunny day, and direct sunlight was avoided to keep the light intensity stable. The captured images have a resolution of 4096×3072 pixels, representing a field of view of about 40 cm×30 cm. A total of 144 raw images were captured, 72 for each of the degradable and undegradable classes. Then, the original images were cropped to the central region covered by wastes, representing a field of view of about 32 cm×32 cm. Further, these images were cropped into small squares with side lengths of 1/2, 1/4, 1/6, and 1/8 of the side length of the central region images. Therefore, the fields of view of those small images were about 16 cm×16 cm, 8 cm×8 cm, 5.3 cm×5.3 cm, and 4 cm×4 cm, respectively. After removing images with obvious ambiguity, four original datasets, namely W2, W4, W6, and W8, were built, in which the numbers of trainable images were 576, 2296, 5156, and 9088, respectively.
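The cropping step and the resulting dataset sizes can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the code used in this study; the function names are hypothetical, and the sketch assumes that the central region image is exactly square and is split into a regular n×n grid without overlap.

```python
import numpy as np

RAW_IMAGES = 144          # 72 degradable + 72 undegradable raw images
CENTRAL_REGION_CM = 32.0  # approximate side length of the central region


def crop_into_squares(image, n):
    """Split a square H x W x C image into an n x n grid of square tiles."""
    side = image.shape[0]
    if image.shape[1] != side:
        raise ValueError("expected a square central-region image")
    tile = side // n  # integer tile side; leftover edge pixels are dropped
    return [
        image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
        for r in range(n)
        for c in range(n)
    ]


def max_tile_count(n):
    """Upper bound on dataset size before ambiguous tiles are removed."""
    return RAW_IMAGES * n * n


def tile_field_of_view_cm(n):
    """Approximate side length of one tile in centimetres."""
    return CENTRAL_REGION_CM / n


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        print(n, max_tile_count(n), round(tile_field_of_view_cm(n), 1))
```

Under these assumptions the upper bounds are 576, 2304, 5184, and 9216 tiles for W2, W4, W6, and W8, which is consistent with the reported 576, 2296, 5156, and 9088 images once ambiguous tiles have been removed.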
Training a convolutional neural network requires a large amount of labeled data to avoid overfitting [20,21]. Because the amount of data was small, image augmentation techniques were applied. The image augmentation operations included rotation by a random angle within ±45°, translation along the width and height directions by random distances within 0.2 times the side length, and random horizontal flipping. The void areas in the images caused by these operations were filled in mirror mode. The images were augmented 10 times by the above operations. Hence, the amounts of trainable data in the augmented datasets, namely W2A, W4A, W6A, and W8A, were 10 times those in the original datasets.
Figure 1 Cost-effective approach to increase the number of trainable images
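The augmentation operations described above can be sketched in plain NumPy as follows. This is an illustrative sketch, not the implementation used in this study: it assumes that the rotation angle and the shifts are drawn uniformly within the stated limits, that the flip is applied with probability 0.5, that nearest-neighbour sampling is used, and that "mirror mode" means reflecting the image at its borders. All function names are hypothetical.

```python
import numpy as np


def reflect_index(idx, size):
    """Map arbitrary integer indices into [0, size) by mirroring at the borders."""
    period = 2 * size
    idx = np.mod(idx, period)
    return np.where(idx >= size, period - 1 - idx, idx)


def augment_once(image, rng, max_angle_deg=45.0, max_shift=0.2):
    """Apply one random rotation, translation, and horizontal flip."""
    h, w = image.shape[:2]
    angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    dy = rng.uniform(-max_shift, max_shift) * h
    dx = rng.uniform(-max_shift, max_shift) * w
    flip = rng.random() < 0.5

    # Coordinates of every output pixel relative to the shifted image centre.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    y0, x0 = ys - cy - dy, xs - cx - dx

    # Inverse mapping: locate the source pixel for each output pixel.
    src_y = np.cos(angle) * y0 - np.sin(angle) * x0 + cy
    src_x = np.sin(angle) * y0 + np.cos(angle) * x0 + cx

    # Out-of-range source positions are filled by mirroring the image.
    src_y = reflect_index(np.rint(src_y).astype(int), h)
    src_x = reflect_index(np.rint(src_x).astype(int), w)
    out = image[src_y, src_x]
    return out[:, ::-1] if flip else out


def augment_dataset(images, copies=10, seed=0):
    """Return `copies` augmented versions of every image."""
    rng = np.random.default_rng(seed)
    return [augment_once(img, rng) for img in images for _ in range(copies)]
```

With `copies=10`, a dataset of N images yields 10N augmented images, matching the tenfold increase stated for W2A, W4A, W6A, and W8A.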

Lite CNN model
In comparison with deep CNN models such as ResNet [22] or VGG-16 [23], simple CNN models such as AlexNet [24] are cost-effective, although with some decrease in performance. Therefore, a lite CNN model that is easy to train and implement was built based on the structure of AlexNet, in which the input images were resized to 48×48×3 and the depth of the network was compressed to seven weighted layers with only 3.37 million trainable parameters. As shown in Table 1, a convolution layer with a 1×1 kernel was deployed after the input layer, which adds non-linear representation ability and makes the network deeper at relatively low cost [25]. Convolution and max-pooling were then performed four times in sequence with kernel sizes of 5×5, 3×3, 3×3, and 5×5. Three fully connected layers followed, with a dropout layer between each two adjacent layers to limit overfitting [26]. The softmax activation function was applied in the last fully connected layer. The output of the model consisted of 2 values ranging from 0 to 1, which were the predicted probabilities that the waste in the image is degradable or undegradable, respectively.
The computer hardware used for model training included an Intel i9-10920 CPU and an RTX TITAN graphic card. The training was implemented with TensorFlow [27], which is a platform for machine learning. The hyper-parameters were set as follows: the optimizer was stochastic gradient descent (SGD), the loss function was categorical cross-entropy, the learning rate was 0.001, the decay of the learning rate was 0.00001, the momentum was 0.9, the batch size was 32, the dropout rate was 0.5, and the number of epochs was 500. SGD [28] is an algorithm that minimizes the objective function J(θ),
$$J(\theta) = \frac{1}{m'} \sum_{i=1}^{m'} L\left(x^{(i)}, y^{(i)}, \theta\right) \tag{1}$$
where the loss function L is calculated for each training sample x^(i) and label y^(i) in a mini-batch of m′ samples.
Then, the parameter θ was updated by the following equation [28],
$$\theta = \theta - \eta \nabla_{\theta} J(\theta) \tag{2}$$
where ∇θ is the gradient operator and the learning rate η determines the size of the step taken toward the minimum.
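The update rule and the stated hyper-parameters (learning rate 0.001, decay 0.00001, momentum 0.9, batch size 32) can be illustrated with a small NumPy sketch. Several points here are assumptions rather than details given in this study: the momentum is applied in the classical form v = βv − η∇J followed by θ = θ + v, the decay is interpreted as a time-based schedule η_t = η_0 / (1 + decay · t), and a toy least-squares loss replaces the categorical cross-entropy so that the example stays self-contained. All names are hypothetical.

```python
import numpy as np


def sgd_momentum_step(theta, velocity, grad, step,
                      lr0=0.001, decay=1e-5, momentum=0.9):
    """One SGD update with momentum and time-based learning-rate decay."""
    lr = lr0 / (1.0 + decay * step)            # assumed decay schedule
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity


def batch_gradient(theta, x, y):
    """Gradient of J = mean((x @ theta - y) ** 2) / 2 over one mini-batch."""
    residual = x @ theta - y
    return x.T @ residual / len(y)


def fit(x, y, epochs=200, batch_size=32, seed=0):
    """Minimize the toy objective with shuffled mini-batches."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(x.shape[1])
    velocity = np.zeros_like(theta)
    step = 0
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            grad = batch_gradient(theta, x[idx], y[idx])
            theta, velocity = sgd_momentum_step(theta, velocity, grad, step)
            step += 1
    return theta
```

With zero momentum and zero decay, `sgd_momentum_step` reduces to the plain update θ = θ − η∇θJ(θ).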
Besides, to obtain better training performance, the input data were centralized and normalized by the following equation,
$$x' = \frac{x - \mu}{\sigma} \tag{3}$$
where x is an input pixel value, x′ is the normalized value, and μ and σ are the mean value and standard deviation of all pixels in a batch of data, respectively. The datasets were randomly divided into a training set and a validation set with a ratio of 8:2. A total of 8 models were trained, one on each of the 4 augmented and 4 original datasets. Then, the models from the last 5 training epochs were evaluated on the validation sets, and the means and standard deviations of the validation accuracies were recorded.
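A minimal sketch of the batch normalization of inputs and the 8:2 split is given below. The function names, the small constant `eps` that guards against division by zero, and the use of a seeded random permutation are assumptions made for illustration.

```python
import numpy as np


def standardize_batch(batch, eps=1e-7):
    """Centre and scale a batch using the mean and std of all its pixels."""
    batch = batch.astype(np.float32)
    mu = batch.mean()
    sigma = batch.std()
    return (batch - mu) / (sigma + eps)  # eps is an added safeguard


def split_train_val(n_samples, train_fraction=0.8, seed=0):
    """Return shuffled index arrays for the training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]


if __name__ == "__main__":
    train_idx, val_idx = split_train_val(9088)  # size of the W8 dataset
    print(len(train_idx), len(val_idx))
```

For the W8 dataset of 9088 images, this split gives 7270 training images and 1818 validation images.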

Degradability identification system
Trained models were applied on an on-site automatic composting device to build a degradability identification system, in which the degradability of the feed wastes was identified to avoid undegradable waste entering into the composting reactor. The identification system consisted of mechanical parts and control parts.
The mechanical part of the system included an electric-powered door and a holding plate on which the feed wastes can temporarily stay for imaging. The actions of holding the wastes or letting them enter were performed by controlling a linear actuator. The control part of the system consisted of a USB camera, a Raspberry Pi 4 micro-computer, and a monitor. As shown in Figure 2, the implementation steps were as follows: feed wastes to the entrance as the door opens; capture raw images via the USB camera; crop the raw images to a region of interest (ROI); cut the ROI images into small square images; predict the degradability of each small image with the trained model; show the results on the monitor to guide the separation of undegradable wastes; and finally let the wastes enter the composting reactor when the degradability of the waste at the entrance meets the required threshold.
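The prediction steps listed above can be sketched as follows. This is an illustrative outline under stated assumptions, not the deployed software: the ROI is assumed to be square, tiles are resized to 48×48 by nearest-neighbour sampling, `predict_proba` stands for any trained model that returns the two probabilities in the order (degradable, undegradable), and the threshold is a user-supplied value because no specific value is given here. All function names are hypothetical.

```python
import numpy as np


def cut_roi(roi, n):
    """Cut a square ROI image into an n x n grid of square tiles."""
    side = min(roi.shape[0], roi.shape[1]) // n
    return [
        roi[r * side:(r + 1) * side, c * side:(c + 1) * side]
        for r in range(n)
        for c in range(n)
    ]


def resize_nearest(tile, size=48):
    """Resize a tile to size x size by nearest-neighbour sampling."""
    rows = np.arange(size) * tile.shape[0] // size
    cols = np.arange(size) * tile.shape[1] // size
    return tile[rows][:, cols]


def degradable_fraction(roi, n, predict_proba):
    """Classify every tile and return the degradable share and label grid."""
    tiles = np.stack([resize_nearest(t) for t in cut_roi(roi, n)])
    probs = predict_proba(tiles)   # shape (n * n, 2)
    labels = probs.argmax(axis=1)  # 0 = degradable, 1 = undegradable
    return float(np.mean(labels == 0)), labels.reshape(n, n)


def may_enter_reactor(fraction, threshold):
    """Gate decision: open only when enough tiles are degradable."""
    return fraction >= threshold
```

The label grid returned by `degradable_fraction` corresponds to the per-tile results shown on the monitor, and `may_enter_reactor` corresponds to the final threshold check.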
The raw images captured by the USB camera (WX151HD, produced by Shenzhen WEIXINSHIJIE Technology Co., Ltd., China) have a resolution of 1280×720 and a minimum visual angle of 50°. The camera was fixed on the inside of the door, and its imaging area completely covered the entrance of the composting device. The ROI was fixed to the area of the holding plate, which is about 32 cm×32 cm. A light-emitting diode (LED) light source with an illuminance of 300 lx and a color temperature of 4000 K was used for illumination and was mounted at the same height and angle as the camera.
The performance of the degradability identification system was evaluated with real waste samples, where the relative accuracy P was defined by the following equation:
$$P = \frac{a}{t} \times 100\% \tag{4}$$
where a is the number of small images correctly recognized and t is the total number of small images generated by the cutting operation.
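As a small worked illustration of the relative accuracy, the sketch below counts the correctly recognized tiles and divides by the total. The function name and the encoding of the labels as 0 (degradable) and 1 (undegradable) are assumptions; how the reference label of each small image was assigned is not specified here.

```python
def relative_accuracy(predicted, actual):
    """Relative accuracy P = a / t * 100, in percent."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    a = sum(p == q for p, q in zip(predicted, actual))  # correct tiles
    t = len(actual)                                     # all tiles
    return 100.0 * a / t


if __name__ == "__main__":
    # A cutting size of 1/14 produces t = 14 * 14 = 196 small images.
    actual = [0] * 196
    predicted = [0] * 180 + [1] * 16
    print(round(relative_accuracy(predicted, actual), 2))
```

In this hypothetical case, 180 correct tiles out of 196 give a relative accuracy of about 91.84%.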
Note: The input wastes were temporarily held at the entrance of the composting machine by a mechanical system. The image was then captured by a USB camera, and a fixed ROI was applied to obtain the region of the holding plate in the image. Afterward, the ROI image was cut into small squares, and each of them was classified using the trained lite CNN model. Finally, after waste separation guided by the degradability identification results of these small images, which were displayed on the screen, the degradable waste entered the reactor chamber for composting.
Figure 2 Degradability identification system for on-site composting

Results

Evaluation of the lite CNN models
As shown in Table 2, the validation accuracies of the models trained with the original datasets were negatively correlated with the cropping sizes: the model trained with the W8 dataset, which has the smallest cropping size, achieved the highest validation accuracy of 97.57%, which was 0.85 percentage points higher than that of the model trained with the W2 dataset. The models trained with the augmented datasets performed better than those trained with the original datasets, with accuracies increased by 2.07, 1.08, 1.61, and 0.91 percentage points, respectively. In contrast, no significant correlation between the results and the cropping sizes was found for the models trained with augmentation.

Performance of identification system
To evaluate the performance of the degradability identification system and find the best cutting size for prediction, experiments were conducted in which the input ROI images were cut into squares with side lengths of 1/2, 1/4, 1/6, 1/8, 1/10, 1/12, 1/14, and 1/16 of the side length of the ROI image. Afterwards, these small images were predicted by the 4 models trained with the augmented datasets. Eight samples were collected for evaluation, which consisted of vegetable scraps and melon peels as degradable waste and plastic films, plastic bottles, cardboard, and discarded paper as undegradable waste. Each evaluation was performed on a mixed sample, which was a combination of at least two kinds of wastes that included both degradable and undegradable categories. The evaluation results were the means of three repetitions for each sample. As shown in Table 3, except for the models trained with the W2A and W4A datasets, whose accuracies exceeded 80%, the accuracies were relatively low when predicting images with large cutting sizes; for example, they ranged from 62.50% to 78.10% when the cutting sizes were 1/2 and 1/4 of the side length. In addition, the accuracies were higher when the cutting size was smaller. The model trained with the W8A dataset reached the highest accuracy of 91.58% at a cutting size of 1/14. The model trained with the W2A dataset achieved an accuracy of 90.95% at the same cutting size. The accuracies of the remaining models were lower than 90%.
These results showed that in the degradability identification system, good performances can be achieved when the training dataset and image cutting size were properly selected.

Discussion
This study developed a cost-effective degradability identification system for on-site composting of OFMSW. The datasets were enlarged by cropping the original images. Subsequently, the CNN models were built and trained with these datasets, and the results showed that the augmentation operations were beneficial, increasing the validation accuracies by 0.91-2.07 percentage points.
When the models were applied to the degradability identification system, the best combination of dataset and cutting size for prediction was the W8A dataset with a cutting size of 1/14, which reached the best accuracy of 91.58%.
Nevertheless, there are still some shortcomings that need to be further investigated for practical applications.
Firstly, the degradability identification was based on the classification of small images that cover only local regions, so some global features were ignored. To improve the accuracy, further studies should try to retain more features by combining the outputs of predictions on images with different cutting sizes. Secondly, performing the forward calculation of the CNN separately for each small image resulted in a high computational load. To make the identification system more efficient, the structure of the CNN model should be modified so that the computation of its feature extraction is shared.

Conclusions
In this study, a cost-effective degradability identification system was built for on-site composting of OFMSW, which consisted of an effective way to increase the amount of trainable image data, a lite CNN model, and a strategy for prediction. The number of trainable images was increased from 144 to a maximum of 9088 using the cropping method. The accuracies of the trained models were enhanced by applying image augmentation operations to the datasets. Further, experiments were conducted to find the best combination of parameters of the method. The results showed that this cost-effective method was capable of identifying the degradability of MSW samples with an accuracy of 91.58%.