Association rule mining algorithm based on Spark for pesticide transaction data analyses
Abstract
Keywords: Spark, association rule mining, ICAMA algorithm, big data, pesticide regulation, MapReduce
DOI: 10.25165/j.ijabe.20191205.4881
Citation: Bai X N, Jia J D, Wei Q W, Huang S Q, Du W C, Gao W L. Association rule mining algorithm based on Spark for pesticide transaction data analyses. Int J Agric & Biol Eng, 2019; 12(5): 162–166.
Copyright (c) 2019 International Journal of Agricultural and Biological Engineering
This work is licensed under a Creative Commons Attribution 4.0 International License.