Article type: Research article

Authors

1 Shahid Chamran University of Ahvaz

2 Shahid Chamran University of Ahvaz

Abstract

This study examines one of the main goals of the sugarcane agro-industry companies of Khuzestan, namely increasing the yield of sugarcane farms, using a data mining approach. Decision makers in these agricultural production units face a very large volume of collected data with highly diverse attributes and complex relationships among them; analyzing and managing these data through empirical and statistical analysis is difficult and, in many areas, practically impossible. Data mining is a powerful technology for managing and organizing large volumes of information. In this study, decision tree data mining techniques (the QUEST and C5.0 models) were used to estimate sugarcane yield. To this end, the available data sets, such as irrigation and drainage, soil, and plant data, were used to determine the effect of different combinations of these factors on yield. The research is analytical, and its database contains the records of 1201 farms. The data were obtained from the Amir-Kabir agro-industry for the crop years 1393 to 1396 (Iranian calendar). The analysis was carried out with IBM Modeler 14.2 software. The results showed that operational and managerial indicators affect the yield level of sugarcane farms. How the yield level is affected depends on particular combinations of operational and managerial indicators, which were extracted as patterns from the QUEST and C5.0 decision tree models. In addition, crop variety emerged in both decision tree models as the most important independent variable. The results can therefore help in planning and preparing favorable conditions for reaching the set production targets.

Article title [English]

Modeling of the Variables that Influence Sugarcane Yield using C5.0 and QUEST Decision Tree Algorithms

Authors [English]

  • H Zakidizaji 1
  • H Bahrami 1
  • N Monjezi 2
  • M. J Shiekhdavoodi 2

1 Shahid Chamran University

2 Shahid Chamran University

Abstract [English]

Introduction
The sugar industry usually gathers huge amounts of information during normal production operations, but this information is rarely used to study the relative importance of management and environment for sugarcane yield. Yield prediction is a significant problem for agricultural organizations: every agronomist wants to know, as early as possible, how much yield to expect. The aim of this study was to evaluate the performance of the C5.0 and QUEST algorithms in predicting sugarcane yield at the Amir-Kabir Agro-Industry Company in Khuzestan province, Iran. The working method described in this paper is also applicable to other geographical areas and other crops.
Materials and Methods
The data for the study were collected from the Amir-Kabir Agro-Industry Company and cover the years 2012 to 2016. The study area is located in Khuzestan Province, a major agricultural region of Iran, between latitudes 31° 15′ and 31° 40′ north and longitudes 48° 12′ and 48° 30′ east. It covers an area of about 12000 ha. The average elevation of the study area is 8 m above sea level. Mean annual rainfall within the study area is 147.1 mm, the mean annual temperature is approximately 25°C, and the mean soil temperature at 50 cm depth is 21.2°C. The data were obtained from a survey of 1201 sugarcane farms covering 15 variables. Variables in a data mining study can be divided into two categories: the target variable and the predictor variables. Yield was used as the target (dependent) variable and the other variables as predictor (independent) variables. In both models, the data included crop cultivar, month of harvest, chemical fertilizer (nitrogen), chemical fertilizer (phosphate), age (plant or ratoon), number of irrigations, ratio of surface spraying, soil texture, soil electrical conductivity (EC), water consumption per hectare, drain, farm management, crop duration, area, and yield category. The necessary data were collected and pre-processed. Two decision tree methods, C5.0 and QUEST, were then analyzed.
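The division into one target variable and a set of predictor variables can be sketched as follows. This is only an illustration in Python: the field names are hypothetical shorthand for the 15 survey variables (the study's actual column names are not given), and the sample record is invented, not taken from the study's data.

```python
# Hypothetical column names standing in for the 15 survey variables
# listed in the text; the study's real field names are not given.
VARIABLES = [
    "crop_cultivar", "harvest_month", "nitrogen_fertilizer",
    "phosphate_fertilizer", "crop_age", "irrigation_count",
    "surface_spraying_ratio", "soil_texture", "soil_ec",
    "water_per_hectare", "drain", "farm_management",
    "crop_duration", "area", "yield_category",
]
TARGET = "yield_category"  # dependent variable
PREDICTORS = [name for name in VARIABLES if name != TARGET]


def split_record(record):
    """Separate one farm record into (predictor dict, target value)."""
    features = {name: record[name] for name in PREDICTORS}
    return features, record[TARGET]


# Invented placeholder record, for illustration only.
example = {name: None for name in VARIABLES}
example["crop_cultivar"] = "cultivar_A"
example["yield_category"] = "high"

features, label = split_record(example)
print(len(PREDICTORS), "predictors; target value:", label)
```

Keeping the target name in a single constant makes it harder to leak the yield category into the predictor set by accident.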
Results and Discussion
First, the decision tree methods were applied to the variables. The C5.0 method (error rate 0.2319 for the training set and 0.3306 for the test set) performed slightly better than QUEST on the training data. Crop cultivar was found to be an important variable for yield prediction. In total, 24 rules were found in this study, and C5.0 showed a better degree of separation. The prediction rate of C5.0 was 76.81% correct and 23.19% wrong on the training data, and 66.94% correct and 33.06% wrong on the test data. The prediction rate of QUEST was 68.25% correct and 31.75% wrong on the training data, and 70.83% correct and 29.17% wrong on the test data. On the training data, the comparison between the model types showed that C5.0 produced the more accurate prediction model and was, on that basis, the model to use. On the test data, the comparison showed that QUEST produced the more accurate prediction model. The results of our assessment showed that the C5.0 and QUEST algorithms were capable of producing rules for sugarcane yield. The proposed methods, used as an expert and intelligent system, can therefore contribute to sugarcane yield prediction.
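The relationship between the reported error rates and prediction rates is simple arithmetic (accuracy = 1 - error rate). The short Python sketch below recomputes the percentages from the error rates given in the text and reports which model has the lower error on each partition; the function and variable names are illustrative only.

```python
# Error rates as reported in the text (fraction of misclassified farms).
error_rates = {
    "C5.0": {"train": 0.2319, "test": 0.3306},
    "QUEST": {"train": 0.3175, "test": 0.2917},
}


def accuracy_percent(error_rate):
    """Convert an error rate (0..1) into a percentage of correct predictions."""
    return round((1.0 - error_rate) * 100, 2)


def best_model(partition):
    """Return the model name with the lowest error on the given partition."""
    return min(error_rates, key=lambda model: error_rates[model][partition])


for partition in ("train", "test"):
    for model, rates in error_rates.items():
        print(model, partition, accuracy_percent(rates[partition]), "% correct")
    print("lowest error on", partition, "data:", best_model(partition))
```

The ranking flips between the two partitions, which is why the comparison is reported separately for training and test data.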
Conclusions
Agricultural enterprises today are capable of generating and collecting large amounts of data. As data volumes grow, an automated method is required to extract the necessary information. Data mining techniques make it possible to extract useful knowledge and trends, and the knowledge gained in this manner may be applied to increase work efficiency and improve the quality of decision making. Data mining techniques are directed towards finding patterns in the data that are valuable and interesting for crop management. In this research, decision tree algorithms (C5.0 and QUEST) were used. These classification algorithms were selected because they have the potential to yield good results in prediction and classification applications. The study presents a data mining model to predict sugarcane yield for 2012-2016. The 24 classification rules generated by the C5.0 decision tree algorithm have practical value in agricultural applications. The results showed that C5.0 produced the best prediction accuracy on the training data and QUEST on the test data. Sensitivity analysis indicated that crop cultivar was the most important variable. Using appropriate data collected from Khuzestan province, efficient data mining techniques (decision trees) can be developed and analyzed to solve complex agricultural problems. The decision tree was found useful in classification and prediction modeling because it can accurately discover hidden relationships between variables and can remove insignificant attributes from a dataset.
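How a decision tree can rank attributes and discard uninformative ones may be illustrated with an entropy-based split criterion. The Python sketch below computes the gain ratio used by the C4.5 family of algorithms, from which C5.0 descends; it is not the study's implementation, QUEST selects variables with statistical tests instead, and the four-row data set is invented purely for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by split info."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy of the target that remains after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Invented toy data: cultivar fully determines yield, soil does not.
toy = [
    {"cultivar": "A", "soil": "clay", "yield": "high"},
    {"cultivar": "A", "soil": "loam", "yield": "high"},
    {"cultivar": "B", "soil": "clay", "yield": "low"},
    {"cultivar": "B", "soil": "loam", "yield": "low"},
]

for attribute in ("cultivar", "soil"):
    print(attribute, gain_ratio(toy, attribute, "yield"))
```

In this toy case the attribute that separates the classes perfectly scores highest, while the attribute that carries no information about the target scores zero and would not be chosen for a split.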

Keywords [English]

  • Agriculture
  • Amir-Kabir Agro-Industry
  • Data mining
  • Prediction
