AN AUTOMATED SOFTWARE DEFECT PREDICTION MODEL BASED ON OPEN-SOURCE CODE AND META-LEARNING
Keywords:
Software defect prediction, Meta-learning, Cross-project prediction, Open-source software
Abstract
Software defect prediction is essential for improving software quality, yet traditional models often struggle with cross-project generalization and require extensive manual tuning. To address these challenges, this paper proposes an automated defect prediction approach based on open-source data and meta-learning. The method constructs project-level meta-features from multiple open-source datasets and builds a meta-dataset by associating these features with the best-performing models. A meta-learner is then trained to automatically recommend suitable prediction models for new projects. Experiments on PROMISE datasets demonstrate that the proposed approach consistently outperforms traditional machine learning and cross-project methods in terms of AUC and F1-score, while significantly enhancing cross-project generalization. Statistical tests confirm the significance of these improvements, and analysis of model selection shows high consistency with optimal choices. Overall, the proposed method effectively reduces manual intervention and provides a robust, automated solution for software defect prediction.
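As a rough illustration of the pipeline summarized in the abstract (project-level meta-features, a meta-dataset pairing them with the best-performing model, and a meta-learner that recommends a model for a new project), a minimal Python sketch is given here. It is not the authors' implementation: the four label-free meta-features, the 1-nearest-neighbour meta-learner, the project names, the toy metric values, and the best-model labels are all assumptions made for illustration. In practice the best-model label for each source project would come from evaluating candidate models, for example by AUC or F1-score.

```python
from math import sqrt
from statistics import fmean, pstdev
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # one row of code metrics per module


def meta_features(X: Matrix) -> List[float]:
    """Label-free project descriptors (an illustrative choice, not the paper's set)."""
    values = [x for row in X for x in row]
    return [float(len(X)), float(len(X[0])), fmean(values), pstdev(values)]


def build_meta_dataset(
    projects: Dict[str, Matrix], best_model: Dict[str, str]
) -> List[Tuple[List[float], str]]:
    """Pair each source project's meta-features with its best-performing model."""
    return [(meta_features(X), best_model[name]) for name, X in projects.items()]


def recommend(meta_dataset: List[Tuple[List[float], str]], query: Sequence[float]) -> str:
    """1-nearest-neighbour meta-learner over min-max scaled meta-features."""
    vectors = [vec for vec, _ in meta_dataset]
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]

    def scale(vec: Sequence[float]) -> List[float]:
        # Constant dimensions carry no information, so they map to 0.
        return [
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(vec, lows, highs)
        ]

    q = scale(query)

    def distance(vec: Sequence[float]) -> float:
        return sqrt(sum((a - b) ** 2 for a, b in zip(scale(vec), q)))

    _, label = min(meta_dataset, key=lambda pair: distance(pair[0]))
    return label


# Hypothetical source projects with toy metric values.
projects = {
    "proj_a": [[1.0, 2.0], [3.0, 4.0]],
    "proj_b": [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0], [70.0, 80.0]],
}
# Assumed outcome of a prior model comparison on each source project.
best_model = {"proj_a": "random_forest", "proj_b": "svm"}

meta_dataset = build_meta_dataset(projects, best_model)
new_project = [[12.0, 18.0], [33.0, 41.0], [48.0, 66.0]]
recommended = recommend(meta_dataset, meta_features(new_project))
print(recommended)
```

The meta-features here deliberately avoid the defect labels, since a new project in a cross-project setting typically has none. A stronger meta-learner could be substituted for the nearest-neighbour rule without changing the surrounding structure.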