A drift propensity detection technique to improve the performance for cross-version software defect prediction

Authors:

Md Alamgir Kabir, Jacky Keung, Kwabena Ebo Bennin, Miao Zhang

Publication Type:

Conference/Workshop Paper

Venue:

44th Annual Computers, Software, and Applications Conference

Abstract

In cross-version defect prediction (CVDP), historical data is derived from the prior version of the same project to predict defects of the current version. Recent studies in CVDP focus on subset selection to deal with changes in the data distributions. No prior study has focused on training data arriving in a streaming fashion across versions, where significant differences between versions make the prediction unreliable. We refer to this situation as Drift Propensity (DP). By identifying DP, necessary steps can be taken (e.g., updating or retraining the model) to improve the prediction performance. In this paper, we investigate the chronological defect datasets and identify DP in the datasets. The no-memory data management technique is employed to manage the data distributions and a DP detection technique is proposed. The idea behind the proposed DP detection technique is to monitor the algorithm's error-rate. The DP detector triggers DP, warning, and control flags to take necessary steps. The proposed technique is significantly superior in identifying the distribution differences (p-value < 0.05). The DPs identified in the data distributions achieve large effect sizes (Hedges' g ≥ 0.80) during the pair-wise comparisons. We observe that if the error-rate exponentially increases, it causes DP, resulting in prediction performance deterioration. We thus recommend that researchers and practitioners address DP in the chronological datasets. Because of its potential effects on the datasets, prediction models could be enhanced to handle DP and get the best results in CVDP.
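
The abstract describes the DP detector as monitoring the algorithm's error-rate and raising DP, warning, and control flags. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not state the exact decision rule, so the sketch assumes thresholds in the style of the well-known Drift Detection Method (DDM). The class name ErrorRateDriftMonitor, the 2 and 3 standard-deviation levels, the 30-sample warm-up, and the synthetic error stream are all illustrative assumptions.

```python
import math


class ErrorRateDriftMonitor:
    """DDM-style monitor over a stream of 0/1 prediction errors.

    Illustrative only: thresholds and warm-up are assumptions, not
    values taken from the paper.
    """

    def __init__(self, warning_level=2.0, drift_level=3.0, min_samples=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        # "No-memory" behaviour: forget all statistics gathered so far.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Consume one prediction outcome; return 'control', 'warning' or 'dp'."""
        self.n += 1
        self.errors += int(bool(is_error))
        p = self.errors / self.n                 # running error-rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "control"                     # warm-up: too few samples
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s        # best level seen so far
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                         # start over after a drift
            return "dp"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "control"


# Synthetic stream (an assumption for illustration): about 10% errors for
# the first 200 predictions, then about 60% errors, mimicking a version
# whose data distribution differs from the previous one.
stream = [i % 10 == 0 for i in range(200)] + [i % 10 < 6 for i in range(200, 400)]
monitor = ErrorRateDriftMonitor()
flags = [monitor.update(e) for e in stream]
print("first DP flag at stream index:", flags.index("dp"))
```

In this sketch a "warning" flag could be used to start buffering recent data, and a "dp" flag to retrain or update the model on it, in line with the steps the abstract mentions.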

Bibtex

@inproceedings{Kabir6540,
  author = {Md Alamgir Kabir and Jacky Keung and Kwabena Ebo Bennin and Miao Zhang},
  title = {A drift propensity detection technique to improve the performance for cross-version software defect prediction},
  month = {July},
  year = {2020},
  booktitle = {44th Annual Computers, Software, and Applications Conference},
  url = {http://www.es.mdu.se/publications/6540-}
}