You are required to read and agree to the following before accessing the full-text version of an article in the IDE article repository.

The full-text document you are about to access is subject to national and international copyright laws. In most cases (but not necessarily all), the consequence is that personal use is allowed provided that the copyright owner is duly acknowledged and respected. All other use (typically) requires explicit permission (often in writing) from the copyright owner.

For the reports in this repository we specifically note that

  • the use of articles under IEEE copyright is governed by the IEEE copyright policy (available at
  • the use of articles under ACM copyright is governed by the ACM copyright policy (available at
  • technical reports and other articles issued by Mälardalen University are free for personal use. For other use, the explicit consent of the authors is required
  • in other cases, please contact the copyright owner for detailed information

By accepting, I agree to acknowledge and respect the rights of the copyright owner of the document I am about to access.

If you are in doubt, feel free to contact

A Novel Methodology to Classify Test Cases Using Natural Language Processing and Imbalanced Learning



Sahar Tahvili, Leo Hatvani, Enislay Ramentol, Rita Pimentel, Wasif Afzal, Francisco Herrera

Publication Type:

Journal article


Engineering Applications of Artificial Intelligence



Detecting the dependency between integration test cases plays a vital role in the area of software test optimization. Classifying test cases into two main classes, dependent and independent, can be employed for several test optimization purposes, such as parallel test execution, test automation, test case selection and prioritization, and test suite reduction. This task can be seen as an imbalanced classification problem because of how test cases are distributed: the numbers of dependent and independent test cases are often uneven, depending on the testing level, the testing environment and the complexity of the system under test. In this study, we propose a novel methodology that consists of two main steps. First, we use natural language processing to analyze the test case specifications and convert each one into a numeric vector. Second, using the obtained data vectors, we classify each test case as dependent or independent. We adopt a supervised learning approach combined with different methods for handling imbalanced datasets. The feasibility and possible generalization of the proposed methodology are evaluated in two industrial projects at Bombardier Transportation, Sweden, with promising results.
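The two steps summarized above (vectorizing test case specifications, then classifying them under class imbalance) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a plain bag-of-words vectorizer in place of the paper's NLP step, random oversampling of the minority class as just one example of an imbalance-handling method, and a k-nearest-neighbours classifier using cosine similarity. All function names, the toy specifications and their labels are hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical sketch: bag-of-words vectors + random oversampling + k-NN.
# This is NOT the authors' pipeline; each component is a stand-in.


def tokenize(text):
    """Lower-case the specification and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs):
    """Map every token seen in the training specifications to a column index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    """Step 1 (assumed): L2-normalised term-count vector; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def random_oversample(X, y, seed=0):
    """Duplicate random samples of each smaller class until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def cosine(a, b):
    """Cosine similarity; equals the dot product because vectors are L2-normalised."""
    return sum(p * q for p, q in zip(a, b))


def knn_predict(X_train, y_train, x, k=3):
    """Step 2 (assumed): majority vote among the k most similar training vectors."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: cosine(pair[0], x), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def train_and_classify(train_docs, train_labels, query_docs, k=3, seed=0):
    vocab = build_vocabulary(train_docs)
    X = [vectorize(doc, vocab) for doc in train_docs]
    # Rebalance only the training set; query vectors are never duplicated.
    X_bal, y_bal = random_oversample(X, train_labels, seed)
    return [knn_predict(X_bal, y_bal, vectorize(q, vocab), k) for q in query_docs]


# Hypothetical toy specifications: dependent test cases are the minority class.
specs = [
    "Precondition: run door test first, then verify brake release",
    "After the previous test passes, check traction signal",
    "Verify cabin light turns on",
    "Check horn volume level",
    "Verify display shows speed",
    "Check air conditioning starts",
    "Verify emergency button alarm",
]
labels = ["dependent", "dependent"] + ["independent"] * 5

queries = ["After the previous test, verify brake signal", "Check cabin display light"]
print(train_and_classify(specs, labels, queries))  # ['dependent', 'independent']
```

Oversampling is applied only to the training data so that duplicated samples cannot leak into evaluation. A fuller study would, as the abstract indicates, compare several imbalance-handling methods and report metrics suited to imbalanced data rather than plain accuracy.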


author = {Sahar Tahvili and Leo Hatvani and Enislay Ramentol and Rita Pimentel and Wasif Afzal and Francisco Herrera},
title = {A Novel Methodology to Classify Test Cases Using Natural Language Processing and Imbalanced Learning},
volume = {95},
pages = {1--13},
month = {August},
year = {2020},
journal = {Engineering Applications of Artificial Intelligence},
url = {}