Curating Datasets for Visual Runway Detection



Joakim Lindén, Håkan Forsberg, Josef Haddad, Emil Tagebrand, Erasmus Cedernaes, Emil Gustafsson Ek, Masoud Daneshtalab

Publication Type: Conference/Workshop Paper


The 40th Digital Avionics Systems Conference




In machine learning systems, several factors affect the performance of a trained model. The most important include the model architecture, the amount of training time, and the size and diversity of the dataset. In safety-critical machine learning, the datasets used need to reflect the environment in which the system is intended to operate, in order to minimize the generalization gap between training and real-world inputs. Datasets should be thoroughly prepared, and requirements on the properties and characteristics of the collected data need to be specified. In our work we present a case study in which a synthetic dataset is generated based on real-world flight data from the ADS-B system, containing thousands of approaches to several airports, in order to identify real-world statistical distributions of the relevant variables to vary within our dataset sampling space. We also investigate the effects of training a model on synthetic data to different extents, including training on translated image sets (using domain adaptation). Our results indicate that airport location is the most critical parameter to vary. We also conclude that all experiments benefited in performance from pre-training on synthetic data rather than using only real data; however, this did not hold in general for domain adaptation-translated images.
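The idea of deriving a sampling space from observed flight data can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `build_empirical_sampler`, the binning scheme, and the choice of approach angle as the varied variable are all assumptions made for illustration, and the "observations" are randomly generated placeholders rather than ADS-B data. The sketch builds a histogram of an observed variable and draws new values from it by inverse-CDF sampling, so that synthetic scenes would follow roughly the same distribution as the observations.

```python
import bisect
import random


def build_empirical_sampler(observations, num_bins=20, seed=0):
    """Return a function that draws values following the histogram of `observations`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    lo, hi = min(observations), max(observations)
    if hi == lo:
        raise ValueError("observations must contain at least two distinct values")
    width = (hi - lo) / num_bins

    # Count how many observations fall into each equal-width bin.
    counts = [0] * num_bins
    for value in observations:
        # The maximum value is clamped into the last bin.
        index = min(int((value - lo) / width), num_bins - 1)
        counts[index] += 1

    # Running totals form an unnormalized CDF over the bins.
    cumulative = []
    running = 0
    for count in counts:
        running += count
        cumulative.append(running)

    rng = random.Random(seed)

    def sample():
        # Choose a bin in proportion to its count, then a uniform point inside it.
        target = rng.random() * cumulative[-1]
        index = min(bisect.bisect_right(cumulative, target), num_bins - 1)
        return lo + (index + rng.random()) * width

    return sample


if __name__ == "__main__":
    # Placeholder data: assumed approach angles in degrees, NOT real ADS-B values.
    data_rng = random.Random(1)
    observed_angles = [data_rng.gauss(3.0, 0.2) for _ in range(2000)]

    sample_angle = build_empirical_sampler(observed_angles)
    print([round(sample_angle(), 2) for _ in range(5)])
```

A histogram-based sampler is used here only because it makes no assumption about the shape of the distribution; a fitted parametric model or a kernel density estimate would be equally plausible choices.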


author = {Joakim Lind{\'e}n and H{\aa}kan Forsberg and Josef Haddad and Emil Tagebrand and Erasmus Cedernaes and Emil Gustafsson Ek and Masoud Daneshtalab},
title = {Curating Datasets for Visual Runway Detection},
month = {October},
year = {2021},
booktitle = {The 40th Digital Avionics Systems Conference},
publisher = {IEEE},
url = {}