Real-world-data enabled assessment
for health regulatory decision-making

New REALM publication

Professor Michel Dumontier and Assistant Professor Chang Sun from Maastricht University co-authored the newest addition to the REALM publications library, the article "Generating unseen diseases patient data using ontology enhanced generative adversarial networks".

You can find the article on the REALM Publications page.

In this paper, Chang Sun and Michel Dumontier introduce Onto-CGAN, a novel approach that generates high-quality synthetic patient data for diseases unseen in training data.

Generative Adversarial Networks (GANs) have been employed to generate realistic synthetic health data (e.g., electronic health records), holding promise for fundamental research, AI model development, and enhancing data privacy safeguards. However, the performance of existing GANs (even the newest models) is largely constrained by their reliance on training data, rendering them inadequate for rare or previously unseen diseases.

To tackle this, Michel Dumontier and Chang Sun developed Onto-CGAN, an approach that combines human-curated ontology knowledge with generative models to create synthetic health data for diseases absent from the training data. The quality of the generated data is evaluated using variable distributions, correlation coefficients, and machine learning model performance. Onto-CGAN generates data for unseen diseases with statistical characteristics comparable to the real data and significantly improves the training of machine learning models.
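To give a feel for what comparing real and synthetic data on variable distributions and correlation coefficients can look like, here is a minimal Python sketch. It is an illustration under stated assumptions, not the evaluation code from the paper: the two-sample Kolmogorov-Smirnov statistic and the mean absolute gap between Pearson correlation matrices are stand-in metrics chosen for this example, the function names are hypothetical, and the toy arrays are random placeholders rather than patient data. The third criterion, machine learning model performance, is left out.

```python
import numpy as np


def ks_statistic(real, synth):
    """Largest gap between the empirical CDFs of two samples (two-sample KS statistic)."""
    real = np.sort(np.asarray(real, dtype=float))
    synth = np.sort(np.asarray(synth, dtype=float))
    grid = np.concatenate([real, synth])  # evaluate both CDFs at every observed value
    cdf_real = np.searchsorted(real, grid, side="right") / real.size
    cdf_synth = np.searchsorted(synth, grid, side="right") / synth.size
    return float(np.max(np.abs(cdf_real - cdf_synth)))


def correlation_gap(real, synth):
    """Mean absolute difference between off-diagonal Pearson correlations."""
    corr_real = np.corrcoef(real, rowvar=False)
    corr_synth = np.corrcoef(synth, rowvar=False)
    off_diagonal = ~np.eye(corr_real.shape[0], dtype=bool)
    return float(np.mean(np.abs(corr_real - corr_synth)[off_diagonal]))


def compare_tables(real, synth):
    """Summarise how closely a synthetic table (rows = patients) tracks a real one."""
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    ks_per_variable = [
        ks_statistic(real[:, j], synth[:, j]) for j in range(real.shape[1])
    ]
    return {
        "ks_per_variable": ks_per_variable,
        "correlation_gap": correlation_gap(real, synth),
    }


# Toy placeholder data (NOT patient data): two tables with three variables each.
rng = np.random.default_rng(0)
real_table = rng.normal(size=(500, 3))
synthetic_table = rng.normal(size=(500, 3))
report = compare_tables(real_table, synthetic_table)
print(report)
```

With both metrics, values closer to 0 mean the synthetic table is closer to the real one. What counts as close enough depends on the dataset and the intended use.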

Abstract:
Generating realistic synthetic health data (e.g., electronic health records) holds promise for fundamental research, AI model development, and enhancing data privacy safeguards. Generative Adversarial Networks (GANs) have been employed for this purpose, but their performance is largely constrained by their reliance on training data, rendering them inadequate for rare or previously unseen diseases. This study proposes Onto-CGAN, a novel generative framework that combines knowledge from disease ontologies with GANs to generate unseen diseases that are not present in the training data. The quality of the generated data is evaluated using variable distributions, correlation coefficients, and machine learning model performance. Our findings demonstrate that Onto-CGAN generates unseen diseases with statistical characteristics comparable to the real data, and significantly improves the training of machine learning models. This innovative approach addresses the scarcity of data for rare diseases, offering valuable applications in data augmentation, hypothesis generation, and preclinical validation of clinical models.